Edge Computing Container Architectures: Docker vs. Kubernetes for Real-Time Robotics¶
Under the Hood — How containers orchestrate closed-loop control systems at the edge: namespace isolation, network routing, scheduling latency, and MPC feedback timing across ROS message buses.
1. The Edge Computing Model¶
Edge computing positions compute resources physically adjacent to sensors and actuators — eliminating the round-trip latency to cloud datacenters that makes real-time closed-loop control impossible. For Unmanned Aerial Vehicles (UAVs) running Model Predictive Control (MPC), every millisecond in the control loop matters: the sum of robot→edge + MPC_exec + edge→robot latencies directly affects trajectory tracking stability.
flowchart LR
subgraph UAV["UAV (Robot Side)"]
SENSOR["IMU + GPS\nSensor Fusion"]
ODOM["Odometry\nPublisher\n/odometry topic"]
CTRL["Attitude\nController\n(low-level)"]
ACTUATOR["Motor ESCs\nThrust + Roll + Pitch"]
end
subgraph EDGE["Edge Node (Container Side)"]
MPC["MPC Node\nOptimization Engine\n100Hz, N=100"]
ROSMASTER["ROS Master\nTopic Registry"]
end
SENSOR --> ODOM
ODOM -- "x(k), TCP WiFi\nd1=9-14ms" --> MPC
MPC -- "u(k), TCP WiFi\nd3=13-18ms" --> CTRL
CTRL --> ACTUATOR
ROSMASTER -.->|"register\nnodes"| MPC
ROSMASTER -.->|"register\nnodes"| ODOM
style EDGE fill:#1a3a5c,color:#fff
style UAV fill:#2d4a1e,color:#fff
Round-trip timing formula:
T_rtt = T_robot→edge + T_mpc_exec + T_edge→robot
Docker RTT: 14.2ms + 16.1ms + 17.6ms = ~47.9ms
Kubernetes RTT: 9.5ms + 16.9ms + 13.1ms = ~39.5ms
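The bookkeeping above can be checked in a few lines — a sketch using the measured component latencies quoted in this article:

```python
# RTT budget check using the measured component latencies above.
docker = {"robot_to_edge": 14.2, "mpc_exec": 16.1, "edge_to_robot": 17.6}
k8s    = {"robot_to_edge": 9.5,  "mpc_exec": 16.9, "edge_to_robot": 13.1}

def rtt_ms(parts):
    """T_rtt = T_robot->edge + T_mpc_exec + T_edge->robot (milliseconds)."""
    return round(sum(parts.values()), 1)

print(rtt_ms(docker))  # 47.9
print(rtt_ms(k8s))     # 39.5
```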
2. Container Isolation Architecture: What Actually Gets Isolated¶
Both Docker and Kubernetes use the same Linux kernel primitives for container isolation. Understanding which namespaces exist and which are shared is critical for ROS networking.
flowchart TD
subgraph HOST["Host Kernel"]
PID_NS["PID Namespace\nProcess tree isolation\nContainers see PID 1 = entrypoint"]
MNT_NS["Mount Namespace\nOverlayFS filesystem view\nper-container /proc /sys"]
UTS_NS["UTS Namespace\nhostname + domainname"]
IPC_NS["IPC Namespace\nShared memory, semaphores\nisolated per container"]
NET_NS["NET Namespace\nVirtual network interface\nveth pair, private IP"]
USER_NS["User Namespace\nUID/GID mapping\nroot-in-container ≠ root-on-host"]
CGROUP["cgroups v2\nCPU, Memory, I/O\nThrottling + Accounting"]
end
subgraph ROS_ISSUE["ROS Networking Problem"]
PRIVATE_IP["Container gets\nprivate subnet IP\ne.g. 172.17.0.2"]
ROS_COMM["ROS communicates\nvia random TCP ports\nbetween nodes"]
MISMATCH["Port mismatch:\ncontainer registers\ncontainer IP, not host IP"]
end
NET_NS --> PRIVATE_IP
PRIVATE_IP --> MISMATCH
ROS_COMM --> MISMATCH
style HOST fill:#1a1a2e,color:#fff
style ROS_ISSUE fill:#4a1a1a,color:#fff
The --network=host Fix: Bypassing NET Namespace¶
flowchart LR
subgraph DEFAULT["Default Network Mode"]
C1["Container\nveth0: 172.17.0.2\nports: random"]
BRIDGE["docker0 bridge\n172.17.0.1"]
HOST_IF["Host interface\neth0: 192.168.1.10"]
C1 -->|"veth pair"| BRIDGE --> HOST_IF
end
subgraph HOST_NET["--network=host Mode"]
C2["Container\nshares host NET namespace\neth0: 192.168.1.10\nports: directly on host"]
HOST_IF2["Host interface\neth0: 192.168.1.10"]
C2 -.->|"same namespace\nno veth"| HOST_IF2
end
style DEFAULT fill:#3a2a1a,color:#fff
style HOST_NET fill:#1a3a1a,color:#fff
With --network=host, the container's ROS master registers itself on the host's actual IP (e.g., 192.168.1.10) instead of a private Docker subnet IP, enabling cross-device ROS topic subscription to work correctly over WiFi.
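A minimal invocation sketch — the image name, host IP, and master port are placeholders for whatever the deployment actually uses:

```shell
# --network=host: container shares the host NET namespace, so ROS
# registers the host's real IP with the master instead of 172.17.x.x.
docker run --rm --network=host \
  -e ROS_MASTER_URI=http://192.168.1.10:11311 \
  -e ROS_IP=192.168.1.10 \
  mpc_image:latest
```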
3. Docker Single-Node Edge Architecture: Internal Data Flow¶
sequenceDiagram
participant UAV as UAV (Gazebo/ROS)
participant ROSCORE as roscore Container
participant MPC_C as MPC Container
participant KERNEL as Linux Kernel
UAV->>ROSCORE: TCP connect to ROS_MASTER_URI
UAV->>ROSCORE: Register /odometry publisher
MPC_C->>ROSCORE: Register /odometry subscriber
ROSCORE-->>MPC_C: Return UAV's TCP endpoint
loop 100Hz MPC Control Loop
UAV->>MPC_C: Publish x(k): pos + vel + quaternion
Note over MPC_C: Solve QP optimization<br/>horizon N=100 steps<br/>exec time ~16ms
MPC_C->>UAV: Publish u(k): thrust, φd, θd
Note over UAV: Attitude controller<br/>executes motor commands
end
MPC_C->>KERNEL: CPU user-space: 9.2%<br/>kernel-space: 0.8%
Container Process Hierarchy (Docker)¶
flowchart TD
subgraph HOST["Host OS (Ubuntu 20.04)"]
DOCKERD["dockerd\nContainer daemon\nHTTP REST API"]
CONTAINERD["containerd\nContainer lifecycle\nImage pulls, snapshots"]
RUNC["runc\nOCI runtime\nclone() + exec()"]
end
subgraph C1["roscore Container"]
INIT1["PID 1: entrypoint.sh"]
ROSCORE["roscore\nROS Master\nxmlrpc port :11311"]
end
subgraph C2["MPC Container"]
INIT2["PID 1: entrypoint.sh"]
MPCNODE["mpc_node\nC++ ROS node\nEigen + OSQP solver"]
LIBOSQP["libosqp.so\nQuadratic Program Solver\nADMM algorithm"]
end
DOCKERD --> CONTAINERD --> RUNC
RUNC -->|"clone: CLONE_NEWPID,\nCLONE_NEWNS, CLONE_NEWNET"| C1
RUNC -->|"clone()"| C2
INIT1 --> ROSCORE
INIT2 --> MPCNODE --> LIBOSQP
style HOST fill:#1a1a2e,color:#fff
4. Kubernetes Multi-Node Edge Architecture: Control Plane Data Flow¶
flowchart TD
subgraph CONTROL_PLANE["K8s Control Plane (Master Node)"]
API["kube-apiserver\nREST + Watch API\nValidation + Admission"]
ETCD["etcd\nRaft consensus KV\nCluster state store"]
SCHED["kube-scheduler\nPredicate filter +\nPriority scoring"]
CM["controller-manager\nReplicaSet + Deployment\nReconciliation loops"]
end
subgraph WORKER["Worker Node (Edge Machine)"]
KUBELET["kubelet\nCRI gRPC interface\nPod lifecycle manager"]
CRI["containerd (CRI)\nImage pull + unpack\nContainer creation"]
RUNC2["runc\nOCI runtime"]
subgraph POD1["Pod: roscore"]
ROSCORE2["roscore container\nhost network"]
end
subgraph POD2["Pod: mpc-controller"]
MPC2["mpc_node container\nhost network"]
end
KUBEPROXY["kube-proxy\niptables DNAT rules\nService VIP→Pod IP"]
SERVICE["Service: ClusterIP\nVirtual IP for pod discovery"]
end
API --> ETCD
API -->|"Watch"| CM
API -->|"Watch\nunscheduled pods"| SCHED
SCHED -->|"Node binding"| API
API -->|"Pod spec"| KUBELET
KUBELET -->|"RunPodSandbox\nCreateContainer"| CRI --> RUNC2
RUNC2 --> POD1
RUNC2 --> POD2
KUBEPROXY --> SERVICE
SERVICE -->|"routes to"| POD1
SERVICE -->|"routes to"| POD2
style CONTROL_PLANE fill:#1a2a3a,color:#fff
style WORKER fill:#1a3a2a,color:#fff
Kubernetes Pod Network Isolation vs. Host Network¶
flowchart LR
subgraph DEFAULT_POD["Default Pod Networking"]
POD["Pod\ncni0: 10.244.0.5/24\nVeth pair to node bridge"]
CNI_BRIDGE["cni0 bridge\n10.244.0.1/24"]
VETH["veth pair\nOne end in pod netns\nOne end on host"]
POD <--> VETH <--> CNI_BRIDGE
end
subgraph HOST_NET_POD["hostNetwork: true"]
POD2["Pod\nShares host net namespace\neth0: 192.168.1.10\nSees all host ports"]
HOST_NIC["Host NIC\neth0: 192.168.1.10"]
POD2 -.->|"same netns"| HOST_NIC
end
NOTE["ROS requires hostNetwork: true\nfor cross-device topic pub/sub\nvia actual network IP"]
style DEFAULT_POD fill:#3a1a1a,color:#fff
style HOST_NET_POD fill:#1a3a1a,color:#fff
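A sketch of the corresponding Pod manifest — the image name and env values are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mpc-controller
spec:
  hostNetwork: true        # share the node's net namespace: no veth, no cni0
  containers:
    - name: mpc
      image: mpc_image:latest
      env:
        - name: ROS_MASTER_URI
          value: "http://192.168.1.10:11311"
        - name: ROS_IP
          value: "192.168.1.10"
```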
5. MPC Optimization: What the Compute Offload Actually Does¶
The MPC solve is a quadratic program (QP) solved at 100Hz. This is what gets offloaded to the edge container.
flowchart TD
subgraph MPC_INTERNALS["MPC Compute Flow (~16ms exec per 100Hz cycle)"]
STATE["x(k) = [p, v, φ, θ]ᵀ\nReceived from UAV\nwith delay d1"]
PREDICT["State Prediction\nForward Euler integration\nN=100 steps"]
COST["Cost Function Assembly\nJ = Σ[state_cost + input_cost + smoothness_cost]\nFor j=1..N"]
QP["QP Solver (OSQP/ADMM)\nFind u* minimizing J\nSubject to: actuator limits"]
OUTPUT["u(k) = [T, φd, θd]ᵀ\nFirst control action\nReceding horizon"]
STATE --> PREDICT --> COST --> QP --> OUTPUT
end
subgraph COST_TERMS["Cost Function Terms"]
T1["State Cost\n(xd - xk+j|k)ᵀ Qx (xd - xk+j|k)\nDeviation from desired trajectory"]
T2["Input Cost\n(ud - uk+j|k)ᵀ Qu (ud - uk+j|k)\nHovering: ud = [g, 0, 0]"]
T3["Smoothness Cost\n(δu)ᵀ Qδu (δu)\nInput rate penalty"]
end
COST -.-> T1 & T2 & T3
style MPC_INTERNALS fill:#1a1a3a,color:#fff
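The condensed-QP structure is easier to see in a toy version. The sketch below solves the unconstrained case in closed form for a 1-D double integrator — illustrative only (the article's controller is a constrained C++/OSQP solve over the full UAV model), and the horizon, weights, and dynamics here are made-up values:

```python
import numpy as np

# Toy condensed MPC for a 1-D double integrator.
dt, N = 0.01, 20                       # 100Hz step, short horizon
A = np.array([[1.0, dt], [0.0, 1.0]])  # state: [position, velocity]
B = np.array([[0.0], [dt]])            # input: acceleration

nx, nu = 2, 1
# Stack the dynamics so X = Phi @ x0 + Gamma @ U over the horizon.
Phi = np.vstack([np.linalg.matrix_power(A, j + 1) for j in range(N)])
Gamma = np.zeros((N * nx, N * nu))
for i in range(N):
    for j in range(i + 1):
        Gamma[i*nx:(i+1)*nx, j*nu:(j+1)*nu] = (
            np.linalg.matrix_power(A, i - j) @ B)

Q = np.kron(np.eye(N), np.diag([10.0, 1.0]))  # state cost (cf. Qx)
R = 0.1 * np.eye(N * nu)                      # input cost (cf. Qu)

def mpc_step(x0, x_ref):
    """Closed-form solution of the unconstrained QP; returns u(k)."""
    X_ref = np.tile(x_ref, N)
    H = Gamma.T @ Q @ Gamma + R
    f = Gamma.T @ Q @ (X_ref - Phi @ x0)
    U = np.linalg.solve(H, f)
    return U[0]  # receding horizon: apply only the first action

# From rest at the origin, track position 1.0 with zero final velocity.
u0 = mpc_step(np.array([0.0, 0.0]), np.array([1.0, 0.0]))
```

The receding-horizon step at the end mirrors the OUTPUT node in the diagram: the whole input sequence is optimized, but only the first action is sent to the UAV.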
UAV Kinematic Model in Memory¶
flowchart LR
subgraph STATE_VECTOR["State Vector x ∈ ℝ¹²"]
P["p = [px, py, pz]ᵀ\nGlobal position"]
V["v = [vx, vy, vz]ᵀ\nLinear velocity"]
ATT["φ, θ (roll, pitch)\nEuler angles"]
Q["quaternion\n[qw, qx, qy, qz]"]
end
subgraph CONTROL["Control Input u ∈ ℝ³"]
T["T: total thrust\n≥ 0"]
PHI["φd: desired roll"]
THETA["θd: desired pitch"]
end
subgraph DYNAMICS["Dynamics (Forward Euler)"]
ACC["v̇ = R(φ,θ)·[0,0,T]ᵀ + [Ax,Ay,Az]·v - [0,0,g]"]
ATT_DYN["φ̇ = (Kφ·φref - φ) / τφ\nθ̇ = (Kθ·θref - θ) / τθ"]
end
STATE_VECTOR --> DYNAMICS
CONTROL --> DYNAMICS
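One forward-Euler step of the simplified model above can be sketched directly. The drag coefficients, attitude gain, and time constant are illustrative values, not the article's identified parameters:

```python
import numpy as np

g, dt = 9.81, 0.01
A_drag = np.array([-0.1, -0.1, -0.2])  # per-axis linear drag (Ax, Ay, Az)
K_att, tau = 1.0, 0.15                 # inner attitude loop: gain, time constant

def step(p, v, phi, theta, T, phi_d, theta_d):
    # Body-z thrust rotated into the world frame (yaw omitted, as in the model)
    thrust_world = T * np.array([
        np.sin(theta) * np.cos(phi),
        -np.sin(phi),
        np.cos(theta) * np.cos(phi),
    ])
    v_dot = thrust_world + A_drag * v - np.array([0.0, 0.0, g])
    phi_dot = (K_att * phi_d - phi) / tau
    theta_dot = (K_att * theta_d - theta) / tau
    return (p + dt * v, v + dt * v_dot,
            phi + dt * phi_dot, theta + dt * theta_dot)

# Hover check: level attitude with thrust T = g leaves the UAV at rest.
p, v, phi, theta = np.zeros(3), np.zeros(3), 0.0, 0.0
p, v, phi, theta = step(p, v, phi, theta, T=g, phi_d=0.0, theta_d=0.0)
```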
6. Docker vs. Kubernetes: Resource Overhead Under the Hood¶
block-beta
columns 3
A["Metric"]:1 B["Docker Standalone"]:1 C["Kubernetes"]:1
D["CPU user-space"]:1 E["9.2%"]:1 F["18.8%"]:1
G["CPU kernel-space"]:1 H["0.8%"]:1 I["4.5%"]:1
J["Combined CPU"]:1 K["10.0%"]:1 L["23.3%"]:1
M["Robot→Edge RTT"]:1 N["14.2ms"]:1 O["9.5ms"]:1
P["MPC Exec Time"]:1 Q["16.1ms"]:1 R["16.9ms"]:1
S["Edge→Robot RTT"]:1 T["17.6ms"]:1 U["13.1ms"]:1
V["Total RTT"]:1 W["47.9ms"]:1 X["39.5ms"]:1
The K8s network RTT is lower than Docker's despite the higher CPU overhead — likely because the K8s setup's Service/kube-proxy routing reduces ARP resolution overhead and its iptables DNAT chains are pre-warmed. However, K8s more than doubles combined CPU usage (10.0% → 23.3%) due to kubelet polling loops, etcd heartbeats, kube-proxy iptables syncs, and controller reconciliation goroutines.
flowchart TD
subgraph K8S_OVERHEAD["Kubernetes Extra Processes (per worker node)"]
KUBELET2["kubelet\nPod health polling\nCRI gRPC calls\n~50MB RAM"]
KUBEPROXY2["kube-proxy\niptables sync loop\nEvery 30s full resync\n~20MB RAM"]
PAUSE["pause container\nPer-pod network namespace holder\n~700KB per pod"]
CADVISOR["cAdvisor\nContainer metrics\nRead cgroup files\n~30MB RAM"]
DNS["CoreDNS\nCluster DNS\nService discovery\n~50MB RAM"]
end
subgraph DOCKER_SIMPLE["Docker Standalone"]
DOCKERD2["dockerd\n~100MB RAM"]
CONTAINERD2["containerd\n~30MB RAM"]
end
style K8S_OVERHEAD fill:#2a1a3a,color:#fff
style DOCKER_SIMPLE fill:#1a2a1a,color:#fff
7. ROS Node Communication: Pub/Sub Over TCP¶
sequenceDiagram
participant MASTER as ROS Master (:11311)
participant UAV_PUB as UAV Publisher Node
participant EDGE_SUB as Edge Subscriber (MPC)
UAV_PUB->>MASTER: registerPublisher("/odometry", "nav_msgs/Odometry")
MASTER-->>UAV_PUB: [statusCode, msg, subscriberAPIs]
EDGE_SUB->>MASTER: registerSubscriber("/odometry", "nav_msgs/Odometry")
MASTER-->>EDGE_SUB: [statusCode, msg, [UAV_PUB_URI]]
EDGE_SUB->>UAV_PUB: requestTopic("/odometry", [TCPROS])
UAV_PUB-->>EDGE_SUB: [statusCode, "TCPROS", HOST, PORT]
EDGE_SUB->>UAV_PUB: TCP connect to HOST:PORT
UAV_PUB-->>EDGE_SUB: Header exchange (MD5 sum, type, topic)
loop 100Hz Message Stream
UAV_PUB->>EDGE_SUB: Serialized nav_msgs/Odometry<br/>[uint32 length][serialized bytes]
end
The critical insight: ROS Master returns the publisher's advertised IP to subscribers. If the publisher is in a Docker container with a private IP (172.17.x.x), the subscriber cannot connect — hence --network=host is mandatory for cross-device ROS communication.
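When host networking alone is not enough (e.g., the host has multiple interfaces), ROS lets you pin the advertised address explicitly via environment variables — the values here are placeholders:

```shell
# Force the address ROS advertises to the master and to peer nodes.
export ROS_MASTER_URI=http://192.168.1.10:11311  # master on the edge host
export ROS_IP=192.168.1.10                       # this node's reachable IP
rosrun mpc mpc_node
```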
8. Container Failure Recovery: Docker vs. K8s State Machines¶
stateDiagram-v2
state "Docker Standalone" as DOCKER {
[*] --> Running: docker run
Running --> Dead: crash/OOM
Dead --> [*]: docker rm\n(manual cleanup)
Dead --> Running: --restart=always\ncreates new container
}
state "Kubernetes Pod" as K8S {
[*] --> Pending: Pod scheduled
Pending --> Running: Container started
Running --> Succeeded: normal exit
Running --> Failed: crash/OOM/signal
Failed --> Running: kubelet restartPolicy=Always\nExponential backoff: 10s→20s→40s...→5min
Running --> CrashLoopBackOff: >5 consecutive failures\nbackoff capped at 5min
CrashLoopBackOff --> Running: manual delete+recreate\nor fix root cause
}
For production UAV control, K8s automatic pod restarts are critical: if the MPC container crashes mid-flight, K8s attempts to restart it with exponential backoff. Docker can approximate this with the --restart=always flag (or external supervision), but without health checks or readiness probes to confirm the restarted container is actually serving.
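The kubelet's CrashLoopBackOff schedule from the diagram — start at 10s, double on each consecutive failure, cap at five minutes — is simple to sketch (the exact constants are taken from the figure, not verified against a specific kubelet version):

```python
# CrashLoopBackOff delay after the n-th consecutive restart (n >= 0):
# 10s doubling per failure, capped at 5 minutes, per the state diagram.
def backoff_s(n: int) -> int:
    return min(10 * 2 ** n, 300)

delays = [backoff_s(n) for n in range(7)]
print(delays)  # [10, 20, 40, 80, 160, 300, 300]
```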
9. Edge Architecture Decision Matrix¶
flowchart TD
REQ["Mission Requirements"] --> Q1{"Multi-container\ncoordination?"}
Q1 -->|No| Q2{"Failure\nauto-recovery?"}
Q1 -->|Yes| K8S_BRANCH["Kubernetes Path"]
Q2 -->|No| DOCKER_SIMPLE2["Docker: simplest\n--rm flag\nno orchestration"]
Q2 -->|Yes| DOCKER_RESTART["Docker: --restart=always\nor docker-compose\nrestart: always"]
K8S_BRANCH --> Q3{"Resource\nconstraints?"}
Q3 -->|"Embedded\n<4GB RAM"| K3S["k3s / MicroK8s\nLightweight K8s\n~300MB overhead"]
Q3 -->|"Edge server\n>8GB RAM"| FULL_K8S["Full Kubernetes\nFull feature set\nHighest resilience"]
Q3 -->|"Extreme edge\n<1GB RAM"| NOMAD["HashiCorp Nomad\nor bare containers\nMinimal orchestration"]
style K8S_BRANCH fill:#1a2a3a,color:#fff
10. Latency Budget Analysis: Why Edge Beats Cloud for Real-Time Control¶
flowchart LR
subgraph CLOUD["Cloud Architecture"]
C_ROBOT["UAV\nWiFi AP"]
C_INTERNET["Internet\n~50ms RTT\n+jitter"]
C_CLOUD["Cloud VM\nMPC Node"]
C_ROBOT -->|"50ms+"| C_INTERNET -->|"50ms+"| C_CLOUD
NOTE_C["Total RTT: >100ms\nMPC at 10Hz max\nUnstable at N=100"]
end
subgraph EDGE["Edge Architecture"]
E_ROBOT["UAV\nWiFi AP"]
E_LOCAL["Local WiFi\n<15ms RTT"]
E_EDGE["Edge Server\nMPC Node"]
E_ROBOT -->|"~14ms"| E_LOCAL -->|"~14ms"| E_EDGE
NOTE_E["Total RTT: ~48ms\nMPC at 100Hz\nN=100 horizon stable"]
end
subgraph ONBOARD["On-board Compute"]
OB_ROBOT["UAV + MPC\nSame device"]
NOTE_OB["0ms network RTT\nBut CPU-limited:\nN=100 may not fit\nin 10ms budget"]
end
style CLOUD fill:#3a1a1a,color:#fff
style EDGE fill:#1a3a1a,color:#fff
style ONBOARD fill:#1a1a3a,color:#fff
The 100Hz control rate with an N=100 prediction horizon nominally leaves only 10ms per solve. On a constrained UAV CPU (ARM Cortex-A57), the OSQP solver for this problem size takes ~50ms — far outside the budget. On the edge Intel i5-8400 it takes ~16ms: slightly over the nominal period, but close enough that the receding-horizon loop remains stable with the network latency absorbed.
11. Container Image Layer Architecture for ROS¶
flowchart TD
subgraph ROSCORE_IMAGE["roscore Container Image Layers"]
L1_A["ubuntu:20.04\n~73MB\nBase OS"]
L2_A["ros-noetic-ros-base\n~150MB\nCore ROS packages"]
L3_A["entrypoint.sh\n~1KB\nroscore startup"]
L1_A --> L2_A --> L3_A
end
subgraph MPC_IMAGE["MPC Container Image Layers"]
L1_B["ubuntu:20.04\n~73MB\n(shared cache layer)"]
L2_B["ros-noetic-ros-base\n~150MB\n(shared cache layer)"]
L3_B["ros-noetic-mavros +\nros-noetic-geometry\n~80MB\nRobotics packages"]
L4_B["Eigen3 + OSQP\nOptimization libraries\n~50MB"]
L5_B["mpc_package (custom)\n~10MB\nUAV controller code"]
L6_B["entrypoint.sh\n~1KB\nrosrun mpc mpc_node"]
L1_B --> L2_B --> L3_B --> L4_B --> L5_B --> L6_B
end
OVERLAY["OverlayFS Union Mount\nlowerdir: read-only layers\nupperdir: writable CoW layer\nmerged: container's view"]
L6_B --> OVERLAY
L3_A --> OVERLAY
style ROSCORE_IMAGE fill:#1a2a1a,color:#fff
style MPC_IMAGE fill:#1a1a3a,color:#fff
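The MPC image's layer stack maps onto a Dockerfile like the following sketch — the ROS package names follow the diagram, while the ROS apt-repository setup, workspace path, and entrypoint contents are assumed:

```dockerfile
FROM ubuntu:20.04                       # base layer, ~73MB (shared cache)

# Assumes the ROS Noetic apt repository and keys are already configured.
RUN apt-get update && apt-get install -y ros-noetic-ros-base          # ~150MB
RUN apt-get install -y ros-noetic-mavros ros-noetic-geometry          # ~80MB
RUN apt-get install -y libeigen3-dev    # Eigen3; OSQP build from source elided

COPY mpc_package /catkin_ws/src/mpc_package   # custom controller, ~10MB
COPY entrypoint.sh /entrypoint.sh             # runs: rosrun mpc mpc_node
ENTRYPOINT ["/entrypoint.sh"]
```

Each instruction becomes one OverlayFS layer; the two base layers are shared with the roscore image via the build cache, so they are stored once on disk.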
12. Extending the Architecture: Multi-Robot Coordination via Edge¶
flowchart TD
subgraph EDGE_CLUSTER["Kubernetes Edge Cluster"]
subgraph NS_UAV1["Namespace: uav-1"]
MPC1["mpc-controller Pod"]
ROS1["roscore Pod"]
end
subgraph NS_UAV2["Namespace: uav-2"]
MPC2["mpc-controller Pod"]
ROS2["roscore Pod"]
end
COORD["coordination-node Pod\nShared state:\n- Collision avoidance\n- Formation control\n- Path planning"]
MPC1 <-->|"ROS cross-namespace\ntopics"| COORD
MPC2 <-->|"ROS cross-namespace\ntopics"| COORD
end
UAV1["UAV 1\nWiFi"] --> MPC1
UAV2["UAV 2\nWiFi"] --> MPC2
MPC1 --> UAV1
MPC2 --> UAV2
NOTE["K8s enables:\n- Per-UAV namespace isolation\n- Shared coordination pod\n- Automatic pod recovery\n- Resource quotas per UAV\nDocker Compose cannot\nachieve this cleanly"]
style EDGE_CLUSTER fill:#1a2a3a,color:#fff
Summary: Internal Architecture Tradeoffs¶
| Dimension | Docker Standalone | Kubernetes |
|---|---|---|
| Network overhead | Host-mode: zero overhead | Host-mode: still passes kube-proxy DNAT |
| CPU overhead | ~10% (MPC only) | ~23% (MPC + kubelet + kube-proxy + etcd client) |
| Failure recovery | Manual or --restart flag | Automatic with probe-gated readiness |
| Scaling | Single node only | Multi-node pod migration |
| Network RTT | 47.9ms total | 39.5ms total (warmed iptables chains) |
| Kubernetes overhead source | — | etcd heartbeats, kubelet CRI polls, kube-proxy iptables resync |
| Namespace isolation | PID + MNT + IPC (not NET with --network=host) | Same + pause container per pod |
| Real-time deadline safety | Simple, predictable | Scheduler jitter possible under load |
The core principle: containers do not create isolation for free — every namespace boundary has a cost in setup latency, routing overhead, and memory footprint. For hard real-time control loops, the tradeoff between isolation/orchestration features and raw determinism must be carefully evaluated against the mission's RTT budget.