docker containers under the hood

A container is a regular Linux process, wrapped in several kernel isolation features that make it feel like its own machine. On a Linux host no hardware virtualisation is involved: there is no hypervisor and no guest kernel, and the process runs directly on the host kernel. If you run ps aux on the host, you’ll see the container’s processes right there alongside everything else.

The kernel provides four key primitives:

Namespaces
cgroups
OverlayFS
Seccomp and capabilities

1. Namespaces — What the Process Can See

Namespaces limit the process’s view of the system. Docker works with six, of which `user` is opt-in:

Namespace	Isolates
`pid`	Process tree — the container sees only its own processes; its main process thinks it’s PID 1
`net`	Network stack — its own interfaces, routing table, ports
`mnt`	Filesystem mounts — its own root filesystem (the image)
`uts`	Hostname and domain name
`ipc`	Shared memory and message queues
`user`	User/group IDs — can map container root to an unprivileged host user

The process hasn’t moved — it’s still on the host kernel. It just has a filtered view of the world.
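To make the filtered view concrete: the kernel exposes each process’s namespaces as symlinks under /proc/<pid>/ns. The sketch below, assuming a Linux host with /proc mounted, prints them. The helper name ns_ids is made up for this note. Two processes share a namespace exactly when the identifiers match.

```sh
# ns_ids PID: print "<type> <identifier>" for each namespace a process is in.
# Assumes Linux with /proc mounted; inspecting another user's process may need root.
ns_ids() {
    pid="${1:-self}"
    for link in /proc/"$pid"/ns/*; do
        # Skip the unexpanded glob if the directory does not exist.
        [ -L "$link" ] || continue
        # Each entry is a symlink whose target looks like "pid:[4026531836]".
        printf '%s %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

ns_ids self
```

Running ns_ids 1 and ns_ids <container-pid> as root on the host, with the PID taken from docker inspect --format '{{.State.Pid}}' <container>, should show different identifiers for the namespaces Docker has unshared.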

2. cgroups — What the Process Can Use

Control groups (cgroups) limit and account for resource consumption: CPU, memory, block I/O, and process count. This is how docker run --memory 512m is enforced: when the container’s cgroup hits the limit and the kernel cannot reclaim enough memory, the OOM killer kills a process in that cgroup, exactly as it would for any other cgroup.

docker run --memory 512m --cpus 1.0 nginx

Internally, Docker writes to the cgroup hierarchy at /sys/fs/cgroup/. You can inspect a container’s limits:

docker inspect <container> | grep -A5 Memory
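The same limits can be read straight from the kernel. The following is a minimal sketch, assuming cgroup v2 (the unified hierarchy) mounted at /sys/fs/cgroup. The helper names cgroup_v2_path and show_limits are hypothetical.

```sh
# cgroup_v2_path: read the contents of /proc/<pid>/cgroup on stdin and print
# the cgroup v2 path. On the unified hierarchy the line looks like "0::/path".
cgroup_v2_path() {
    sed -n 's/^0:://p'
}

# show_limits PID: print the memory and CPU limits of the cgroup PID lives in.
# Assumes cgroup v2 mounted at /sys/fs/cgroup.
show_limits() {
    dir="/sys/fs/cgroup$(cgroup_v2_path < "/proc/$1/cgroup")"
    for f in memory.max cpu.max; do
        # The root cgroup has no limit files, so skip anything unreadable.
        [ -r "$dir/$f" ] && printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
    done
    return 0
}
```

For a container started with --memory 512m --cpus 1.0, calling show_limits with the container’s host PID would be expected to report memory.max as 536870912 and cpu.max as a quota equal to its period. The exact cgroup path depends on the cgroup driver in use (for example a docker-<id>.scope under system.slice with the systemd driver).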

3. OverlayFS — What the Process Can Read and Write

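The container gets a merged filesystem view: the image layers as read-only lower dirs, plus a fresh writable upper layer on top. From the process’s perspective it looks like a normal root filesystem. A write to a file that lives in a lower layer first copies the file into the upper layer (copy-on-write), so the image itself is never modified. On disk each layer is a directory unpacked from one of the image’s layer tarballs; see docker images under the hood.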
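The merged view is an ordinary overlay mount, so its layers show up in the mount table. The sketch below pulls a single option out of overlay entries in the /proc/mounts format. The helper name overlay_opt is made up for illustration.

```sh
# overlay_opt NAME: read mount-table lines on stdin and, for each overlay
# mount, print the value of option NAME (lowerdir, upperdir or workdir).
overlay_opt() {
    awk -v key="$1" '$3 == "overlay" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (index(opts[i], key "=") == 1)
                print substr(opts[i], length(key) + 2)
    }'
}
```

On a host with a running container, overlay_opt lowerdir < /proc/mounts | tr ':' '\n' lists the read-only layer directories, and overlay_opt upperdir < /proc/mounts prints the writable layer. With the overlay2 storage driver, docker inspect --format '{{json .GraphDriver.Data}}' <container> reports the same directories.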

4. Seccomp + Capabilities — What the Process Can Do

By default Docker drops most Linux capabilities (e.g. CAP_NET_ADMIN, CAP_SYS_MODULE) and applies a seccomp filter that blocks around 44 system calls. This limits what the process can ask the kernel to do, even when it runs as root inside the container, and it remains in force as a second line of defence if namespace isolation fails.

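You can inspect which capabilities were explicitly added or dropped when the container was started (both lists are empty if it runs with the default set):

docker inspect --format '{{.HostConfig.CapAdd}} {{.HostConfig.CapDrop}}' <container>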
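The inspect output only reflects configuration. The set a process actually holds is a hexadecimal bitmask on the CapEff line of /proc/<pid>/status, with one bit per capability number from linux/capability.h (CAP_NET_BIND_SERVICE is 10, CAP_NET_ADMIN is 12, CAP_SYS_MODULE is 16). A minimal sketch for testing single bits, with hypothetical helper names:

```sh
# has_cap MASK BIT: succeed if capability number BIT is set in the hex MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# cap_eff PID: print the effective capability mask of a process.
# Assumes Linux with /proc mounted.
cap_eff() {
    awk '$1 == "CapEff:" { print $2 }' "/proc/${1:-self}/status"
}
```

For example, has_cap "$(cap_eff <container-pid>)" 12 && echo "CAP_NET_ADMIN present" prints nothing for a container running with Docker’s default capability set, whose mask is commonly seen as 00000000a80425fb.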

Putting It Together

When you run docker run nginx, the sequence at the OS level is roughly:

Docker (through containerd and runc) asks the kernel to clone() a new process with new namespaces
The process is placed in a cgroup with its resource limits
Its root is switched (via pivot_root) to the image’s merged OverlayFS directory
Capabilities and seccomp filters are applied
PID 1 (nginx) starts — it thinks it’s alone on a machine; it’s actually just a process on the host
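The namespace part of this sequence can be approximated by hand with unshare(1) from util-linux. This is only a rough sketch of that first step, not what Docker runs: it leaves out the cgroup, pivot_root, capability and seccomp steps, and the wrapper name run_isolated is made up.

```sh
# run_isolated CMD...: start CMD as PID 1 in fresh pid, mount, uts, ipc and
# net namespaces. Creating these namespaces requires root.
run_isolated() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "run_isolated: needs root to create namespaces" >&2
        return 1
    fi
    # --fork makes the command a child of unshare so it becomes PID 1;
    # --mount-proc remounts /proc so tools like ps see only the new PID namespace.
    unshare --pid --fork --mount-proc --uts --ipc --net "$@"
}
```

As root, run_isolated sh -c 'echo $$' prints 1: the shell believes it is the first process on the machine, while the host still sees it under an ordinary PID.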

Container Lifecycle

created → running → stopped → removed
                ↑         |
                └─restart─┘

State	What it means
created	Writable layer and configuration prepared (docker create); no process started yet
running	PID 1 is alive
stopped	PID 1 exited or was killed; docker ps -a shows it as exited. The writable layer still exists
removed	Writable layer and container metadata deleted

A stopped container still has its writable layer on disk — you can docker start it again and the filesystem changes are still there. Only docker rm destroys them.
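The walk-through below exercises each state in turn and checks that a file written before the stop survives the restart. It is a sketch that assumes a working Docker CLI and access to the alpine image. The function name and the container name lifecycle-demo are hypothetical.

```sh
# lifecycle_demo: move one container through created, running, stopped,
# running again, and removed.
lifecycle_demo() {
    command -v docker >/dev/null || { echo "docker CLI not found" >&2; return 1; }
    name=lifecycle-demo
    docker create --name "$name" alpine sleep 300 >/dev/null   # created
    docker start "$name" >/dev/null                            # running
    docker exec "$name" sh -c 'echo hello > /marker'           # write to the upper layer
    docker stop -t 1 "$name" >/dev/null                        # stopped
    docker start "$name" >/dev/null                            # running again
    docker exec "$name" cat /marker                            # the file is still there
    docker rm -f "$name" >/dev/null                            # removed
}
```

Between the steps, docker inspect --format '{{.State.Status}}' lifecycle-demo reports the current state, using exited for what this note calls stopped.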

Signals and Graceful Shutdown

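docker stop sends SIGTERM to PID 1, waits 10 seconds (configurable with -t), then sends SIGKILL. The kernel does not apply default signal actions to PID 1 of a PID namespace, so a SIGTERM with no handler installed is simply ignored and the container dies only when the timeout expires. This means your application’s PID 1 must handle SIGTERM to shut down gracefully.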

If your Dockerfile uses shell form (CMD gunicorn ...) instead of exec form (CMD ["gunicorn", ...]), the process is wrapped in /bin/sh -c and /bin/sh becomes PID 1 — which typically doesn’t forward signals. Use exec form.
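If a shell script has to be PID 1, it can install its own handler. The following is a minimal sketch of such an entrypoint loop. The function name graceful_loop is made up, and the loop body stands in for real work.

```sh
# graceful_loop: stand-in for a long-running service acting as PID 1.
# It installs a SIGTERM handler so that a stop request ends it promptly.
graceful_loop() {
    trap 'echo "SIGTERM received, shutting down"; exit 0' TERM
    while :; do
        # Sleep in the background and wait on it: the shell runs a trap only
        # between commands, and wait is interrupted as soon as a signal arrives.
        sleep 1 &
        wait $!
    done
}
```

A script that only sets things up and then launches the real server should instead finish with exec <server command>, so that the server replaces the shell as PID 1 and receives SIGTERM directly.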

One sentence: a container is a Linux process with a namespaced view of the system, a cgroup-enforced resource budget, an OverlayFS root filesystem, and a restricted set of kernel syscalls.
