A container is just a regular Linux process — but wrapped in several kernel isolation features that make it feel like its own machine.
The kernel provides four key primitives:
1. Namespaces — what the process can see
Namespaces limit the process’s view of the system. Docker uses six of them (the `user` namespace only if user-namespace remapping is enabled, which it is not by default):
| Namespace | Isolates |
|---|---|
| `pid` | Process tree: the container sees only its own processes, and its main process thinks it’s PID 1 |
| `net` | Network stack: its own interfaces, routing table and ports |
| `mnt` | Filesystem mounts: its own root filesystem (the image) |
| `uts` | Hostname and domain name |
| `ipc` | Shared memory and message queues |
| `user` | User/group IDs: can map container root to an unprivileged host user |
The process hasn’t moved — it’s still on the host kernel. It just has a filtered view of the world.
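One way to see this filtered view is through `/proc/<pid>/ns/`, where each namespace a process belongs to appears as a symlink of the form `net:[<inode>]`; two processes share a namespace exactly when those inode numbers match. The Python below is a minimal sketch assuming a Linux host; the helper names are hypothetical and not part of any Docker tooling.

```python
import os

# The namespace types the table above describes, as named under /proc/<pid>/ns.
NAMESPACES = ("pid", "net", "mnt", "uts", "ipc", "user")


def parse_ns_link(target):
    """Split a symlink target like 'net:[4026531840]' into ('net', 4026531840)."""
    kind, _, rest = target.partition(":")
    return kind, int(rest.strip("[]"))


def namespace_ids(pid="self"):
    """Map each namespace type to its inode number for one process (Linux only)."""
    ids = {}
    for kind in NAMESPACES:
        target = os.readlink(f"/proc/{pid}/ns/{kind}")
        ids[kind] = parse_ns_link(target)[1]
    return ids


def shared_namespaces(pid_a, pid_b):
    """Return the namespace types two processes have in common (same inode means same namespace)."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return sorted(kind for kind in NAMESPACES if a[kind] == b[kind])
```

Reading another process's namespace links needs root. With it, calling `shared_namespaces("self", pid)` on the host for a container's host PID (for example from `docker inspect --format '{{.State.Pid}}' <container>`) should typically return only `user`, since the other five are private to the container and user namespaces are not remapped by default.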
2. cgroups — what the process can use
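Control groups limit and account for resource consumption: CPU, memory, block (disk) I/O and the number of processes. This is how `docker run --memory 512m` is enforced: the container’s cgroup is given a 512 MiB memory limit, and if its processes exceed it and the kernel cannot reclaim enough memory, the OOM killer terminates a process inside the container, just as it would in any other cgroup.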
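As a rough illustration of the mechanism, enforcing `--memory 512m` on a cgroup v2 host amounts to converting the value to bytes, writing it into the group’s `memory.max` file, and moving the process into the group via `cgroup.procs`. This is a sketch, not Docker’s code path (Docker delegates this to its runtime). It assumes cgroup v2 mounted at `/sys/fs/cgroup`, the memory controller enabled for the parent group, root privileges, and binary units (`m` meaning MiB); `set_memory_limit` and the group name passed to it are hypothetical.

```python
import os

# Assumption: binary multipliers, so "512m" means 512 MiB.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_memory(value):
    """Convert a string like '512m' (or a plain byte count like '1000') into bytes."""
    value = value.strip().lower()
    if value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)


def set_memory_limit(cgroup, limit, pid=None, root="/sys/fs/cgroup"):
    """Create a cgroup v2 group, write its memory.max, and optionally move pid into it.

    Hypothetical helper: needs root and the memory controller enabled in the
    parent's cgroup.subtree_control when run against a real cgroup hierarchy.
    """
    path = os.path.join(root, cgroup)
    os.makedirs(path, exist_ok=True)  # on cgroupfs, mkdir creates a new cgroup
    with open(os.path.join(path, "memory.max"), "w") as f:
        f.write(str(parse_memory(limit)))  # the kernel enforces this byte limit
    if pid is not None:
        with open(os.path.join(path, "cgroup.procs"), "w") as f:
            f.write(str(pid))  # moves the process (and its future children) in
    return path
```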
3. OverlayFS — what the process can read and write
As covered with images, the container gets a merged filesystem view with a private writable layer on top. From the process’s perspective it looks like a normal root filesystem. Under the hood it’s the image’s read-only layers (unpacked from their tarballs) stacked beneath that writable layer and merged by the kernel.
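The lookup rule can be modelled in a few lines: a path resolves to the top-most layer that contains it, and a deletion recorded in a higher layer (a "whiteout") hides the lower copies. The sketch below is a deliberately simplified model under stated assumptions (layers as Python dicts, `None` standing in for a whiteout) rather than how the kernel stores anything; the second helper shows the kernel’s real option format, in which `lowerdir` lists the top-most layer first. Both function names are hypothetical.

```python
# Simplified model: each layer maps path -> file contents; None marks a whiteout
# (the file was deleted in that layer).
WHITEOUT = None


def merged_view(lower_layers, upper):
    """Return what the container sees: each layer shadows the ones beneath it.

    lower_layers is ordered base-first (like a Dockerfile's layers);
    upper is the container's private writable layer.
    """
    view = {}
    for layer in [*lower_layers, upper]:
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)  # deleted here: hide any lower copy
            else:
                view[path] = content  # higher layer wins
    return view


def overlay_options(lower_dirs, upper_dir, work_dir):
    """Build -o options for an overlay mount; lower_dirs is given base-first."""
    lowers = ":".join(reversed(lower_dirs))  # overlayfs wants the top-most lower dir first
    return f"lowerdir={lowers},upperdir={upper_dir},workdir={work_dir}"
```

The option string is what you would pass to `mount -t overlay overlay -o <options> <merged-dir>`. Writes and deletions only ever touch the upper layer, which is why many containers can share one set of image layers.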
4. Seccomp + capabilities — what the process can do
By default Docker keeps only a short allow-list of Linux capabilities and drops the rest (e.g. `CAP_NET_ADMIN`, `CAP_SYS_MODULE`), and it applies a seccomp filter that blocks around 44 system calls. This limits what the process can ask the kernel to do, even if it is root inside the container or escapes its namespaces.
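Capabilities are tracked as a bitmask, and a process’s effective set appears as the hex `CapEff` field in `/proc/<pid>/status`, next to a `Seccomp` field (2 means a seccomp filter is active). This Python sketch decodes the mask using the bit numbers from the kernel’s `<linux/capability.h>`; the helper names are hypothetical, and `read_status` assumes a Linux `/proc`.

```python
# Capability names indexed by bit number, per <linux/capability.h> (bits 0 to 40).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask):
    """Turn a capability bitmask (e.g. the CapEff value) into capability names."""
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


def read_status(pid="self"):
    """Return (effective capability mask, seccomp mode) from /proc/<pid>/status."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    # Seccomp mode: 0 = disabled, 1 = strict, 2 = filter
    return int(fields["CapEff"], 16), int(fields["Seccomp"])
```

On a typical install, a root process in a default Docker container reports `CapEff: 00000000a80425fb`; `decode_caps(0xa80425fb)` returns the 14 capabilities Docker keeps by default, with `CAP_NET_ADMIN`, `CAP_SYS_MODULE` and `CAP_SYS_ADMIN` absent.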
Putting it together
When you run `docker run nginx`, what actually happens at the OS level is roughly:
- Docker (through its low-level runtime, runc) asks the kernel to `clone()` a new process with new namespaces
- The process is placed in a cgroup with its resource limits
- Its root is switched (with `pivot_root` inside its new `mnt` namespace) to the image’s merged OverlayFS directory
- Capabilities and seccomp filters are applied
- PID 1 (`nginx`) starts; it thinks it’s alone on a machine, but it’s actually just a process on the host
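The namespace and root-filesystem steps can be approximated with Python’s `os.unshare` (Python 3.12+). This is a hedged sketch under several assumptions: Linux, root privileges, and an already-prepared root filesystem directory. It uses `chroot` where runc uses `pivot_root`, and it skips the cgroup, capability and seccomp steps entirely; `run_isolated` is a hypothetical helper, not a Docker or runc API.

```python
import os
import socket


def run_isolated(argv, hostname="container", new_root=None):
    """Hypothetical helper: run argv in new UTS, IPC, mount, network and PID namespaces.

    Assumes Linux, Python 3.12+ (for os.unshare) and root. Returns the exit code.
    """
    child = os.fork()
    if child == 0:
        code = 1  # reported if any step below fails
        try:
            # Roughly the clone()-with-new-namespaces step, done as fork + unshare(2).
            os.unshare(os.CLONE_NEWUTS | os.CLONE_NEWIPC | os.CLONE_NEWNS
                       | os.CLONE_NEWNET | os.CLONE_NEWPID)
            # CLONE_NEWPID only affects processes created afterwards, so fork
            # again: this grandchild becomes PID 1 in the new PID namespace.
            init = os.fork()
            if init == 0:
                socket.sethostname(hostname)  # changes only the new UTS namespace
                if new_root is not None:
                    os.chroot(new_root)  # simplification: runc uses pivot_root(2)
                    os.chdir("/")
                os.execvp(argv[0], argv)  # replaces this process: PID 1 starts
            _, status = os.waitpid(init, 0)
            code = os.waitstatus_to_exitcode(status)
        finally:
            # Never fall back into the caller's code from a forked child.
            os._exit(code)
    _, status = os.waitpid(child, 0)
    return os.waitstatus_to_exitcode(status)
```

This sketch does not mount a fresh `/proc`, so tools like `ps` inside it will not reflect the new PID namespace, and the new network namespace contains only a loopback interface, which starts out down.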
So in one sentence: a container is a Linux process with a namespaced view of the system, a cgroup-enforced resource budget, an OverlayFS root filesystem, and a restricted set of capabilities and system calls.
There is no virtualisation happening: no hypervisor and no second kernel. If you run `ps aux` on the host, you’ll see the container’s processes right there alongside everything else.
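You can go one step further and ask the kernel how the same process is numbered on each side. On Linux 4.1 and later, `/proc/<pid>/status` includes an `NSpid` line listing the process’s PID in each PID namespace it belongs to, outermost first. A minimal sketch, with hypothetical helper names:

```python
def parse_nspid(status_text):
    """Return the NSpid values from /proc/<pid>/status text: outermost PID first, innermost last."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(p) for p in line.split()[1:]]
    raise ValueError("no NSpid line (requires Linux 4.1+)")


def pid_views(pid):
    """Read NSpid for a PID as seen from the host, e.g. one taken from `ps aux`."""
    with open(f"/proc/{pid}/status") as f:
        return parse_nspid(f.read())
```

For the host PID of a container’s `nginx` process, the list should end in `1`: the same process, numbered once from the host and once from inside the container.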