A container is just a regular Linux process — but wrapped in several kernel isolation features that make it feel like its own machine.

The kernel provides four key primitives:

1. Namespaces — what the process can see

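Namespaces limit the process’s view of the system. The six that matter most for Docker are listed below; the first five are created for every container by default, while user-namespace remapping has to be enabled explicitly: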

Namespace   Isolates
pid         Process tree: the container sees only its own processes, and its main process sees itself as PID 1
net         Network stack: its own interfaces, routing table, ports
mnt         Filesystem mounts: its own root filesystem (the image)
uts         Hostname and domain name
ipc         Shared memory and message queues
user        User/group IDs: can map container root to an unprivileged host user

The process hasn’t moved — it’s still on the host kernel. It just has a filtered view of the world.
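
One way to see this filtered view from the outside is to compare namespace identifiers. On Linux, each entry under /proc/<pid>/ns is a symlink whose target looks like pid:[4026531836], and two processes share a namespace exactly when those identifiers match. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names are made up for illustration.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Return {entry name: inode} for a process, or {} if unreadable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, no /proc, or no permission
    for entry in entries:
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # skip entries that vanish or look unfamiliar
        result[entry] = inode
    return result


def shared_namespaces(pid_a, pid_b):
    """Names of the namespaces that two processes have in common."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return sorted(name for name in a if a[name] == b.get(name))


if __name__ == "__main__":
    for name, inode in sorted(namespaces_of().items()):
        print(f"{name:20} {inode}")
```

Calling shared_namespaces(1, <pid of a containerised process>) as root on the host should list only the namespaces the container did not get a fresh copy of; which ones those are depends on how the container was started.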

2. cgroups — what the process can use

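Control groups limit and account for resource consumption: CPU, memory, block I/O, and the number of processes. (Network bandwidth is not capped by cgroups directly; that is normally handled by traffic shaping.) This is how docker run --memory 512m is enforced: when the container’s cgroup reaches its memory limit and the kernel cannot reclaim enough, the OOM killer terminates a process inside that cgroup, just as it would for any other cgroup.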
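
To make the --memory 512m example concrete: on a cgroup v2 host the limit ends up as a byte count in the memory.max file of the container's cgroup (memory.limit_in_bytes on cgroup v1). The sketch below converts a Docker-style size string to bytes and reads such a file. The function names and the directory argument are illustrative, and the real cgroup path depends on the cgroup driver in use.

```python
import os

# Docker size suffixes are binary multiples: k = 1024, m = 1024**2, g = 1024**3.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a size such as '512m' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number is taken as bytes


def read_memory_limit(cgroup_dir):
    """Read memory.max from a cgroup v2 directory; None means unlimited."""
    with open(os.path.join(cgroup_dir, "memory.max")) as handle:
        raw = handle.read().strip()
    return None if raw == "max" else int(raw)
```

For a container started with --memory 512m, read_memory_limit on its cgroup directory would be expected to match parse_size("512m").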

3. OverlayFS — what the process can read and write

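As covered with images, the container gets a merged filesystem view with a private writable layer on top. From the process’s perspective it looks like a normal root filesystem. Under the hood it is a stack of directories: the image layers (shipped as tarballs, unpacked on disk) mounted read-only, plus one writable directory that belongs to the container.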
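
The merge rule can be illustrated with a toy model in which each layer is a dictionary from path to content. This is not how the kernel implements OverlayFS; it only mirrors the lookup order (higher layers win, and a whiteout hides a file from lower layers). The second helper builds the option string that mount -t overlay expects, where the topmost lower directory comes first. All names here are hypothetical.

```python
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"


def merged_view(lower_layers, upper):
    """Merge layers into one view; lower_layers is ordered bottom to top."""
    merged = {}
    for layer in list(lower_layers) + [upper]:
        for path, content in layer.items():
            if content is WHITEOUT:
                merged.pop(path, None)  # hide whatever lower layers provided
            else:
                merged[path] = content  # higher layer overrides lower ones
    return merged


def overlay_mount_options(lower_dirs, upper_dir, work_dir):
    """Build the -o string for `mount -t overlay`.

    lower_dirs is ordered bottom to top; the kernel wants the topmost first.
    """
    lower = ":".join(reversed(lower_dirs))
    return f"lowerdir={lower},upperdir={upper_dir},workdir={work_dir}"
```

In this model, writing a file only ever touches the upper dictionary, which is why changes made inside a container never alter the image layers.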

4. Seccomp + capabilities — what the process can do

By default Docker drops most Linux capabilities (e.g. CAP_NET_ADMIN, CAP_SYS_MODULE) and applies a seccomp filter that blocks ~44 system calls. This limits what the process can ask the kernel to do, even if it escapes its namespace.
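
Both restrictions are visible in /proc/<pid>/status: CapEff holds the effective capability set as a hexadecimal bitmask, and Seccomp reports the filter mode (0 disabled, 1 strict, 2 filter). The sketch below decodes those two fields. The capability table is deliberately partial, the function names are illustrative, and the sample mask is one often reported for root in a default Docker container, so treat it as an example value rather than a guarantee.

```python
# Capability bit numbers from <linux/capability.h>; only a subset is listed.
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_KILL": 5,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_ADMIN": 21,
}


def _field(status_text, key):
    """Return the value of a 'Key:<tab>value' line from a status file."""
    for line in status_text.splitlines():
        if line.startswith(key + ":"):
            return line.split()[1]
    raise ValueError(f"no {key} line found")


def effective_caps(status_text):
    """Effective capability mask as an integer."""
    return int(_field(status_text, "CapEff"), 16)


def seccomp_mode(status_text):
    """Seccomp mode: 0 = disabled, 1 = strict, 2 = filter."""
    return int(_field(status_text, "Seccomp"))


def has_capability(mask, name):
    """True if the named capability's bit is set in the mask."""
    return bool((mask >> CAP_BITS[name]) & 1)


# Example status excerpt (assumed values, see the note above).
SAMPLE = "Name:\tnginx\nCapEff:\t00000000a80425fb\nSeccomp:\t2\n"
```

With the sample mask, has_capability reports CAP_NET_BIND_SERVICE as present while CAP_NET_ADMIN and CAP_SYS_MODULE are absent, which matches the defaults described above.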


Putting it together

When you run docker run nginx, what actually happens at the OS level is roughly:

  1. Docker asks the kernel to clone() a new process with new namespaces
  2. The process is placed in a cgroup with its resource limits
  3. Its root is switched (with pivot_root, inside the new mnt namespace) to the image’s merged OverlayFS directory
  4. Capabilities and seccomp filters are applied
  5. PID 1 (nginx) starts — it thinks it’s alone on a machine; it’s actually just a process on the host
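
Step 1 comes down to a bitmask: each namespace type has a CLONE_NEW* flag, and the flags are OR-ed together in the call to clone() or unshare(). The sketch below only computes that mask. It does not create any namespaces, which would need elevated privileges, and the dictionary and function names are illustrative.

```python
# Namespace flags from <linux/sched.h>, as passed to clone() or unshare().
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def namespace_flags(names):
    """OR together the clone flags for the requested namespace types."""
    flags = 0
    for name in names:
        flags |= CLONE_FLAGS[name]
    return flags


if __name__ == "__main__":
    default = ["pid", "net", "mnt", "uts", "ipc"]
    print(hex(namespace_flags(default)))
```

Adding "user" to the list corresponds to enabling user-namespace remapping; the remaining steps (cgroup placement, root switch, capability and seccomp setup) happen after the new process exists.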

So in one sentence: a container is a Linux process with a namespaced view of the system, a cgroup-enforced resource budget, an OverlayFS root filesystem, and a restricted set of capabilities and syscalls.

There is no hardware virtualisation involved on a Linux host: if you run ps aux on the host, you’ll see the container’s processes right there alongside everything else.