The four kernel primitives Docker builds on are:
- filesystem (OverlayFS)
- isolation (namespaces)
- resources (cgroups)
- networking (veth + bridge)
1. Filesystem — Build a Root Filesystem
First we need something to run. We’ll use debootstrap to create a minimal Ubuntu userspace:
```bash
sudo apt install debootstrap
sudo debootstrap --arch=amd64 jammy /tmp/rootfs http://archive.ubuntu.com/ubuntu
```
This gives us a real Linux root filesystem at /tmp/rootfs, which is exactly what a base image is.
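
Before stacking anything on top, it can help to confirm that debootstrap actually finished. The sketch below is one way to do that; `check_rootfs` is a hypothetical helper name, and the three paths it looks for are an assumption about what a usable Ubuntu userspace should contain, not an exhaustive test.

```bash
# check_rootfs (hypothetical helper): report expected files missing from a rootfs.
# Returns 0 when every path exists, 1 otherwise.
check_rootfs() {
  local root=$1 path missing=0
  # Assumed minimum: a shell, the distro identification file, and env
  for path in bin/bash etc/os-release usr/bin/env; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $root/$path"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_rootfs /tmp/rootfs && echo "rootfs looks complete"` prints the confirmation only when all three paths are present.
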
2. OverlayFS — Add a Writable Layer on Top
Just like Docker, we don’t want to write directly to the base rootfs. We stack a writable layer on top:
```bash
mkdir -p /tmp/container/{upper,work,merged}
sudo mount -t overlay overlay \
  -o lowerdir=/tmp/rootfs,upperdir=/tmp/container/upper,workdir=/tmp/container/work \
  /tmp/container/merged
```
Now /tmp/container/merged is the container’s root view:
- reads come from /tmp/rootfs (the image)
- writes land in /tmp/container/upper (the writable layer)
- /tmp/rootfs is never touched
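
Because every write lands in the upper directory, listing that directory shows what the container has changed. The sketch below does that; `layer_diff` is a hypothetical helper, and the `C`/`D` labels are my own shorthand for "created or changed" and "deleted". It assumes any character device in the upper layer is an OverlayFS whiteout, which is a simplification.

```bash
# layer_diff (hypothetical helper): list what a container changed,
# based on the contents of its writable OverlayFS layer.
layer_diff() {
  local upper=$1 entry rel
  find "$upper" -mindepth 1 | sort | while IFS= read -r entry; do
    # Print paths relative to the container's root
    rel=/${entry#"$upper"/}
    if [ -c "$entry" ]; then
      # OverlayFS records a deletion as a character-device "whiteout";
      # assumption: the container created no real device nodes of its own
      echo "D $rel"
    else
      echo "C $rel"
    fi
  done
}
```

Once the container has written something, `sudo bash -c "$(declare -f layer_diff); layer_diff /tmp/container/upper"` prints one line per changed path; root is needed because files created inside the container are owned by root.
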
3. Networking — Veth Pair + Bridge
Set up the virtual network before launching the container:
```bash
# Create a bridge (our "docker0")
sudo ip link add name br0 type bridge
sudo ip addr add 10.10.0.1/24 dev br0
sudo ip link set br0 up
# Create a veth pair
sudo ip link add veth0 type veth peer name veth1
# Plug the host end into the bridge
sudo ip link set veth0 master br0
sudo ip link set veth0 up
# veth1 will go into the container's namespace later; hold off for now
```
Enable NAT so the container can reach the internet:
```bash
# Enable IP forwarding
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
# Masquerade outbound traffic (replace eth0 with your host interface)
sudo iptables -t nat -A POSTROUTING -s 10.10.0.0/24 -o eth0 -j MASQUERADE
```
4. Launch — Unshare into Namespaces
unshare is the userspace tool for creating namespaces. This one command is roughly the equivalent of docker run:
```bash
sudo unshare \
  --pid \
  --mount \
  --uts \
  --ipc \
  --net \
  --fork \
  --kill-child \
  chroot /tmp/container/merged /bin/bash
```
You’re now inside an isolated environment with its own PID tree, mounts, hostname, IPC, and network stack, with the OverlayFS root as /. The --net flag matters: without it the shell would share the host’s network namespace and there would be nowhere to move veth1.
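
To see the isolation for yourself, compare namespace identifiers between the host and the container. The sketch below reads the links under /proc; `ns_ids` is a hypothetical helper name. Inside the container it only works once /proc is mounted, which happens in the next step.

```bash
# ns_ids (hypothetical helper): print the namespace identifiers of a process.
# With no argument it inspects the current shell.
ns_ids() {
  local pid=${1:-$$} ns
  for ns in pid mnt uts ipc net; do
    # Each link reads like "net:[4026531840]"; two processes share a
    # namespace exactly when the bracketed numbers match
    printf '%s\t%s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns")"
  done
}

ns_ids
```

Run it once in a host terminal and once inside the container. Any namespace the container shares with the host shows the same identifier in both outputs, so each line that differs corresponds to a namespace that unshare created.
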
5. Inside the Container — Finish Setup
These steps would normally happen automatically. Run them inside the shell you just launched:
```bash
# Mount essential kernel filesystems
mount -t proc proc /proc
mount -t sysfs sys /sys
mount -t tmpfs tmpfs /tmp
# Set a hostname
hostname mycontainer
# Bring up loopback
ip link set lo up
```
Now attach the network interface. Back on the host in a second terminal:
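```bash
# Find the unshare process by exact name (pgrep -f would also match "sudo unshare ...")
UNSHARE_PID=$(pgrep -n -x unshare)
# The container's shell is the child that unshare forked; use its host-side PID
PID=$(pgrep -P "$UNSHARE_PID")
# Move veth1 into the container's net namespace
sudo ip link set veth1 netns "$PID"
```
Back inside the container: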
```bash
# Configure the interface
ip link set veth1 up
ip addr add 10.10.0.2/24 dev veth1
ip route add default via 10.10.0.1
# Test (-c 3 stops after three replies)
ping -c 3 10.10.0.1 # host gateway
ping -c 3 8.8.8.8 # internet
```
6. Cgroups — Limit Resources
From the host, create a cgroup and assign the container process to it:
```bash
# Create a cgroup (v2)
sudo mkdir /sys/fs/cgroup/mycontainer
# Limit to 512 MiB RAM and 1 CPU
echo "512M" | sudo tee /sys/fs/cgroup/mycontainer/memory.max
# cpu.max takes "quota period" in microseconds; equal values mean 100% of one core
echo "100000 100000" | sudo tee /sys/fs/cgroup/mycontainer/cpu.max
# Assign the container's PID
echo $PID | sudo tee /sys/fs/cgroup/mycontainer/cgroup.procs
```
The kernel will now enforce those limits on that process and on every child it starts from here on.
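
The cpu.max value is simple arithmetic: the quota is the number of CPUs multiplied by the period. As a small worked example, the sketch below turns a CPU count into the string to write; `cpu_max` is a hypothetical helper, and the 100000 microsecond default period matches the value used above.

```bash
# cpu_max (hypothetical helper): convert a CPU count (may be fractional)
# into a cgroup v2 cpu.max value of the form "quota period".
cpu_max() {
  local cpus=$1 period=${2:-100000}
  # quota = cpus * period, both in microseconds
  awk -v c="$cpus" -v p="$period" 'BEGIN { printf "%d %d\n", c * p, p }'
}

cpu_max 1      # prints: 100000 100000
cpu_max 0.5    # prints: 50000 100000
cpu_max 2      # prints: 200000 100000
```

To apply half a core to the cgroup created above, pipe the result into the same file: `cpu_max 0.5 | sudo tee /sys/fs/cgroup/mycontainer/cpu.max`.
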
What You’ve Built vs. What Docker Does
| Step | What we did manually | What Docker does |
|---|---|---|
| Root filesystem | debootstrap + OverlayFS mount | Pulls image layers, mounts OverlayFS |
| Process isolation | unshare with namespace flags | clone() syscall with namespace flags |
| Filesystem isolation | chroot into merged dir | pivot_root (more secure than chroot) |
| Networking | ip link, ip addr, iptables | Does all of this automatically per container |
| Resource limits | Manual cgroup writes | --memory, --cpus flags write cgroups |
| Cleanup | Manual umount + ip link del | docker rm tears everything down |
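
The manual cleanup in the last row can be sketched as follows, assuming the container shell has already exited and that the names used throughout (br0, veth0, mycontainer, the eth0 placeholder) are unchanged. `run` and `cleanup_container` are hypothetical helpers; setting DRY_RUN=1 prints the commands instead of executing them.

```bash
# run (hypothetical helper): execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# cleanup_container (hypothetical helper): undo the host-side setup.
cleanup_container() {
  # Unmount the overlay; the base rootfs in /tmp/rootfs is left untouched
  run sudo umount /tmp/container/merged
  # Deleting one end of a veth pair removes its peer too; this may fail
  # harmlessly if the pair vanished along with the container's namespace
  run sudo ip link del veth0
  run sudo ip link del br0
  # Remove the NAT rule (eth0 is the same placeholder as in the NAT step)
  run sudo iptables -t nat -D POSTROUTING -s 10.10.0.0/24 -o eth0 -j MASQUERADE
  # A cgroup directory can only be removed once it contains no processes
  run sudo rmdir /sys/fs/cgroup/mycontainer
}
```

Calling `DRY_RUN=1 cleanup_container` first shows exactly what would run; calling `cleanup_container` on its own performs the teardown. The writable layer in /tmp/container/upper is deliberately left in place so you can inspect it or delete it yourself.
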
The main things Docker adds on top of this are: image management (pulling, layering, caching), automatic lifecycle management (networking/cgroup setup and teardown), a content-addressable store for layers, and seccomp/capabilities hardening. But the kernel primitives underneath are exactly what you just used.