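A Dockerfile is a plain text recipe that tells Docker how to build an image layer by layer. Instructions run in order, each building on the result of the previous one: RUN, COPY and ADD add a new filesystem layer, while the rest only change image metadata. The result is an immutable, shareable image.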

Reference: Dockerfile reference (Docker Docs)

Instruction Reference

# Base image; must come first (only ARG may precede it)
FROM ubuntu:22.04
# Executes a shell command and bakes the result into a layer (chain update with install)
RUN apt-get update && apt-get install -y curl
# Copies files from the build context into the image
COPY src/ /app/src/
# Like COPY, but also auto-extracts local tar archives and can fetch remote URLs
ADD archive.tar.gz /app/
# Sets the working directory for subsequent instructions (created if missing)
WORKDIR /app
# Sets an environment variable available at build time and in the running container
ENV PORT=8000
# Build-time variable only: not set in the running container (but may show up in docker history)
ARG BUILD_ENV=production
# Documents which port the container listens on; it does not publish it (use docker run -p)
EXPOSE 8000
# The fixed executable; replaced only with docker run --entrypoint
ENTRYPOINT ["gunicorn"]
# Default arguments to ENTRYPOINT; replaced by any arguments passed to docker run
CMD ["myapp.wsgi:application"]
# Switches to a non-root user (which must already exist) for later instructions and at runtime
USER appuser
# Declares a mount point (rarely needed; prefer Compose volumes)
VOLUME ["/data"]
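The ARG/ENV distinction is easiest to see in a tiny throwaway image. This is a sketch; the echo commands are purely illustrative and not part of any real project:

```dockerfile
FROM python:3.12-slim

# Build-time only; override with: docker build --build-arg BUILD_ENV=staging .
ARG BUILD_ENV=production
# Persisted in the image config, so the running container sees it too
ENV PORT=8000

# Both values are visible to RUN while the image is being built
RUN echo "building for $BUILD_ENV, app will listen on $PORT"

# At runtime only PORT is set, so BUILD_ENV falls back to "unset".
# sh -c is used deliberately here so the shell expands the variables.
CMD ["sh", "-c", "echo PORT=$PORT BUILD_ENV=${BUILD_ENV:-unset}"]
```

Running the resulting image with no arguments should print PORT=8000 BUILD_ENV=unset: the ARG existed only while the image was being built.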

CMD vs ENTRYPOINT

These two are often confused. The mental model: ENTRYPOINT is the executable, CMD is its default arguments.

| Pattern | Result |
|---|---|
| ENTRYPOINT ["gunicorn"] + CMD ["myapp.wsgi"] | gunicorn myapp.wsgi (CMD overridable) |
| CMD ["gunicorn", "myapp.wsgi"] | Same, but the whole command is overridable |
| ENTRYPOINT ["gunicorn"] only | Runs gunicorn with no args unless you pass some |

In practice: use ENTRYPOINT when you want the container to behave like a single executable (you can still pass flags). Use CMD alone when you want the whole command to be easily replaced.
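A sketch of the ENTRYPOINT + CMD pattern, assuming a hypothetical image tag myimage and the myapp.wsgi module from the table (application code is omitted for brevity). The comments show how docker run arguments interact with each instruction:

```dockerfile
FROM python:3.12-slim
RUN pip install --no-cache-dir gunicorn

# The executable: always runs unless replaced with --entrypoint
ENTRYPOINT ["gunicorn"]
# Default arguments: used only when docker run gets no arguments
CMD ["myapp.wsgi:application", "--bind", "0.0.0.0:8000"]

# docker run myimage
#   -> gunicorn myapp.wsgi:application --bind 0.0.0.0:8000
# docker run myimage --version
#   -> gunicorn --version   (arguments replace CMD entirely)
# docker run -it --entrypoint sh myimage
#   -> sh                   (--entrypoint also clears the image's CMD)
```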

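Always use the exec form (["executable", "arg"]) rather than the shell form (executable arg). The shell form wraps the command in /bin/sh -c, so the shell, not your process, runs as PID 1. The shell typically does not forward SIGTERM, so on docker stop your process never hears about the shutdown: Docker waits out the grace period (10 seconds by default) and then kills it with SIGKILL.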
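Three alternative ways to write the same CMD (a Dockerfile keeps only the last CMD, so pick one). The gunicorn command line is illustrative:

```dockerfile
# Shell form: Docker runs /bin/sh -c "gunicorn ...", so sh is PID 1
# and SIGTERM from docker stop may never reach gunicorn.
CMD gunicorn myapp.wsgi:application --bind 0.0.0.0:8000

# Exec form: gunicorn itself is PID 1 and receives SIGTERM directly.
CMD ["gunicorn", "myapp.wsgi:application", "--bind", "0.0.0.0:8000"]

# Need shell features such as variable expansion? Use sh -c with exec,
# which replaces the shell process with gunicorn, so gunicorn ends up as PID 1.
CMD ["sh", "-c", "exec gunicorn myapp.wsgi:application --bind 0.0.0.0:${PORT:-8000}"]
```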

COPY vs ADD

Prefer COPY in almost all cases. ADD has two extra features, extracting local tar archives and fetching remote URLs, and both are footguns: archives are extracted silently (so ADD can't copy a tarball as-is), and fetching URLs at build time makes caching behaviour hard to predict. If you need either behaviour, use an explicit RUN curl ... && tar ... in a single instruction.
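A sketch of the explicit replacement for ADD <url>; the URL, archive name and /opt/tool path are hypothetical placeholders:

```dockerfile
FROM python:3.12-slim

# curl isn't guaranteed to be in the slim image; install it and clean up in the same layer
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Hypothetical URL and paths. Download, extract and delete the archive in
# a single RUN so the tarball itself never ends up in a layer.
RUN mkdir -p /opt/tool \
 && curl -fsSL https://example.com/tool.tar.gz -o /tmp/tool.tar.gz \
 && tar -xzf /tmp/tool.tar.gz -C /opt/tool \
 && rm /tmp/tool.tar.gz
```

curl's -f flag makes the build fail on an HTTP error instead of silently saving an error page as the "archive".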


Layer Caching

Docker caches the result of every Dockerfile instruction (a filesystem layer for RUN, COPY and ADD; a metadata change for the rest) and reuses it if the instruction and all its inputs are unchanged. Once a step is invalidated, every step after it is rebuilt.

The practical rule: put things that change rarely at the top, things that change often at the bottom.

Bad — cache busted on every code change:

FROM python:3.12-slim
WORKDIR /app
# Copies everything, so this layer changes on every code edit
COPY . .
# ...which invalidates this layer too: pip install re-runs on every change to any file
RUN pip install -r requirements.txt

Good — dependencies cached separately:

FROM python:3.12-slim
WORKDIR /app
# Only changes when dependencies change
COPY requirements.txt .
# Cached until requirements.txt changes
RUN pip install -r requirements.txt
# Code changes invalidate only this layer, not the pip layer above
COPY . .

What invalidates a layer

  • RUN: the instruction text changes (Docker compares only the command string, not what the command downloads)
  • COPY/ADD: any file in the source changes (Docker checksums the files)
  • Any layer above it is invalidated

A standalone RUN apt-get update is a classic mistake: it's cached after the first build, so later builds can run apt-get install against a stale package list. Always chain update and install in one instruction: RUN apt-get update && apt-get install -y <packages>.
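A sketch of the failure mode and the fix; the package names (curl, git) are only examples:

```dockerfile
FROM python:3.12-slim

# Anti-pattern (commented out): the update layer is cached on its own. If you
# later add git to the install line, Docker reuses the old cached package lists,
# and the install can fail or pull outdated versions.
# RUN apt-get update
# RUN apt-get install -y curl git

# Chained: editing the package list changes this instruction's text, so the
# whole line, including apt-get update, re-runs
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    git \
 && rm -rf /var/lib/apt/lists/*
```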


Multi-Stage Builds

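Multi-stage builds let you use multiple FROM instructions in one Dockerfile. Each stage is independent; you can selectively copy artifacts from one stage into another. The final image is the last stage in the file (or the stage you pick with docker build --target); anything from other stages reaches it only if that stage builds FROM them or copies files in with COPY --from=<stage>.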

The main use case: compile or install dependencies in a fat builder stage, then copy only the outputs into a lean final image.
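Here is that pattern in its smallest form. This sketch assumes requirements.txt includes gunicorn plus a package that compiles against libpq (such as psycopg2), and reuses the myproject module name from the Django example. The builder gets compilers and headers; only the finished virtualenv is copied into the final image:

```dockerfile
# ── Builder: compilers and headers, discarded after the build ──
FROM python:3.12-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends build-essential libpq-dev \
 && rm -rf /var/lib/apt/lists/*
# Install into a virtualenv so the result is one self-contained directory
RUN python -m venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ── Final: same base image, runtime library only ──
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends libpq5 \
 && rm -rf /var/lib/apt/lists/*
# Only the built virtualenv crosses the stage boundary; build-essential does not
COPY --from=builder /opt/venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
WORKDIR /app
COPY . .
# nobody is an existing unprivileged Debian account
USER nobody
CMD ["gunicorn", "myproject.wsgi:application", "--bind", "0.0.0.0:8000"]
```

Copying the virtualenv works here because both stages use the same base image, so the interpreter path the venv points at exists in the final stage too.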

Example — Django with a development and production stage

# ── Stage 1: base ─────────────────────────────────────────────────────────────
# Shared foundation — OS packages, user setup
FROM python:3.12-slim AS base
 
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq-dev \
 && rm -rf /var/lib/apt/lists/*
 
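# Create a non-root user and give it ownership of /app
# (WORKDIR creates the directory as root; the production stage's collectstatic may need to write here)
RUN useradd --create-home appuser
WORKDIR /app
RUN chown appuser:appuser /app
# pip install --user puts console scripts such as gunicorn in ~/.local/bin,
# which is not on PATH by default
ENV PATH=/home/appuser/.local/bin:$PATH
USER appuser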
 
# ── Stage 2: dependencies ─────────────────────────────────────────────────────
# Install Python packages in their own stage so development and production both build on this cached layer
FROM base AS dependencies
 
COPY --chown=appuser requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
 
 
# ── Stage 3: development ──────────────────────────────────────────────────────
# Used by docker-compose.dev.yml: build: { target: development }
# Source code is bind-mounted at runtime — not baked in
FROM dependencies AS development
 
COPY --chown=appuser requirements-dev.txt .
RUN pip install --user --no-cache-dir -r requirements-dev.txt
 
EXPOSE 8000
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
 
 
# ── Stage 4: production ───────────────────────────────────────────────────────
# Used by CI/CD: docker build --target production
# Source code is baked into the image at build time
FROM dependencies AS production
 
COPY --chown=appuser . .
 
# Collect static files at build time
# (assumes settings can be imported without runtime-only secrets such as SECRET_KEY)
RUN python manage.py collectstatic --noinput
 
EXPOSE 8000
CMD ["gunicorn", "myproject.wsgi:application", "--bind", "0.0.0.0:8000", "--workers", "4"]

Building a specific stage

# Build only up to the development stage (for local dev)
docker build --target development -t myapp:dev .
 
# Build the full production image
docker build --target production -t myapp:latest .
 
# CI/CD — build and tag with git SHA for pinning
docker build --target production -t myapp:$(git rev-parse --short HEAD) .

The docker-compose.dev.yml file targets the development stage; the production Compose file uses a pre-built image tag (no build: key at all). Both share the same Dockerfile.


Keeping Images Small

Image size matters for pull times, attack surface, and storage costs.

Use a slim or alpine base:

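# Debian slim: roughly 50MB vs ~900MB for python:3.12 (approximate; sizes vary by tag, platform and how you measure)
FROM python:3.12-slim
# Alpine: ~20MB, but musl libc can cause compatibility issues (e.g. some packages must compile from source)
FROM python:3.12-alpine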

Clean up in the same RUN instruction: Each RUN creates a layer. If you install something in one layer and delete it in the next, the deleted files still exist in the earlier layer and are still part of the image.

# Wrong: the package lists are already stored in the install layer, so deleting them later saves nothing
RUN apt-get update && apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
 
# Right: install and clean up in one layer, so the lists are never stored
RUN apt-get update && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*

Don’t install unnecessary tools in production images. Dev dependencies, compilers, and debuggers have no place in a production image. Multi-stage builds handle this naturally — the builder stage can be fat; the final stage is lean.

Run as a non-root user. Besides being good security hygiene, it’s a reminder to keep the image minimal — if you need root for something in production, that’s a smell.
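A minimal sketch of a dedicated non-root account. The UID/GID 10001 is an arbitrary choice, and fixing it explicitly is an assumption about your needs (it keeps file ownership predictable, for example on bind-mounted directories):

```dockerfile
FROM python:3.12-slim

# System group and user with a fixed, arbitrary ID, no home directory and no login shell
RUN groupadd --system --gid 10001 appuser \
 && useradd --system --uid 10001 --gid appuser --no-create-home --shell /usr/sbin/nologin appuser

WORKDIR /app
# COPY runs as root regardless of USER, so hand ownership over explicitly
COPY --chown=appuser:appuser . .

# Everything from here on, including the running container, uses appuser
USER appuser
CMD ["python", "-c", "import os; print(os.getuid())"]
```

Running this image should print 10001, confirming the process is not root.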


.dockerignore

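Like .gitignore, but for the build context. Docker sends the build context (the directory you pass to docker build, minus anything excluded here) to the builder on every build. If it includes node_modules, a Python virtualenv, or .git, builds are slower, and a COPY . . drags all of it into your image. Excluding .env files also keeps local secrets out of the image.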

# .dockerignore
.git
.venv
__pycache__
*.pyc
node_modules
.env
.env.*
*.log
dist/

Security Practices

Don’t bake secrets into images. Anything passed via ENV or ARG and baked in is visible in docker inspect and image history. Inject secrets at runtime via environment variables or a secrets manager.

# Wrong — secret visible in image history
ARG SECRET_KEY
ENV SECRET_KEY=$SECRET_KEY
 
# Right — inject at runtime, never in the image
# (set via docker run -e SECRET_KEY=... or Compose environment:)
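If a secret is needed during the build itself (say, a token for a private package index), BuildKit secret mounts expose it to a single RUN without writing it to any layer. A sketch, assuming BuildKit (the default builder in current Docker releases), a hypothetical private index at pypi.example.com, and a secret id of pip_token:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .

# Hypothetical private index. The secret is mounted at /run/secrets/pip_token
# for this RUN only; it is not stored in the image or its history.
RUN --mount=type=secret,id=pip_token \
    PIP_INDEX_URL="https://__token__:$(cat /run/secrets/pip_token)@pypi.example.com/simple" \
    pip install --no-cache-dir -r requirements.txt
```

The secret is supplied at build time, e.g. docker build --secret id=pip_token,src=./pip_token.txt . (the file name is illustrative).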

Pin base image versions by digest for production builds. Tags are mutable — python:3.12-slim today might not be the same image in three months. For reproducible production builds, pin by digest:

FROM python:3.12-slim@sha256:abc123...

In practice, pin in CI/CD and let dev use the tag.
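One way to get both from a single Dockerfile is an ARG declared before FROM, the only instruction allowed ahead of it. A sketch; BASE_IMAGE is a hypothetical argument name, and the <digest> placeholder must be filled in with a real value (for example, the Digest line printed by docker pull python:3.12-slim):

```dockerfile
# Default for local development: the mutable tag
ARG BASE_IMAGE=python:3.12-slim
FROM ${BASE_IMAGE}

# ...rest of the Dockerfile unchanged...

# CI/CD overrides the default with a digest-pinned reference, e.g.:
#   docker build --build-arg BASE_IMAGE=python:3.12-slim@sha256:<digest> -t myapp:latest .
```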


See Also