Containerising a Backend Service: From Docker to Kubernetes
A practical walkthrough of containerising a Python backend service with Docker, deploying it to Kubernetes on EKS, and the production gaps that only show up once real traffic hits.
Introduction
Containerisation is one of those things that sounds straightforward until you try to run containers in production and discover all the ways a “works on my machine” Docker image can behave unexpectedly at scale. I’ve been through this journey a few times now, and I want to share the practical bits — not just the happy path.
Why Containers?
Before I started containerising our services, deployments were painful. Environment drift between dev, staging, and production caused subtle bugs. Onboarding a new engineer meant half a day of “install this version of Python, this version of postgres-client…”. Rollbacks were nerve-wracking.
Containers solve the environment problem decisively. The image is the artefact — the same bits that ran in CI run in production.
Writing a Production-Ready Dockerfile
Most Dockerfile tutorials show you the basics. Here’s what a production image actually needs:
```dockerfile
# Multi-stage build — keeps the final image lean
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# --- Final stage ---
FROM python:3.12-slim

# Non-root user for security
RUN addgroup --system appgroup && adduser --system --ingroup appgroup appuser
WORKDIR /app

# Copy installed deps from builder
COPY --from=builder /install /usr/local
COPY --chown=appuser:appgroup . .

USER appuser
EXPOSE 8000

# Use exec form — proper signal handling (PID 1 receives SIGTERM)
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
A few things worth calling out:
Multi-stage builds keep the final image small. The builder stage installs dependencies; the final stage copies only what’s needed. Our image went from 900MB to 180MB.
Non-root user. Running as root inside a container is a security risk. If the container is compromised, the attacker gets root. Creating a dedicated user takes two lines.
Exec form CMD. The CMD ["uvicorn", ...] exec form (not shell form) means your application runs as PID 1 and receives SIGTERM directly. This is essential for graceful shutdown — Kubernetes sends SIGTERM before killing a pod. Shell form (CMD uvicorn ...) wraps it in /bin/sh -c, which typically doesn’t forward signals.
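One supporting file worth pairing with this Dockerfile is a .dockerignore. Without it, `COPY . .` drags the .git history, local virtualenvs, and caches into the build context and the final image. A minimal sketch — the entries below are typical for a Python project, not taken from this repo, so adjust to your layout:

```
# .dockerignore — keep the build context (and COPY . .) lean
.git
__pycache__/
*.pyc
.venv/
.env
docker-compose.yml
```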
Local Development with Docker Compose
For local development, I use Docker Compose to spin up the full dependency stack:
```yaml
# docker-compose.yml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://dev:dev@postgres:5432/appdb
      - REDIS_URL=redis://redis:6379/0
    volumes:
      - .:/app  # Hot reload in dev
    depends_on:
      postgres:
        condition: service_healthy

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
      POSTGRES_DB: appdb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U dev"]
      interval: 5s
      timeout: 3s
      retries: 5

  redis:
    image: redis:7-alpine
```
The condition: service_healthy on depends_on is important — without it, the API container starts before Postgres is ready to accept connections and crashes on the first DB connection attempt.
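That said, `service_healthy` only helps locally — Kubernetes has no equivalent startup ordering, so the application should also retry its initial DB connection itself rather than crash on the first failure. A framework-agnostic sketch with exponential backoff; `connect_fn` and the flaky example below are illustrative stand-ins for your real connect call:

```python
import time

def connect_with_retry(connect_fn, attempts=5, base_delay=0.5):
    """Call connect_fn, retrying with exponential backoff on ConnectionError."""
    for attempt in range(attempts):
        try:
            return connect_fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries — let the container crash and restart
            time.sleep(base_delay * (2 ** attempt))

# Example: a dependency that only becomes ready on the third attempt
calls = {"n": 0}
def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("not ready yet")
    return "connected"

print(connect_with_retry(flaky_connect, base_delay=0.01))  # connected
```

In a container this belongs at startup, before the server begins accepting traffic; if the retries are exhausted, crashing is fine — the restart policy takes over.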
Moving to Kubernetes
Once the Docker image is solid, Kubernetes takes over orchestration. I’ll show the core manifests for a typical backend service.
Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: 123456789.dkr.ecr.ap-southeast-1.amazonaws.com/api:v1.4.2
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 20
```
Resource requests and limits are not optional in production. Without them, a runaway pod can starve other pods on the same node. requests determines scheduling (what the pod is guaranteed); limits is the ceiling.
Readiness vs liveness probes serve different purposes. Readiness controls whether the pod receives traffic — if the DB connection pool isn’t ready yet, return 503 on /health and Kubernetes won’t route traffic to that pod. Liveness controls whether Kubernetes restarts the pod — only fail liveness for truly unrecoverable states.
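The manifest above points both probes at the same /health path; splitting them makes the distinction concrete. A framework-agnostic sketch of the two checks — `db_pool_ready` is an illustrative stand-in for your real dependency check:

```python
def liveness() -> tuple[int, dict]:
    """Liveness: is the process itself alive? Never check dependencies here —
    a down database would otherwise cause a pointless restart loop."""
    return 200, {"status": "alive"}

def readiness(db_pool_ready: bool) -> tuple[int, dict]:
    """Readiness: can this replica serve traffic right now? Returning 503
    takes the pod out of the Service endpoints without restarting it."""
    if not db_pool_ready:
        return 503, {"status": "not ready", "reason": "db pool unavailable"}
    return 200, {"status": "ready"}
```

With separate paths, the readinessProbe points at the readiness check and the livenessProbe at the liveness check, so a slow dependency degrades traffic routing instead of triggering restarts.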
Graceful Shutdown in Python
Kubernetes sends SIGTERM when it wants to terminate a pod, then waits terminationGracePeriodSeconds (default 30s) before sending SIGKILL. Your app needs to handle SIGTERM cleanly:
```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

# db is your application's async database client, defined elsewhere

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    await db.connect()
    yield
    # Shutdown — runs on SIGTERM
    await db.disconnect()

app = FastAPI(lifespan=lifespan)
```
With FastAPI’s lifespan context manager, cleanup code runs reliably on shutdown. Uvicorn handles SIGTERM and triggers this cleanup path.
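One subtlety the lifespan hook doesn't cover: on pod deletion, the SIGTERM and the pod's removal from Service endpoints happen in parallel, so a few requests can still arrive after SIGTERM. A common mitigation — sketched below, not part of the manifest above — is a short preStop sleep so endpoint removal propagates before the process starts shutting down:

```yaml
# Inside the container spec of the Deployment
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]
```

The sleep counts against terminationGracePeriodSeconds, so leave enough headroom for the application's own drain time.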
The Production Gaps That Caught Me Out
Gap 1: Image tag discipline. Early on I was tagging images as latest for every deployment. When a pod restarted, Kubernetes pulled latest — which might be a newer image than what was running. Now every release is tagged with the git commit SHA. latest is never used in production manifests.
Gap 2: Secret rotation. Kubernetes Secrets are base64-encoded, not encrypted. We moved to AWS Secrets Manager with the External Secrets Operator to sync secrets into Kubernetes as native Secret objects while keeping the source of truth in AWS. This also means rotating a secret doesn’t require a code deployment.
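For reference, an ExternalSecret in this setup looks roughly like the sketch below — the store name and Secrets Manager key are illustrative, and the exact apiVersion depends on the operator version installed:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets
  namespace: production
spec:
  refreshInterval: 1h            # re-sync from AWS periodically
  secretStoreRef:
    name: aws-secrets-manager    # illustrative ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: api-secrets            # the native Secret the Deployment reads
  data:
    - secretKey: database-url
      remoteRef:
        key: prod/api/database-url   # illustrative Secrets Manager key
```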
Gap 3: Log aggregation. kubectl logs works for one pod at a time. In production with 3+ replicas, you need centralised logging. We ship container logs to CloudWatch using the Fluent Bit DaemonSet — it’s lightweight and the CloudWatch integration is solid on EKS.
Gap 4: Pod disruption budgets. When a node is drained for maintenance, Kubernetes may evict all replicas of your deployment at once if nothing stops it. A PodDisruptionBudget caps voluntary disruptions like drains, ensuring at least one replica stays up:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api
```
CI/CD Pipeline
Our GitLab CI pipeline builds, tests, and deploys:
```yaml
build:
  stage: build
  script:
    - docker build -t $ECR_REPO:$CI_COMMIT_SHA .
    - docker push $ECR_REPO:$CI_COMMIT_SHA

deploy:
  stage: deploy
  script:
    - kubectl set image deployment/api api=$ECR_REPO:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/api -n production
  only:
    - main
```
kubectl rollout status blocks the pipeline until the deployment is healthy. If pods crash on startup, the pipeline fails and you still have the previous version running.
Conclusion
The journey from Docker to Kubernetes has a real learning curve, but the operational benefits at scale are worth it. The key things to get right from the start:
- Write a proper Dockerfile: multi-stage, non-root, exec-form CMD
- Handle SIGTERM in your application code — Kubernetes depends on it
- Set resource requests and limits on every container
- Distinguish readiness from liveness probes
- Use immutable image tags in production
Once these are in place, deployments become boring — and boring is exactly what you want in production.