
Hardening Container Images: A Defense-in-Depth Approach for Production Workloads


Your container passed the vulnerability scan, yet attackers still compromised your production cluster. The scan found zero CVEs, but the image ran as root, included a shell, and exposed unnecessary capabilities—three attack vectors that scanners routinely miss. This scenario plays out in production environments more often than security teams want to admit.

Container security requires defense in depth: multiple overlapping controls that assume any single layer will fail. A vulnerability scanner catches known CVEs. A minimal base image eliminates attack tools. Non-root execution prevents privilege escalation. Runtime policies enforce security even when CI/CD gates are bypassed. Each layer compensates for gaps in the others.

This guide walks through building hardened container images from the ground up, integrating security gates into your CI/CD pipeline, and enforcing policies at runtime. You’ll leave with a concrete checklist and working examples you can adapt to your own infrastructure.


Why Vulnerability Scanning Alone Fails

Security teams love vulnerability scanners because they produce quantifiable results. “Zero high-severity CVEs” looks great in a compliance report. But this metric creates dangerous false confidence.

CVE databases lag behind real-world exploits by weeks or months. The median time from vulnerability discovery to CVE publication is 35 days. Attackers actively exploit this window. Your scanner reports a clean bill of health while known exploits circulate in the wild.

More fundamentally, scanners focus on known vulnerabilities in known packages. They miss entire categories of security issues:

Configuration weaknesses: A container running as root with all Linux capabilities enabled presents a massive attack surface—but triggers zero CVE alerts. The CAP_SYS_ADMIN capability alone grants near-complete control over the host system.

Supply chain attacks: Scanners check package versions against vulnerability databases. They don’t detect malicious code injected into otherwise legitimate packages. The xz utils backdoor in 2024 demonstrated how sophisticated actors can compromise build infrastructure.

Unnecessary attack surface: Your Python application ships with curl, wget, a shell, and a package manager. An attacker who gains code execution can use these tools to download additional malware, establish persistence, or pivot to other systems. Scanners won’t flag these as vulnerabilities.

Runtime behavior: Static analysis can’t predict how an application will behave. A container with minimal CVEs can still make dangerous system calls, access sensitive file paths, or establish unexpected network connections.

The pattern is consistent: organizations achieve “zero critical CVEs” in their scanning dashboards while leaving fundamental security gaps unaddressed. Scanning is necessary but nowhere near sufficient.

A robust container security posture requires controls at every layer: build-time hardening, CI/CD policy enforcement, and runtime protection. Each layer catches issues the others miss.


Building Minimal Base Images That Attackers Can’t Exploit

The most effective security control is removing attack surface entirely. An attacker can’t exploit a shell that doesn’t exist. They can’t download tools through a package manager that isn’t installed.

Choosing Your Base Image

Three primary options exist for minimal base images:

Base Type | Size | Contents | Best For
scratch | 0 MB | Nothing | Statically compiled Go binaries
distroless | 2-20 MB | Runtime only (libc, SSL certs) | Python, Java, Node.js applications
Alpine | ~5 MB | musl libc, busybox, apk | Apps requiring minimal tooling

Scratch contains literally nothing—no shell, no libc, no SSL certificates. Use it for statically compiled Go binaries that embed all dependencies.

Distroless images from Google include only the runtime your application needs. The Python distroless image contains the Python interpreter and standard library but no shell, package manager, or debugging tools. This is the sweet spot for most production workloads.

Alpine provides a minimal Linux userland with a package manager. It’s useful when you need to install additional packages but want to keep the image small. However, that package manager becomes an attack vector—consider removing apk after installation.

Multi-Stage Builds

Multi-stage builds separate your build environment from your runtime environment. Build tools, compilers, and development dependencies never reach production.

Dockerfile
# Build stage: includes all build dependencies
# Note: gcr.io/distroless/python3-debian12 ships Debian 12's Python 3.11,
# so the builder must use a matching interpreter version
FROM python:3.11-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/

# Runtime stage: minimal image with only what's needed
FROM gcr.io/distroless/python3-debian12
WORKDIR /app
# Copy installed packages from the builder
COPY --from=builder /opt/venv /opt/venv
# Copy application code
COPY --from=builder /app/src ./src
# The venv's python symlink doesn't survive the copy into distroless,
# so expose its site-packages to the interpreter directly
ENV PYTHONPATH=/opt/venv/lib/python3.11/site-packages
ENV PYTHONUNBUFFERED=1
# Run as non-root (distroless images include a 'nonroot' user, UID 65532)
USER nonroot
# Application entrypoint
ENTRYPOINT ["python3", "-m", "src.main"]

This Dockerfile produces an image with no shell, no package manager, no gcc, and no apt. An attacker who achieves code execution has no tools to work with.

Removing Attack Tools

If you must use a base image with a shell (sometimes necessary for debugging during development), explicitly remove dangerous tools before shipping to production:

Dockerfile.alpine
FROM python:3.12-alpine AS builder
# ... build steps ...

FROM python:3.12-alpine AS production
# Copy application from builder
COPY --from=builder /app /app
# Remove attack surface. Delete busybox and its shell links last: once the
# busybox binary is gone, its applet symlinks (including rm) stop working.
RUN apk del apk-tools && \
    rm -rf /sbin/apk /usr/share/apk /etc/apk /var/cache/apk && \
    rm -f /usr/bin/wget /usr/bin/curl && \
    rm -f /bin/sh /bin/ash /bin/busybox
USER 10001:10001
ENTRYPOINT ["python", "/app/main.py"]

⚠️ Warning: Removing the shell makes debugging extremely difficult. Use this approach only for production images, and maintain a separate debug variant for troubleshooting.


Non-Root Containers and Linux Capability Restrictions

Running containers as root is the default—and the default is dangerous. A root process inside a container can exploit kernel vulnerabilities to escape to the host. Even without an escape, root can modify files owned by other processes, access secrets in environment variables, and interfere with other containers sharing the same node.

Explicit User Configuration

Define a non-root user in your Dockerfile and switch to it before the entrypoint:

Dockerfile.nonroot
FROM node:20-slim
# Create application directory
WORKDIR /app
# Create non-root user with explicit UID/GID
RUN groupadd --gid 10001 appgroup && \
    useradd --uid 10001 --gid appgroup --shell /usr/sbin/nologin appuser
# Install dependencies as root (needs write access to node_modules)
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force
# Copy application code
COPY --chown=appuser:appgroup src/ ./src/
# Switch to non-root user
USER 10001:10001
# Expose non-privileged port
EXPOSE 8080
CMD ["node", "src/server.js"]

Use numeric UIDs rather than usernames. Some base images don’t include /etc/passwd, causing username lookups to fail, and Kubernetes cannot verify a runAsNonRoot requirement when the image specifies a username instead of a UID.
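
A quick sanity check is to inspect the user baked into the image; myregistry/secure-app is a placeholder for your own image name:

docker inspect --format '{{.Config.User}}' myregistry/secure-app:v1.2.3
# Expected output: 10001:10001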

Dropping Linux Capabilities

Linux capabilities divide root privileges into distinct units. By default, Docker grants containers a subset of capabilities including CAP_CHOWN, CAP_SETUID, and CAP_NET_BIND_SERVICE. Your application probably needs none of these.
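
With plain Docker you can apply the same principle at run time. As a sketch (image name illustrative): drop everything, then add back only what the application demonstrably needs:

# Drop every capability
docker run --rm --cap-drop=ALL --user 10001:10001 myregistry/secure-app:v1.2.3
# If the app must bind a port below 1024, grant exactly that one capability
docker run --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE myregistry/secure-app:v1.2.3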

In Kubernetes, specify security context at both pod and container levels:

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        image: myregistry/secure-app:v1.2.3
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        resources:
          limits:
            memory: "256Mi"
            cpu: "500m"
          requests:
            memory: "128Mi"
            cpu: "100m"
        ports:
        - containerPort: 8080
          protocol: TCP
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/.cache
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}

Key security settings in this manifest:

  • runAsNonRoot: true prevents the container from running as UID 0, even if the Dockerfile specifies root
  • allowPrivilegeEscalation: false blocks setuid binaries and ptrace-based attacks
  • readOnlyRootFilesystem: true prevents attackers from writing malicious scripts
  • capabilities.drop: [ALL] removes every Linux capability
  • seccompProfile.type: RuntimeDefault applies the container runtime’s default seccomp profile, blocking dangerous system calls

📝 Note: The readOnlyRootFilesystem setting requires mounting writable volumes for directories your application writes to—typically /tmp and application-specific cache directories.

Custom Seccomp Profiles

The default seccomp profile blocks approximately 44 dangerous system calls. For highly sensitive workloads, create a custom profile that allows only the specific syscalls your application uses:

seccomp-profile.json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "close", "fstat", "mmap",
        "mprotect", "munmap", "brk", "rt_sigaction",
        "rt_sigprocmask", "ioctl", "access", "pipe",
        "select", "sched_yield", "dup2", "getpid",
        "socket", "connect", "accept", "sendto",
        "recvfrom", "bind", "listen", "getsockname",
        "getpeername", "setsockopt", "getsockopt",
        "clone", "execve", "exit", "wait4", "fcntl",
        "getdents64", "getcwd", "chdir", "openat",
        "newfstatat", "futex", "epoll_create1",
        "epoll_ctl", "epoll_wait", "exit_group"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Generate application-specific profiles using tools like strace or Falco to trace syscalls during normal operation.
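
As a rough sketch, assuming the Python entrypoint from earlier, you can collect the syscall names strace observes during a representative run and use them to seed the allowlist:

# Trace the app (and child processes via -f) through a representative workload
strace -f -o trace.log python3 -m src.main
# With -f each line begins with a PID, so the syscall name is in field 2
awk '{print $2}' trace.log | cut -d'(' -f1 | grep -E '^[a-z0-9_]+$' | sort -u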


Automating Security Gates in Your CI/CD Pipeline

Manual security reviews don’t scale. By the time a human reviews a configuration, the code has already shipped. Automated gates catch issues at the point of commit—before they reach production.

Vulnerability Scanning with Trivy

Trivy scans container images, filesystems, and Git repositories for vulnerabilities and misconfigurations. Integrate it into GitHub Actions to fail builds that introduce high-severity CVEs:

.github/workflows/security-scan.yaml
name: Container Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      security-events: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build image for scanning
        uses: docker/build-push-action@v5
        with:
          context: .
          load: true
          tags: ${{ env.IMAGE_NAME }}:scan
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE_NAME }}:scan
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'
      - name: Run Trivy config scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          scan-ref: '.'
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

This workflow scans both the built image for CVEs and the repository for misconfigurations (insecure Dockerfiles, exposed secrets in Kubernetes manifests).
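
You can run the same checks locally before pushing; substitute your own image tag for the scan tag built above:

# Fail (exit code 1) when HIGH or CRITICAL vulnerabilities are found
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:scan
# Scan Dockerfiles and manifests in the repository for misconfigurations
trivy config --severity HIGH,CRITICAL --exit-code 1 .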

Policy Enforcement with OPA Conftest

Vulnerability scanners check for known CVEs. Policy engines enforce organizational standards: no root containers, required resource limits, approved base images.

.github/workflows/policy-check.yaml
policy-check:
  runs-on: ubuntu-latest
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
    - name: Setup Conftest
      uses: instrumenta/conftest-action@master
      with:
        version: '0.46.0'
    - name: Test Dockerfile policies
      run: conftest test Dockerfile --policy policies/
    - name: Test Kubernetes manifest policies
      run: conftest test k8s/ --policy policies/

Define policies in Rego, OPA’s policy language:

policies/container.rego
package main

# Deny containers running as root
deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.securityContext.runAsNonRoot
  msg := sprintf("Container '%s' must set runAsNonRoot: true", [container.name])
}

# Deny containers without resource limits
deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.resources.limits
  msg := sprintf("Container '%s' must specify resource limits", [container.name])
}

# Deny containers with privileged security context
deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  container.securityContext.privileged == true
  msg := sprintf("Container '%s' must not run as privileged", [container.name])
}

# Require approved base images. Conftest's Dockerfile parser exposes each
# instruction as an element with Cmd and Value fields, so match FROM lines
# directly. (Named build stages may need an explicit allowlist.)
deny[msg] {
  input[i].Cmd == "from"
  image := input[i].Value[0]
  not startswith(image, "gcr.io/distroless/")
  not startswith(image, "cgr.dev/chainguard/")
  msg := sprintf("Base image '%s' must be from distroless or chainguard", [image])
}

Image Signing with Cosign

Supply chain attacks target the path between your build system and production. An attacker who compromises your registry can replace legitimate images with malicious ones. Cryptographic signing ensures images haven’t been tampered with.

.github/workflows/sign-image.yaml
sign-and-push:
  needs: [build-and-scan, policy-check]
  runs-on: ubuntu-latest
  permissions:
    contents: read
    packages: write
    id-token: write  # Required for keyless signing
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
    - name: Install Cosign
      uses: sigstore/cosign-installer@v3
    - name: Log in to registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Build and push image
      id: build
      uses: docker/build-push-action@v5
      with:
        context: .
        push: true
        tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
    - name: Sign image with Cosign
      env:
        COSIGN_EXPERIMENTAL: "true"
      run: |
        cosign sign --yes \
          ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}

Configure your Kubernetes cluster to verify signatures before pulling images using a policy engine like Kyverno or Sigstore’s policy controller.
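
Before wiring up admission control, you can verify a signature manually. The identity flags below assume keyless signing from a GitHub Actions workflow in a hypothetical myorg/myrepo repository; adjust both to your setup:

cosign verify \
  --certificate-identity-regexp 'https://github.com/myorg/myrepo/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  ghcr.io/myorg/myrepo:v1.2.3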


Runtime Security with Pod Security Standards

CI/CD gates prevent insecure configurations from reaching production—in theory. In practice, engineers bypass pipelines during incidents, deployments fail silently, and manual kubectl apply commands circumvent all automation. Runtime enforcement catches what CI/CD misses.

Pod Security Admission

Kubernetes 1.25+ includes Pod Security Admission (PSA), a built-in admission controller that enforces Pod Security Standards. Three profiles provide increasing levels of restriction:

Profile | Purpose | Key Restrictions
Privileged | Unrestricted | None (allows everything)
Baseline | Minimal restrictions | Blocks hostNetwork, hostPID, privileged containers
Restricted | Hardened | Requires non-root, drops capabilities, blocks privilege escalation

Apply PSA at the namespace level using labels:

namespace-restricted.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Enforce restricted profile - reject non-compliant pods
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Warn on restricted-profile violations (useful during migration)
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    # Audit all violations to cluster logs
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest

The restricted profile requires:

  • Running as non-root (runAsNonRoot: true)
  • Dropping all capabilities (only NET_BIND_SERVICE can be added back)
  • No privilege escalation (allowPrivilegeEscalation: false)
  • A seccomp profile of RuntimeDefault or Localhost
  • Restricted volume types (hostPath mounts are rejected)

Note that a read-only root filesystem is not part of the standard; it remains a recommended setting you enforce through your manifests and policies.

💡 Pro Tip: Start with warn mode to identify non-compliant workloads, then switch to enforce after fixing violations. The audit logs show exactly which pods would be rejected.
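
You can also ask the API server to evaluate existing workloads against a stricter profile without changing anything, using a server-side dry run of the namespace label:

# Report which running pods would violate 'restricted', without enforcing it
kubectl label --dry-run=server --overwrite ns production \
  pod-security.kubernetes.io/enforce=restricted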

Handling Exceptions

Some workloads legitimately require elevated privileges—CNI plugins, log collectors, security agents. Create dedicated namespaces with appropriate PSA profiles rather than weakening security cluster-wide:

system-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kube-system-extensions
  labels:
    # Baseline allows necessary privileges while blocking the worst offenders
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted

Document why each exception exists. Review exceptions quarterly—system components often reduce their privilege requirements in newer versions.


Network Policies and Secrets Management for Container Isolation

A compromised container is contained only if network segmentation limits its blast radius. Default Kubernetes networking allows any pod to communicate with any other pod—an attacker who compromises one service can probe the entire cluster.

Default-Deny Network Policies

Start with a default-deny policy that blocks all ingress and egress, then explicitly whitelist required connections:

network-policies.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}  # Applies to all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow traffic from the ingress controller; the kubernetes.io/metadata.name
  # label is set automatically on every namespace since Kubernetes 1.22
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Allow connections to PostgreSQL
  - to:
    - podSelector:
        matchLabels:
          app: postgresql
    ports:
    - protocol: TCP
      port: 5432
  # Allow DNS resolution (TCP as well, for responses too large for UDP)
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
⚠️ Warning: Forgetting to allow DNS (port 53 to kube-dns) is the most common network policy mistake. Your pods will fail to resolve any hostnames, including internal service names.
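
You can see this failure mode for yourself with a throwaway pod. Under the default-deny policy above, the lookup times out until an egress rule covering the pod allows DNS:

kubectl run dns-test -n production --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local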

Secrets Management

Kubernetes secrets stored in etcd are base64-encoded, not encrypted, unless you configure encryption at rest. Anyone with API access can read them. Environment variables appear in process listings and crash dumps. Production secrets require better protection.

Mount secrets as files instead of environment variables:

pod-with-secrets.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-secrets
spec:
  containers:
  - name: app
    image: myregistry/app:v1.0.0
    volumeMounts:
    - name: db-credentials
      mountPath: /secrets/db
      readOnly: true
    - name: api-keys
      mountPath: /secrets/api
      readOnly: true
  volumes:
  - name: db-credentials
    secret:
      secretName: database-credentials
      defaultMode: 0400  # Read-only for owner
  - name: api-keys
    secret:
      secretName: external-api-keys
      defaultMode: 0400

Use external secrets operators to sync secrets from dedicated secret managers:

external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: database-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: production/database
        property: username
    - secretKey: password
      remoteRef:
        key: production/database
        property: password

This approach keeps secrets out of your Git repository, etcd, and Kubernetes API—they’re fetched directly from AWS Secrets Manager (or Vault, GCP Secret Manager, etc.) at runtime.
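
Once the operator and a ClusterSecretStore are configured, you can confirm the sync with kubectl; these commands assume the external-secrets CRDs are installed:

# The ExternalSecret's status reports whether the last fetch succeeded
kubectl get externalsecret database-credentials -n production
# The operator materializes an ordinary Kubernetes Secret as the target
kubectl get secret database-credentials -n production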


Putting It All Together: A Security Checklist for Production

Security is an ongoing process, not a one-time implementation. Use this checklist before every production deployment and during quarterly security reviews.

Pre-Deployment Checklist

Build Phase:

  • Base image is distroless, chainguard, or scratch
  • Multi-stage build separates build and runtime dependencies
  • No shells, package managers, or debugging tools in final image
  • Dockerfile includes explicit USER directive with numeric UID
  • Image referenced by immutable digest, not a mutable tag like latest (see the digest lookup below)
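
To pin by digest, resolve the tag once and reference the digest in your manifests; ghcr.io/myorg/app is a placeholder:

# Resolve a mutable tag to its immutable digest
docker pull ghcr.io/myorg/app:v1.2.3
docker inspect --format '{{index .RepoDigests 0}}' ghcr.io/myorg/app:v1.2.3
# -> ghcr.io/myorg/app@sha256:...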

Scan Phase:

  • Trivy or Grype scan passes with zero high/critical CVEs
  • OPA Conftest validates Dockerfile and Kubernetes manifests
  • SBOM generated and stored for audit trail

Sign Phase:

  • Image signed with Cosign using keyless signing
  • Signature verification configured in cluster admission policy

Deploy Phase:

  • Pod security context specifies runAsNonRoot, drops all capabilities
  • Read-only root filesystem enabled
  • Resource limits defined for CPU and memory
  • Network policies restrict ingress and egress
  • Secrets mounted as files from external secrets operator

Monitoring and Alerting

Security controls only work if you know when they’re violated:

  • Audit logs: Kubernetes audit logs capture all API requests. Alert on pod creations that trigger PSA warnings.
  • Runtime detection: Tools like Falco detect anomalous behavior—shells spawned in containers, unexpected network connections, sensitive file access.
  • Image drift: Alert when running images don’t match signed manifests in your deployment repository.

Incident Response Considerations

When a container is compromised:

  1. Isolate immediately: Apply a network policy that blocks all traffic to and from the affected pod (see the quarantine sketch after this list)
  2. Preserve evidence: Don’t delete the pod—snapshot its filesystem and memory for forensics
  3. Check lateral movement: Review network flow logs for connections to other pods
  4. Rotate secrets: Assume any secret mounted in the container is compromised
  5. Rebuild from source: Don’t trust the running image—rebuild and redeploy from your verified source
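
A minimal sketch of step 1, assuming the compromised pod can be labeled in place (pod name illustrative):

# Tag only the affected pod; labeling does not restart it
kubectl label pod compromised-pod-abc123 -n production quarantine=true
# Apply a policy that cuts all ingress and egress for quarantined pods
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
  namespace: production
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes:
  - Ingress
  - Egress
EOF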

Key Takeaways

  • Start every Dockerfile FROM a distroless or scratch base and use multi-stage builds to exclude build tools
  • Add USER directives with explicit non-root UIDs and drop all Linux capabilities except those explicitly required
  • Implement automated security gates using Trivy, OPA Conftest, and Cosign in your CI/CD pipeline
  • Enable Kubernetes Pod Security Admission with the ‘restricted’ profile as your default enforcement level
  • Apply default-deny NetworkPolicies to every namespace and whitelist only required pod-to-pod communication
