From Docker Build to GKE Production: A Practical Deployment Pipeline
Your container runs perfectly on your laptop. The tests pass, the logs look clean, and docker run does exactly what you expect. Then you try to deploy it to production Kubernetes, and suddenly you’re debugging YAML indentation at 11 PM while your deployment sits in ImagePullBackOff for reasons that feel deliberately obscure.
The gap between “it works locally” and “it runs reliably in production” is where most container deployments fall apart. It’s not that Kubernetes is unnecessarily complex—it’s that the journey from a working Dockerfile to a production-grade deployment crosses multiple boundaries: image registries with their own authentication schemes, cluster configurations that assume you already know the right defaults, and workload specifications where a single missing field means your pods never schedule.
GKE removes a significant portion of this friction, but only if you understand what it’s actually doing for you. Google’s managed Kubernetes handles the control plane, node provisioning, and cluster upgrades—the infrastructure work that would otherwise consume your weekends. What it doesn’t do is magically transform your container into a production-ready workload. That part is still on you.
This guide walks through the complete pipeline: building images that work beyond your laptop, pushing them to Artifact Registry, configuring a GKE cluster that matches your actual requirements, and deploying workloads that survive the realities of production traffic. No theoretical abstractions—just the specific commands, configurations, and decisions you’ll face along the way.
Let’s start with why container orchestration changes the game in the first place, and what GKE is actually managing on your behalf.
Why Container Orchestration Changes Everything
Running a container locally is simple. You build an image, run docker run, and your application works. But production demands more than a single container on a single machine.

Consider what happens when that container crashes at 3 AM. Or when traffic spikes tenfold during a product launch. Or when you need to deploy updates without downtime. Suddenly, that straightforward docker run command reveals its limitations.
The gap between local development and production isn’t just about scale—it’s about reliability, automation, and operational sanity.
What You’d Otherwise Build Yourself
Container orchestration platforms like Google Kubernetes Engine handle the infrastructure concerns that would otherwise consume your engineering time:
Scheduling determines where containers run across your cluster. Instead of manually assigning workloads to specific nodes, the orchestrator places containers based on resource requirements, affinity rules, and node availability. When a node fails, workloads automatically redistribute to healthy nodes.
Scaling responds to demand without manual intervention. Horizontal Pod Autoscaling adjusts replica counts based on CPU, memory, or custom metrics. Vertical scaling right-sizes container resources. Cluster autoscaling adds or removes nodes as workload demands change.
Self-healing keeps applications running despite failures. Kubernetes continuously monitors container health through liveness and readiness probes. Failed containers restart automatically. Traffic routes only to healthy instances.
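The scaling and self-healing behavior described above is declarative: you state the desired policy and the orchestrator enforces it. As a minimal sketch, here is a HorizontalPodAutoscaler for a Deployment named api-server (the same name used in the deployment example later in this guide):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70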
Building these capabilities from scratch requires significant investment in distributed systems expertise, custom tooling, and ongoing maintenance. GKE provides them as managed infrastructure.
GKE Standard vs Autopilot
GKE offers two operational modes with distinct trade-offs.
GKE Standard gives you full control over node configuration. You choose machine types, manage node pools, and handle node-level security. This mode suits workloads requiring specific hardware (GPUs, high-memory instances), custom node configurations, or fine-grained cost optimization through committed use discounts.
GKE Autopilot manages nodes entirely. You deploy workloads; Google handles node provisioning, scaling, and maintenance. Billing shifts from node-hours to pod resource requests. Autopilot enforces security best practices by default and eliminates node management overhead.
💡 Pro Tip: Start with Autopilot for new projects unless you have specific hardware requirements. You can migrate to Standard later if you need node-level control, but most teams find Autopilot reduces operational burden without limiting capability.
The choice between modes shapes your operational model more than your application architecture. Both run the same Kubernetes workloads—the difference is who manages the underlying compute.
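If you opt for Autopilot, cluster creation is a single command; the cluster name and region below are placeholders:

# Create an Autopilot cluster; Google provisions and manages the nodes
gcloud container clusters create-auto my-autopilot-cluster \
  --region=us-central1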
With this foundation in place, the next step is building container images that perform reliably in either mode.
Building Production-Ready Docker Images
The Docker image you build locally for development rarely belongs in production. Development images often include build tools, run as root, and carry unnecessary dependencies that inflate image size and expand your attack surface. Production images need a different approach—one that prioritizes security, minimizes size, and integrates cleanly with your CI/CD pipeline.
Multi-Stage Builds for Lean Images
Multi-stage builds separate your build environment from your runtime environment. The first stage compiles your application with all necessary build tools. The final stage contains only the compiled artifacts and runtime dependencies.
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages are copied into the final image
RUN npm prune --omit=dev

# Production stage
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 8080
CMD ["dist/server.js"]

This pattern typically reduces image size by 60-80%. The build stage includes npm, the TypeScript compiler, and other tools—none of which ship in the final image. Google’s distroless images take this further by excluding package managers, shells, and other utilities that attackers commonly exploit.
Running as Non-Root
Containers running as root pose a significant security risk. If an attacker escapes the container, they gain root access to the host system. Google’s container security guidelines explicitly recommend running containers as non-privileged users.
# Build stage
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server

# Runtime stage
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]

Distroless images include a nonroot user by default. For other base images, create a dedicated user:
FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
WORKDIR /app
COPY --chown=appuser:appgroup requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY --chown=appuser:appgroup . .
USER appuser
CMD ["python", "app.py"]

💡 Pro Tip: Always set file ownership with --chown in your COPY commands. This prevents permission issues when the container runs as a non-root user.
Tagging Strategies for CI/CD
Avoid the latest tag in production. It’s non-deterministic and makes rollbacks difficult. Instead, use a tagging convention that provides traceability:
# Pattern: <registry>/<project>/<image>:<git-sha>-<build-number>
us-central1-docker.pkg.dev/my-project/my-repo/api:abc1234-42

Git commit SHAs link images directly to source code. Build numbers ensure uniqueness even if you rebuild the same commit. This combination enables precise rollbacks and simplifies debugging production issues.
For release versions, semantic versioning works well alongside SHA tags:
# Tag the same image with both
docker tag myapp:abc1234-42 us-central1-docker.pkg.dev/my-project/my-repo/api:v1.2.3
docker tag myapp:abc1234-42 us-central1-docker.pkg.dev/my-project/my-repo/api:abc1234-42

Your Kubernetes manifests reference the specific SHA tag, while humans use the semantic version for communication.
With production-ready images built, you need a secure, reliable registry to store them. Artifact Registry provides native integration with GKE and handles the authentication complexity that trips up many teams.
Pushing Images to Artifact Registry
Artifact Registry is Google Cloud’s container registry service, replacing the older Container Registry with better security controls, regional storage options, and native integration with GKE’s vulnerability scanning. Beyond simple storage, it provides fine-grained access control through IAM policies, so you can restrict who can push, pull, or manage images at the repository level.
Creating Your Repository
Artifact Registry organizes images into repositories within specific regions. Create one that matches your GKE cluster’s region to minimize latency during pulls:
# Enable the Artifact Registry API
gcloud services enable artifactregistry.googleapis.com

# Create a Docker repository
gcloud artifacts repositories create my-app-repo \
  --repository-format=docker \
  --location=us-central1 \
  --description="Production container images"

# Verify the repository exists
gcloud artifacts repositories list --location=us-central1

Your image path follows this pattern: REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG. This differs from the old gcr.io paths, so update any existing scripts accordingly. When planning your repository structure, consider creating separate repositories for development and production images—this separation simplifies access control and makes it easier to apply different retention policies to each environment.
Authentication Patterns
You have two primary authentication approaches, each suited to different contexts.
For local development and CI systems outside Google Cloud, configure Docker to use gcloud credentials:
# Configure Docker to authenticate with Artifact Registry
gcloud auth configure-docker us-central1-docker.pkg.dev

# Tag your local image for the registry
docker tag my-app:v1.2.0 us-central1-docker.pkg.dev/my-project/my-app-repo/my-app:v1.2.0

# Push the image
docker push us-central1-docker.pkg.dev/my-project/my-app-repo/my-app:v1.2.0

For workloads running on Google Cloud—including Cloud Build, GKE, and Cloud Run—Workload Identity provides automatic, keyless authentication. GKE nodes authenticate to Artifact Registry without any credential management on your part, as long as they have the appropriate IAM roles. This approach eliminates the security risks associated with long-lived service account keys and reduces operational overhead, since credentials rotate automatically.
💡 Pro Tip: Grant the roles/artifactregistry.reader role to your GKE node service account. This allows pods to pull images without storing any secrets in your cluster.
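Granting that role is a one-line IAM binding. A sketch, assuming a node service account named gke-nodes@my-project.iam.gserviceaccount.com; substitute the account your node pools actually use:

# Allow GKE nodes to pull images from Artifact Registry
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:gke-nodes@my-project.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"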
Vulnerability Scanning
Artifact Registry integrates with Container Analysis to scan images automatically. Enable scanning and query results before deploying:
# Enable on-push vulnerability scanning by enabling the Container Scanning API
gcloud services enable containerscanning.googleapis.com

# Check scan results for a specific image
gcloud artifacts docker images describe \
  us-central1-docker.pkg.dev/my-project/my-app-repo/my-app:v1.2.0 \
  --show-package-vulnerability

The scanner detects known CVEs in OS packages and language-specific dependencies. Block deployments with critical vulnerabilities by integrating these checks into your CI pipeline—reject any image with HIGH or CRITICAL findings before it reaches production. For comprehensive security, schedule periodic rescans of deployed images to catch newly discovered vulnerabilities that may affect your running workloads.
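One way to wire that gate into CI is a small shell check. This is a rough sketch: it greps the JSON scan output for critical findings rather than parsing the full vulnerability schema, so adapt it to the output you actually see:

# Fail the build if the scan reports any CRITICAL finding for this image
IMAGE="us-central1-docker.pkg.dev/my-project/my-app-repo/my-app:v1.2.0"
if gcloud artifacts docker images describe "$IMAGE" \
     --show-package-vulnerability --format=json | grep -q '"CRITICAL"'; then
  echo "Critical vulnerabilities found in $IMAGE: blocking deployment" >&2
  exit 1
fi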
Image Lifecycle Management
Consider implementing an image retention policy to manage storage costs and maintain repository hygiene. Artifact Registry supports cleanup policies that automatically delete images older than a specified duration or keep only the most recent N tags per image. You can configure these policies through the console or via gcloud commands, targeting specific tag patterns while preserving critical releases. This automation prevents unbounded storage growth while ensuring you retain the images you actually need.
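As an illustration, a cleanup policy file might delete untagged images after 30 days while always keeping the most recent tagged versions. The JSON below is a sketch; verify the field names against the current Artifact Registry cleanup policy schema before applying it:

# cleanup-policy.json (illustrative)
[
  {
    "name": "delete-stale-untagged",
    "action": {"type": "Delete"},
    "condition": {"tagState": "UNTAGGED", "olderThan": "30d"}
  },
  {
    "name": "keep-recent-versions",
    "action": {"type": "Keep"},
    "mostRecentVersions": {"keepCount": 10}
  }
]

# Apply the policy to the repository
gcloud artifacts repositories set-cleanup-policies my-app-repo \
  --location=us-central1 \
  --policy=cleanup-policy.json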
With your images securely stored and scanned, you’re ready to deploy them. The next step is configuring a GKE cluster that handles your workloads at scale.
GKE Cluster Configuration That Scales
A well-configured GKE cluster determines whether your production workloads run smoothly at 3 AM or page you with cascading failures. The decisions you make during cluster creation—node pools, networking, and cost controls—compound over time. Getting them right from the start saves months of migration headaches and prevents the architectural debt that accumulates when teams patch around poor initial choices.

Node Pool Design
Node pools let you run different workload types on appropriately sized machines. A common pattern separates your always-on system components from autoscaling application workloads, ensuring that infrastructure services never compete with application pods for resources.
# Illustrative cluster spec: GKE clusters aren't created with a kubectl-applied
# "GKECluster" resource. In practice you express this configuration via gcloud,
# Terraform, or Config Connector; the structure below shows the intended node pool layout.
apiVersion: container.google.cloud.google.com/v1
kind: GKECluster
spec:
  name: production-cluster
  location: us-central1
  nodePools:
    - name: system-pool
      machineType: e2-standard-4
      initialNodeCount: 2
      autoscaling:
        enabled: true
        minNodeCount: 2
        maxNodeCount: 4
      nodeLabels:
        workload-type: system
      taints:
        - key: dedicated
          value: system
          effect: NoSchedule
    - name: application-pool
      machineType: e2-standard-8
      initialNodeCount: 3
      autoscaling:
        enabled: true
        minNodeCount: 1
        maxNodeCount: 20
      nodeLabels:
        workload-type: application

The system pool runs cluster-critical components like CoreDNS, metrics collection, and ingress controllers. By tainting these nodes, you prevent application pods from consuming resources meant for infrastructure. The application pool handles your actual workloads with aggressive autoscaling—scaling down to a single node during low traffic and expanding to handle peak load.
Machine type selection matters more than most teams realize. The e2-standard series offers the best price-to-performance ratio for general workloads. CPU-intensive applications benefit from c2 or c3 machines, while memory-heavy workloads run better on n2-highmem instances. Consider your workload’s actual resource consumption patterns—overprovisioning wastes money, while underprovisioning causes throttling and degraded performance during traffic spikes.
When setting autoscaling boundaries, account for pod disruption budgets and rolling deployment requirements. Your maximum node count should accommodate both peak traffic and simultaneous deployments. Setting minimum counts too low risks cold-start latency when traffic arrives faster than nodes can provision.
VPC-Native Clusters and Private Nodes
VPC-native clusters use alias IP addresses, giving each pod a routable IP from your VPC subnet. This eliminates the double-NAT overhead of legacy routes-based networking and enables direct connectivity with other GCP services. Pod-to-pod communication across nodes becomes more efficient, and network policies gain the granularity needed for zero-trust architectures.
spec:
  networkConfig:
    network: projects/my-project/global/networks/production-vpc
    subnetwork: projects/my-project/regions/us-central1/subnetworks/gke-subnet
  privateClusterConfig:
    enablePrivateNodes: true
    enablePrivateEndpoint: false
    masterIpv4CidrBlock: 172.16.0.0/28
  ipAllocationPolicy:
    clusterSecondaryRangeName: pods
    servicesSecondaryRangeName: services

Private nodes remove public IP addresses from your worker machines, forcing all egress through Cloud NAT. This reduces your attack surface significantly—compromised pods cannot directly reach the internet. Setting enablePrivateEndpoint: false keeps the Kubernetes API accessible from authorized networks while maintaining node privacy. For highly regulated environments, enable private endpoints and access the control plane exclusively through bastion hosts or Cloud Interconnect.
💡 Pro Tip: Size your secondary ranges generously. A /14 for pods and /20 for services supports substantial growth. Expanding these ranges later requires cluster recreation—a painful migration that disrupts production workloads.
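These network settings map onto flags of gcloud container clusters create. A sketch, with values mirroring the spec above:

# VPC-native, private-node cluster (values mirror the spec above)
gcloud container clusters create production-cluster \
  --region=us-central1 \
  --network=production-vpc \
  --subnetwork=gke-subnet \
  --enable-ip-alias \
  --cluster-secondary-range-name=pods \
  --services-secondary-range-name=services \
  --enable-private-nodes \
  --master-ipv4-cidr=172.16.0.0/28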
Cost Optimization Strategies
Spot VMs reduce compute costs by 60-91% compared to on-demand pricing. For stateless, fault-tolerant workloads, they’re transformative. The tradeoff is preemption—GCP can reclaim these instances with minimal notice when capacity demands shift.
nodePools:
  - name: spot-application-pool
    machineType: e2-standard-8
    spotConfig:
      enabled: true
    autoscaling:
      enabled: true
      minNodeCount: 0
      maxNodeCount: 50
    nodeLabels:
      cloud.google.com/gke-spot: "true"

Run your baseline capacity on regular nodes and burst to spot instances for variable load. Kubernetes handles the complexity—when GCP reclaims spot nodes, pods reschedule automatically to available capacity. Implement pod disruption budgets to ensure graceful termination and maintain service availability during preemption events.
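A minimal PodDisruptionBudget for the api-server workload deployed later in this guide might look like the following; the label selector is assumed to match your Deployment:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2   # keep at least two replicas serving during voluntary disruptions
  selector:
    matchLabels:
      app: api-server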
For predictable baseline workloads, committed use discounts lock in 1-year or 3-year pricing at 20-57% savings. Combine committed use for your minimum node count with spot VMs for overflow capacity. This hybrid approach balances cost optimization with reliability—you get guaranteed capacity for steady-state traffic while paying minimal premiums for burst handling.
With your cluster configured for scale, security, and cost efficiency, the next step is deploying your containerized applications using Kubernetes manifests that translate your infrastructure intent into running workloads.
Deploying Workloads with Kubernetes Manifests
With your GKE cluster running and images pushed to Artifact Registry, you need Kubernetes manifests that translate container definitions into production-ready deployments. Three resource types form the foundation: Deployments manage your application pods, Services expose them internally, and Ingress routes external traffic. Understanding how these resources interact—and configuring them correctly—determines whether your application survives real-world traffic patterns or collapses under load.
The Essential Trio
A Deployment defines how your application runs—replica count, container image, resource allocation, and update strategy. Here’s a production-ready configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api-server
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: api-server
          image: us-central1-docker.pkg.dev/my-project/my-repo/api-server:v1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]

The maxUnavailable: 0 setting ensures zero-downtime deployments by requiring all existing pods to remain healthy before any are terminated during updates. Combined with maxSurge: 1, Kubernetes creates one new pod, waits for it to pass readiness checks, then terminates one old pod—repeating until the rollout completes.
A Service provides stable network identity for your pods, abstracting away the ephemeral nature of pod IP addresses:
apiVersion: v1
kind: Service
metadata:
  name: api-server
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP

The ClusterIP type creates an internal-only endpoint. Other pods in the cluster can reach your application at api-server.default.svc.cluster.local, while the Service automatically load-balances across all healthy pods matching the selector.
For external access, an Ingress routes traffic through GKE’s load balancer:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: api-static-ip
    networking.gke.io/managed-certificates: api-certificate
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-server
                port:
                  number: 80

The annotations leverage GKE-specific features: global-static-ip-name binds a reserved IP address to your load balancer, and managed-certificates provisions and auto-renews TLS certificates through Google-managed SSL.
Resource Requests and Limits
Resource configuration directly impacts cluster stability. Requests guarantee minimum resources and drive scheduling decisions—the scheduler places pods on nodes with sufficient available capacity. Limits cap maximum consumption and trigger throttling (CPU) or OOM kills (memory) when exceeded.
Set requests based on steady-state usage observed in staging. Set limits at 2-4x requests to accommodate traffic spikes without letting runaway processes destabilize the node. Under-provisioned requests lead to overcommitted nodes that degrade under load. Missing limits allow a single misbehaving pod to starve neighbors of CPU cycles or exhaust node memory entirely, triggering cascading failures.
💡 Pro Tip: Run your application under realistic load and use kubectl top pods to measure actual consumption before setting production resource values. Metrics from a few hours of production traffic are more valuable than synthetic benchmarks.
Health Checks and Graceful Shutdown
Readiness probes control traffic routing. Kubernetes removes pods from Service endpoints when readiness checks fail, preventing requests from reaching unhealthy instances. Configure initialDelaySeconds to match your application’s startup time—too short and pods receive traffic before they’re ready; too long and deployments stall unnecessarily.
Liveness probes trigger container restarts when applications become unrecoverable—use them sparingly and with longer intervals than readiness probes. A common mistake is setting aggressive liveness probes that restart pods during temporary slowdowns, creating a feedback loop of restarts under high load.
The preStop lifecycle hook and terminationGracePeriodSeconds handle graceful shutdown. When Kubernetes terminates a pod, it sends SIGTERM and removes the pod from endpoints simultaneously. The sleep 10 in preStop gives load balancers time to stop routing traffic before your application begins shutdown, preventing connection errors during deployments. Ensure terminationGracePeriodSeconds exceeds the sum of your preStop delay plus your application’s drain time.
Deploy all three resources with a single command:
kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml

With manifests defining your workload configuration, the next step is automating this deployment process through a CI/CD pipeline that builds, tests, and deploys on every code change.
Automating the Pipeline with GitHub Actions
A CI/CD pipeline that stores service account keys as repository secrets is a security liability waiting to happen. Workload Identity Federation eliminates this risk entirely by allowing GitHub Actions to authenticate with Google Cloud using short-lived tokens—no keys to rotate, no secrets to leak. This approach follows the principle of least privilege while removing the operational burden of credential management from your team.
Configuring Workload Identity Federation
Workload Identity Federation works by establishing a trust relationship between Google Cloud and GitHub’s OIDC provider. When your workflow runs, GitHub issues a signed JWT token containing claims about the workflow’s identity—including the repository name, branch, and actor. Google Cloud validates this token against the configured provider and, if the claims match your attribute conditions, issues temporary Google credentials.
First, create a Workload Identity Pool and connect it to GitHub’s OIDC provider:
# Create the workload identity pool
gcloud iam workload-identity-pools create "github-actions-pool" \
  --location="global" \
  --display-name="GitHub Actions Pool"

# Add GitHub as an OIDC provider, restricted to your GitHub organization
gcloud iam workload-identity-pools providers create-oidc "github-provider" \
  --location="global" \
  --workload-identity-pool="github-actions-pool" \
  --display-name="GitHub Provider" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository_owner == '${GITHUB_ORG}'" \
  --issuer-uri="https://token.actions.githubusercontent.com"

# Grant the pool permission to impersonate your service account
gcloud iam service-accounts add-iam-policy-binding "deploy-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/github-actions-pool/attribute.repository/${GITHUB_ORG}/${GITHUB_REPO}"

The attribute condition and IAM binding restrict authentication to your GitHub organization and specific repository—other repositories cannot impersonate your service account even if they try. You can further constrain access with conditions based on the branch name or environment, ensuring that only production workflows can access production credentials.
The Complete Workflow
This workflow builds your Docker image, pushes it to Artifact Registry, and deploys to GKE on every push to main:
name: Deploy to GKE
on:
  push:
    branches: [main]

env:
  PROJECT_ID: my-project
  PROJECT_NUMBER: "123456789012"  # replace with your numeric project number
  REGION: us-central1
  CLUSTER_NAME: production-cluster
  REPOSITORY: my-app-repo
  IMAGE_NAME: api-service

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write

    steps:
      - uses: actions/checkout@v4

      - id: auth
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/${{ env.PROJECT_NUMBER }}/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider
          service_account: deploy-sa@${{ env.PROJECT_ID }}.iam.gserviceaccount.com

      - uses: google-github-actions/setup-gcloud@v2

      - name: Configure Docker for Artifact Registry
        run: gcloud auth configure-docker ${{ env.REGION }}-docker.pkg.dev --quiet

      - name: Build and push image
        run: |
          IMAGE_TAG="${{ env.REGION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPOSITORY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
          docker build -t $IMAGE_TAG .
          docker push $IMAGE_TAG

      - uses: google-github-actions/get-gke-credentials@v2
        with:
          cluster_name: ${{ env.CLUSTER_NAME }}
          location: ${{ env.REGION }}

      - name: Deploy to GKE
        run: |
          kubectl set image deployment/api-service \
            api-service=${{ env.REGION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPOSITORY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          kubectl rollout status deployment/api-service --timeout=300s

The id-token: write permission is essential—it allows the workflow to request an OIDC token from GitHub that Google Cloud accepts for authentication. Without this permission explicitly declared, the authentication step fails with cryptic errors about missing credentials.
Deployment Verification and Rollbacks
The kubectl rollout status command blocks until the deployment completes successfully or times out. If pods fail health checks, the deployment stalls and the workflow fails. This behavior provides immediate feedback when something goes wrong, but you need additional safeguards for production workloads.
When a deployment goes wrong, roll back to the previous version:
kubectl rollout undo deployment/api-service

Kubernetes maintains a revision history for each deployment, allowing you to roll back to any previous state. By default, it keeps the last 10 revisions, which you can adjust using the revisionHistoryLimit field in your deployment spec.
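To target a specific revision rather than just the previous one, list the history first:

# Inspect recorded revisions, then roll back to a chosen one
kubectl rollout history deployment/api-service
kubectl rollout undo deployment/api-service --to-revision=2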
💡 Pro Tip: Tag your images with the git commit SHA rather than latest. This creates an immutable link between your code and deployed artifacts, making rollbacks deterministic and audit trails clear.
For additional safety, add a verification step that hits a health endpoint after deployment:
- name: Verify deployment
  run: |
    ENDPOINT=$(kubectl get svc api-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    curl --fail --retry 5 --retry-delay 10 "http://${ENDPOINT}/health"

Consider implementing automated rollbacks when verification fails. You can add a conditional step that triggers kubectl rollout undo if the health check returns a non-zero exit code, ensuring your production environment recovers automatically from failed deployments.
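A sketch of such a conditional rollback step, using the same api-service deployment as above:

- name: Roll back on failure
  if: failure()   # runs only when an earlier step, such as verification, has failed
  run: |
    kubectl rollout undo deployment/api-service
    kubectl rollout status deployment/api-service --timeout=300s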
With your pipeline automated and deployments flowing to production, the next challenge is understanding what’s happening inside your cluster once traffic starts hitting your workloads.
Observability and Troubleshooting in Production
Deploying to GKE is only half the battle. Knowing what’s happening inside your cluster—and responding quickly when things break—separates production-ready systems from fragile ones.
Cloud Operations Suite Integration
GKE clusters come with Google Cloud Operations (formerly Stackdriver) enabled by default. This gives you three critical capabilities out of the box:
Cloud Logging aggregates container stdout/stderr, Kubernetes events, and audit logs into a searchable interface. Filter by namespace, pod name, or custom labels to isolate specific workloads.
Cloud Monitoring collects CPU, memory, and network metrics from every node and pod. The GKE dashboard surfaces cluster health at a glance, while custom dashboards let you track application-specific signals.
Error Reporting groups similar exceptions and tracks their frequency over time, helping you prioritize fixes based on actual user impact.
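For quick investigation from the terminal, the same logs are queryable via gcloud. The sketch below pulls recent output from the api-server container in the default namespace; adjust the resource labels to match your workload:

# Recent container logs for api-server in the default namespace
gcloud logging read \
  'resource.type="k8s_container" AND resource.labels.namespace_name="default" AND resource.labels.container_name="api-server"' \
  --limit=20 \
  --format="table(timestamp, severity, textPayload)"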
Diagnosing Common Deployment Failures
When pods refuse to start, the diagnosis follows a predictable pattern:
ImagePullBackOff means Kubernetes cannot pull your container image. Check that your image exists in Artifact Registry and that your cluster’s service account has the artifactregistry.reader role.
CrashLoopBackOff indicates your container starts but immediately exits. Examine logs with kubectl logs <pod-name> --previous to see the final output before the crash.
Pending pods that never schedule typically point to resource constraints. Either your nodes lack sufficient CPU/memory, or your resource requests exceed available capacity.
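A handful of kubectl commands covers most of these investigations (substitute your own pod and node names):

# Why won't this pod start? Events and container statuses appear in describe output.
kubectl describe pod <pod-name>

# For CrashLoopBackOff: read logs from the previous (crashed) container instance
kubectl logs <pod-name> --previous

# For Pending pods: check what capacity the nodes actually have left
kubectl describe node <node-name> | grep -A 8 "Allocated resources"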
💡 Pro Tip: Add kubectl get events --sort-by='.lastTimestamp' to your troubleshooting toolkit. Events often reveal scheduling failures or probe timeouts before they appear in pod status.
Log-Based Alerting
Reactive monitoring catches issues after users report them. Proactive alerting catches them first.
Create log-based metrics in Cloud Logging to track specific error patterns—failed authentication attempts, database connection timeouts, or HTTP 5xx responses. Attach alerting policies to these metrics with appropriate thresholds and notification channels.
Start with alerts on pod restart counts exceeding normal baselines. A container that restarts five times in ten minutes signals a problem worth investigating immediately.
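Creating a log-based metric takes one command. The filter below is illustrative and assumes your containers emit structured logs with an httpRequest.status field, so adjust it to your application's log format:

# Count HTTP 5xx responses logged by GKE containers (filter is illustrative)
gcloud logging metrics create http_5xx_responses \
  --description="HTTP 5xx responses from GKE workloads" \
  --log-filter='resource.type="k8s_container" AND httpRequest.status>=500'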
With observability in place, your deployment pipeline transforms from a one-way push into a feedback loop. You ship code, measure its behavior, and respond to anomalies—all within the same ecosystem.
Key Takeaways
- Use multi-stage Docker builds with non-root users to create smaller, more secure images for GKE deployment
- Configure Workload Identity Federation in your CI/CD pipeline to eliminate long-lived service account keys
- Set resource requests and limits on every container to prevent noisy neighbor problems and enable effective autoscaling
- Start with GKE Autopilot for new projects unless you need specific node configurations that require Standard mode