
Building a Cost-Efficient GitLab Runner Fleet: From Single Server to Auto-Scaled Infrastructure


Your team’s CI/CD pipelines are waiting 20 minutes in queue during peak hours, burning developer time and sprint velocity. Meanwhile, your three dedicated GitLab Runners sit at 15% utilization overnight. You’ve got either feast or famine—developers staring at “Pending” status during morning deployments, or idle hardware bleeding infrastructure costs during off-hours.

This isn’t a resource problem. It’s an architecture problem.

Most teams start with the obvious fix: spin up more runners. But static capacity just shifts the problem—you’re either over-provisioned for typical load or under-provisioned for peaks. The real solution is treating your runner infrastructure like any other production service: elastic, monitored, and cost-optimized.

Building an auto-scaled runner fleet isn’t just about deploying GitLab’s autoscaling configuration. It’s about understanding the fundamental trade-offs in executor models, designing for burst capacity without cascading failures, and instrumenting your fleet to catch capacity issues before they hit developer workflows. The difference between a runner fleet that scales and one that falls over under load often comes down to decisions made in the foundation—before you ever touch autoscaling parameters.

The journey from a single runner to a production-grade fleet happens in stages, each with its own inflection point where the previous architecture breaks down. Let’s start with the foundation that determines everything else: choosing the right executor model for scale.

Understanding Runner Executors: Choosing Your Foundation

Before scaling your GitLab Runner infrastructure, you need to select the right executor type. This decision shapes your fleet’s performance characteristics, operational complexity, and cost trajectory. Choose wrong, and you’ll hit scaling bottlenecks long before reaching your target workload.

Visual: Comparison of executor performance characteristics and scaling limits

Docker Executor: The Sweet Spot for Most Fleets

The Docker executor strikes the best balance between isolation, performance, and operational overhead for medium-to-large runner deployments. Each job runs in a fresh container with defined resource limits, providing consistent build environments without the heavyweight orchestration of Kubernetes.

Performance characteristics matter at scale. A Docker executor on a dedicated c5.2xlarge instance handles 8-12 concurrent jobs with sub-30-second container startup times. Network throughput remains predictable because containers share the host’s network stack directly—critical when your CI/CD pulls gigabytes of dependencies daily.

The primary scaling bottleneck isn’t CPU or memory. It’s the Docker daemon itself. Beyond 15-20 concurrent containers on a single host, you’ll see daemon lock contention degrading job start times. This practical ceiling drives the architecture decision: horizontal scaling with multiple runner instances beats vertical scaling every time.
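As a rough capacity-planning sketch, you can turn that per-host ceiling into a host count. The ~15-job cap is the figure above; the helper function and numbers are illustrative, not part of any GitLab tooling:

```python
import math

def hosts_needed(peak_concurrent_jobs: int, per_host_cap: int = 15) -> int:
    """Runner hosts required to stay under the Docker daemon's
    practical concurrency ceiling (~15-20 jobs per host)."""
    return math.ceil(peak_concurrent_jobs / per_host_cap)

# 40 concurrent jobs at peak, capped at 15 per host
print(hosts_needed(40))
```

With a 40-job peak, three hosts keep every daemon comfortably below the contention threshold.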

Docker-in-Docker: The Performance Tax

Running Docker-in-Docker (DinD) for jobs that build container images introduces measurable performance degradation. The nested virtualization overhead adds 15-25% to build times in production workloads, but the real problem emerges under concurrent load.

DinD creates network bottlenecks because traffic flows through multiple network namespaces—the outer container, the inner Docker daemon, and finally your build containers. At 10+ concurrent DinD jobs, you’ll observe network throughput dropping to 40-60% of baseline, even on hosts with multi-gigabit interfaces. Log streaming latency spikes. Registry pulls time out. Jobs fail with inexplicable network errors.

The solution: use the Docker executor with volume-mounted Docker sockets for image builds, accepting the reduced isolation in exchange for native network performance. For true multi-tenancy requirements, move to Kubernetes with dedicated node pools.

Shell Executor: Maximum Performance, Minimum Isolation

The shell executor runs jobs directly on the host with zero containerization overhead. Build times drop 30-40% compared to Docker for compute-heavy workloads like compilation or test suites. A single m5.4xlarge shell executor processes workloads that would require three equivalent Docker executor hosts.

The tradeoff: jobs share the host environment completely. Dependency conflicts between projects require manual management. Malicious or buggy jobs can affect concurrent builds or compromise the runner itself. Shell executors work for trusted, homogeneous workloads in isolated networks—not for general-purpose CI/CD platforms.

Kubernetes Executor: When You’ve Outgrown Docker Machine

Kubernetes executors defer job scheduling to the cluster’s control plane, eliminating the per-runner concurrency ceiling. Your fleet scales to hundreds of concurrent jobs across node pools with heterogeneous instance types. This architecture becomes cost-effective above 50 concurrent jobs when runner management overhead exceeds cluster operational costs.

The next section walks through configuring your first production runner with Docker Machine—the fastest path from single-server setup to basic auto-scaling.

Configuring Your First Production Runner with Docker+Machine

Once you’ve chosen Docker+Machine as your executor, the configuration becomes critical. A poorly configured runner creates bottlenecks, while an over-provisioned one wastes resources. This section walks through a battle-tested configuration that handles real production workloads.

Registration and Authentication

GitLab Runners authenticate using registration tokens (legacy) or runner authentication tokens (recommended for GitLab 16.0+). For production environments, create a project-specific or group-level runner rather than an instance-wide runner; scoped runners give you granular access control.

The authentication token serves as the runner’s long-lived credential after initial registration. Unlike the registration token, which you use once during setup, the authentication token persists in your config.toml and authenticates every job request. Guard this token carefully—anyone with access can execute jobs on your infrastructure and potentially access secrets, cloud credentials, or production deployment capabilities.

Register your runner with:

register-runner.sh
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com" \
  --token "glrt-x8K2mP9vNqL4sT6wY" \
  --executor "docker+machine" \
  --docker-image "alpine:latest" \
  --description "production-runner-01" \
  --tag-list "docker,production,x86_64" \
  --run-untagged="false" \
  --locked="false"

Setting --run-untagged="false" prevents this runner from picking up arbitrary jobs—critical for security and resource management. The --locked="false" flag allows the runner to execute jobs from different projects within the same group, useful for shared infrastructure. For maximum security, use --locked="true" and create dedicated runners per project.

The registration process writes credentials and configuration to /etc/gitlab-runner/config.toml. Back up this file immediately—losing it means re-registering and updating CI configurations that reference runner-specific tags. Store backups in your secrets management system, not in version control.

Concurrent Job Limits and Resource Allocation

The concurrent setting defines how many jobs run simultaneously across your entire fleet. This is not per-machine—it’s the global limit for the runner manager process, covering every [[runners]] entry in its config.toml. Set it based on your expected job throughput and machine provisioning capacity.

A common mistake is setting concurrent too low, causing jobs to queue even when machines sit idle. Calculate your target concurrency by multiplying your peak job arrival rate by average job duration. If you receive 20 jobs per minute and each takes 3 minutes, you need concurrent = 60 to avoid queuing. Add 20% headroom for traffic spikes.
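That calculation is just Little’s Law: in-flight jobs equal arrival rate times duration. A minimal sketch using the article’s numbers (the function name is ours, not a GitLab API):

```python
import math

def target_concurrency(jobs_per_minute: float, avg_minutes: float,
                       headroom: float = 0.20) -> int:
    """Little's Law: in-flight jobs = arrival rate x duration,
    plus headroom for traffic spikes."""
    return math.ceil(jobs_per_minute * avg_minutes * (1 + headroom))

# 20 jobs/min, 3 min each, 20% headroom
print(target_concurrency(20, 3))
```

With the article’s inputs this yields 72, matching the 60-job steady state plus 20% headroom.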

config.toml
concurrent = 10

[[runners]]
  name = "production-runner-01"
  url = "https://gitlab.example.com"
  token = "glrt-x8K2mP9vNqL4sT6wY"
  executor = "docker+machine"
  [runners.docker]
    image = "alpine:latest"
    privileged = false
    disable_cache = false
    volumes = ["/cache", "/builds:/builds:rw"]
    shm_size = 2147483648
  [runners.machine]
    IdleCount = 2
    IdleTime = 600
    MaxBuilds = 20
    MachineName = "gitlab-runner-%s"
    MachineDriver = "amazonec2"
    MachineOptions = [
      "amazonec2-region=us-east-1",
      "amazonec2-instance-type=t3.medium",
      "amazonec2-vpc-id=vpc-0a1b2c3d4e5f6g7h8",
      "amazonec2-subnet-id=subnet-9i8h7g6f5e4d3c2b1",
      "amazonec2-zone=a"
    ]

The IdleCount = 2 maintains two warm machines ready to accept jobs immediately, eliminating the 60-90 second machine provisioning delay. IdleTime = 600 keeps idle machines alive for 10 minutes after their last job—tune this based on your job frequency patterns. MaxBuilds = 20 destroys and recreates machines after 20 jobs, preventing disk accumulation and ensuring fresh state.

The shm_size parameter allocates 2GB of shared memory for /dev/shm, preventing out-of-memory crashes in containerized browsers and Node.js build tools that rely on shared memory. Chrome and Puppeteer fail with cryptic errors when shared memory is insufficient—this setting prevents those failures.

Cache volumes significantly impact build performance. The /cache volume persists between jobs on the same machine, storing package managers’ download directories. Map your package manager paths here: /cache/npm for npm, /cache/pip for Python, /cache/go for Go modules. Combine this with GitLab’s distributed cache using cache.key directives in your .gitlab-ci.yml to share artifacts across the entire runner fleet, not just per-machine.

The /builds:/builds:rw volume gives containers read-write access to the build directory. Some CI tools expect to modify files outside their working directory—this volume prevents permission errors while maintaining isolation between jobs.

Implementing Tag-Based Job Routing

Tags route jobs to appropriate runners. Design a tag taxonomy that reflects your infrastructure capabilities, not team structure:

.gitlab-ci.yml
build:docker:
  stage: build
  tags:
    - docker
    - production
    - x86_64
  script:
    - docker build -t myapp:${CI_COMMIT_SHA} .

build:arm:
  stage: build
  tags:
    - docker
    - production
    - arm64
  script:
    - docker build -t myapp:${CI_COMMIT_SHA}-arm .

Avoid creating tags like team-backend or project-checkout—these couple your CI configuration to organizational structure. Instead, use capability-based tags: docker, kubernetes, gpu, high-memory. This approach lets you reorganize teams without breaking pipelines.

Layer your tags from general to specific. The hierarchy docker > production > x86_64 allows you to route jobs broadly (any docker runner) or narrowly (specifically x86_64 production runners). Jobs match when all specified tags exist on the runner—a job tagged docker,gpu only runs on runners with both tags, never on runners with just docker.
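The matching rule is plain subset logic, which a short sketch makes concrete (a hypothetical helper, not GitLab code):

```python
def runner_matches(job_tags: set[str], runner_tags: set[str]) -> bool:
    """A runner picks up a job only when it carries every tag the
    job asks for (the job's tags are a subset of the runner's)."""
    return job_tags <= runner_tags

# Broad routing: any docker-capable runner qualifies
assert runner_matches({"docker"}, {"docker", "production", "x86_64"})
# Narrow routing: a runner missing "gpu" never sees this job
assert not runner_matches({"docker", "gpu"}, {"docker"})
```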

Consider creating environment-specific runners with tags like staging and production. This prevents accidental deployments when a developer’s feature branch triggers a production job. Combined with --run-untagged="false", this creates strong isolation between environments.

With this foundation in place, you’re ready to add auto-scaling. The next section implements Docker Machine’s auto-scaling on AWS, where machines spin up on demand and terminate when idle.

Implementing Auto-Scaling with Docker Machine on AWS

The Docker+Machine executor transforms a static runner into a dynamic fleet that provisions EC2 instances on-demand. When configured correctly, this setup reduces idle costs to near-zero while maintaining capacity for burst workloads.

Core Auto-Scaling Parameters

Three parameters control your fleet’s economics: IdleCount, IdleTime, and MaxBuilds. These determine when machines spin up, how long they wait, and when they terminate.

IdleCount sets the number of machines kept warm and ready. Setting this to 1 ensures instant job pickup for the first concurrent build, while additional jobs trigger new instances. Setting it to 0 maximizes cost savings but introduces a 60-90 second cold start for the first job.

IdleTime defines how long an idle machine waits before termination. The default 600 seconds (10 minutes) works for teams with frequent commits. Lower this to 300 seconds if your pipeline runs are sporadic, or increase to 900 seconds if builds cluster together during business hours.

MaxBuilds limits how many jobs a single machine processes before forced retirement. This prevents state accumulation and credential leakage between jobs. Set this to 20-30 for standard pipelines, or as low as 1 for security-sensitive workloads where complete isolation matters.

config.toml
concurrent = 50

[[runners]]
  name = "aws-autoscale-runner"
  url = "https://gitlab.example.com/"
  token = "glrt-abc123xyz789def456"
  executor = "docker+machine"
  [runners.docker]
    image = "alpine:latest"
    privileged = true
    volumes = ["/cache"]
  [runners.machine]
    IdleCount = 1
    IdleTime = 300
    MaxBuilds = 25
    MachineDriver = "amazonec2"
    MachineName = "gitlab-runner-%s"
    MachineOptions = [
      "amazonec2-region=us-east-1",
      "amazonec2-instance-type=t3.medium",
      "amazonec2-vpc-id=vpc-0a1b2c3d4e5f6g7h8",
      "amazonec2-subnet-id=subnet-9i8j7k6l5m4n3o2p1",
      "amazonec2-security-group=sg-runner-fleet",
      "amazonec2-use-private-address=true",
      "amazonec2-tags=Environment,production,Team,platform"
    ]

Spot Instances: 70% Cost Reduction

Spot instances deliver the same performance as on-demand instances at 60-80% lower prices. The tradeoff is interruption risk—AWS can reclaim spot instances with two minutes notice when capacity tightens.

Enable spot instances by adding the amazonec2-request-spot-instance flag and setting a maximum price. Leaving amazonec2-spot-price empty uses the current spot market rate, which typically runs 30-40% of on-demand pricing.

config.toml
[runners.machine]
  MachineOptions = [
    "amazonec2-region=us-east-1",
    "amazonec2-instance-type=t3.medium",
    "amazonec2-request-spot-instance=true",
    "amazonec2-spot-price=",
    "amazonec2-block-duration-minutes=60"
  ]

The amazonec2-block-duration-minutes parameter prevents interruptions for the specified duration (60, 120, 180, 240, 300, or 360 minutes). This costs 30-50% more than standard spot but eliminates interruption risk for that window. For pipelines under one hour, setting this to 60 minutes provides predictable execution at still-significant savings.

💡 Pro Tip: Mix spot and on-demand instances using runner tags. Configure one runner with tags = ["spot", "standard"] using spot instances, and another with tags = ["on-demand", "critical"] using traditional instances. Tag critical deployment jobs with on-demand to guarantee completion.

Handling Spot Interruptions

Spot interruptions mid-build fail the job, but GitLab’s automatic retry mechanism handles most cases transparently. Configure retry counts in your .gitlab-ci.yml to accommodate interruption frequency:

deploy:production:
  script:
    - ./deploy.sh
  tags:
    - spot
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

For jobs exceeding one hour where block duration isn’t economical, split them into smaller stages with artifacts passed between stages. A spot interruption then only retries the failed stage, not the entire workflow.

Monitor spot interruption rates through CloudWatch metrics. If interruptions exceed 5% of builds in a specific availability zone, add multiple subnets across zones to the MachineOptions array, or switch that instance type to on-demand during high-demand periods.

With auto-scaling configured and spot instances running, a 50-runner fleet that previously cost $3,600/month in dedicated EC2 instances now costs $400-600/month, spinning up only when needed. The next challenge is managing that fleet when Docker Machine’s limitations appear at scale.

Kubernetes Runner Fleet: When Docker Machine Isn’t Enough

Docker Machine served GitLab well for years, but it’s deprecated. More importantly, if you’re already running workloads on Kubernetes, spinning up separate VMs for CI jobs creates unnecessary infrastructure fragmentation. The Kubernetes executor lets your GitLab Runners schedule jobs as pods directly on your existing cluster, bringing native orchestration, better resource utilization, and simpler infrastructure management.

Migration Strategy: Parallel Operation First

Don’t flip a switch and migrate everything at once. Register a new runner with the Kubernetes executor alongside your existing Docker Machine runners, then gradually shift jobs using GitLab’s tag-based routing:

config.toml
concurrent = 10

[[runners]]
  name = "k8s-runner-production"
  url = "https://gitlab.company.com"
  token = "glrt-AbCdEf1234567890"
  executor = "kubernetes"
  [runners.kubernetes]
    host = "https://kubernetes.default.svc"
    namespace = "gitlab-runner"
    privileged = false
    cpu_request = "500m"
    memory_request = "512Mi"
    service_cpu_request = "200m"
    service_memory_request = "256Mi"
    helper_cpu_request = "100m"
    helper_memory_request = "128Mi"
    poll_timeout = 180
    pod_labels_overwrite_allowed = ".*"

Tag this runner with kubernetes in GitLab’s runner settings, then update select projects to use tags: [kubernetes] in their .gitlab-ci.yml. Monitor job success rates before expanding.

Start with non-critical projects—internal tools, documentation sites, or development branches. This gives you time to identify configuration issues, tune resource allocations, and build confidence in the new infrastructure before migrating production builds. Track failure rates and job duration metrics in GitLab’s CI/CD analytics. If Kubernetes jobs show higher failure rates or significantly longer runtimes, investigate pod startup times, image pull performance, and network policies that might be blocking required services.

Plan for a migration window of 4-6 weeks. Week one covers initial setup and testing with low-risk projects. Weeks two through four involve gradual expansion to more critical workloads, adjusting resource configurations based on observed usage patterns. The final weeks handle edge cases—jobs with unusual requirements, legacy pipeline configurations, and projects that need custom pod templates. Keep your Docker Machine runners online during this entire period as a fallback option.

Pod Templates: Right-Sizing for Job Types

The default pod configuration works for basic builds, but production workloads need customization. Use pod templates to match resources to job requirements—don’t give every job 4 CPU cores when most need 500 millicores.

.gitlab-ci.yml
variables:
  KUBERNETES_CPU_REQUEST: "500m"
  KUBERNETES_CPU_LIMIT: "2"
  KUBERNETES_MEMORY_REQUEST: "1Gi"
  KUBERNETES_MEMORY_LIMIT: "4Gi"

build:
  stage: build
  tags: [kubernetes]
  script:
    - npm ci
    - npm run build

e2e-tests:
  stage: test
  tags: [kubernetes]
  variables:
    KUBERNETES_CPU_REQUEST: "2"
    KUBERNETES_CPU_LIMIT: "4"
    KUBERNETES_MEMORY_REQUEST: "4Gi"
    KUBERNETES_MEMORY_LIMIT: "8Gi"
  script:
    - npm run test:e2e

Different job types have radically different resource profiles. Static site builds might complete comfortably with 500m CPU and 512Mi memory. Frontend builds with webpack or Vite need 1-2 CPU cores and 2-4Gi memory to avoid thrashing. End-to-end test suites running browsers need 2-4 cores and 4-8Gi. Database integration tests benefit from fast CPU but modest memory. Profile your actual jobs before setting these values—use kubectl top pods to observe real resource consumption patterns, not guesses about what jobs might need.

For jobs requiring Docker-in-Docker (building container images), enable privileged mode selectively:

config.toml
[runners.kubernetes]
  namespace = "gitlab-runner-privileged"
  privileged = true
  [[runners.kubernetes.volumes.empty_dir]]
    name = "docker-storage"
    mount_path = "/var/lib/docker"
    medium = "Memory"

💡 Pro Tip: Use memory-backed emptyDir volumes for Docker storage in privileged pods. This speeds up image builds significantly and automatically cleans up after jobs complete.

Consider alternative approaches to privileged Docker-in-Docker for image builds. Kaniko runs unprivileged and builds container images from Dockerfiles without requiring a Docker daemon. Buildah offers similar capabilities with better caching behavior. Both eliminate security concerns around privileged containers and integrate cleanly with Kubernetes RBAC policies. Reserve privileged mode for jobs that genuinely need it—integration tests that start database containers, or specialized builds that require kernel-level access.

Resource Requests vs Limits: The Node Saturation Problem

Setting resource limits prevents runaway jobs from consuming entire nodes, but the relationship between requests and limits determines scheduling behavior. Kubernetes schedules pods based on requests but enforces limits at runtime.

Set requests to typical usage and limits to maximum acceptable usage:

config.toml
[runners.kubernetes]
  cpu_request = "1"
  cpu_limit = "2"
  memory_request = "2Gi"
  memory_limit = "4Gi"

If you set requests equal to limits, Kubernetes guarantees those resources but reduces scheduling density—your nodes sit underutilized. If requests are too low compared to limits, you overcommit nodes, and jobs start getting OOMKilled when simultaneous jobs hit their limits.

Monitor actual resource consumption with kubectl top pods -n gitlab-runner for two weeks, then set requests to the 75th percentile and limits to the 95th percentile of observed usage. This balancing act maximizes node utilization while preventing resource contention. Your cluster’s bin-packing efficiency depends on accurate requests—Kubernetes can’t schedule pods efficiently if every job requests 4Gi but actually uses 800Mi.

Memory limits deserve special attention because memory is non-compressible. If a pod exceeds its memory limit, Kubernetes immediately OOMKills it—there’s no throttling or graceful degradation. CPU limits get throttled, slowing jobs but not killing them. Set memory requests conservatively based on observed usage, but add headroom in limits for occasional spikes. A job that typically uses 1.5Gi but occasionally hits 2.2Gi during dependency installation needs a 2Gi request and 3Gi limit, not a 1.5Gi request that causes intermittent OOMKills.

Watch for node pressure events in your cluster monitoring. When nodes run low on allocatable resources, Kubernetes starts evicting pods to reclaim capacity. CI job pods are prime eviction candidates because they typically don’t have disruption budgets. If you see frequent pod evictions during peak CI hours, either your requests are set too low (causing overcommitment), or you need to scale your node pool to handle concurrent job volume.

With Kubernetes executors handling orchestration, your runner fleet scales naturally with your cluster’s autoscaling configuration. But scaling introduces new monitoring challenges—how do you debug job failures across hundreds of ephemeral pods? The next section covers production monitoring strategies that surface problems before they impact developer velocity.

Fleet Management: Monitoring and Debugging at Scale

A runner fleet that looks healthy isn’t always healthy. Jobs queue silently, instances fail to deregister, and memory leaks accumulate until developers start asking why their pipelines take 45 minutes to start. Proactive monitoring catches these issues before they impact delivery velocity.

Essential Metrics for Runner Health

GitLab Runner exposes Prometheus metrics on port 9252 by default. The metrics that matter most reveal both capacity problems and operational failures.

Monitor gitlab_runner_jobs with status labels to track queue depth. When queued exceeds your idle capacity for more than 5 minutes, you’re underprovisioned. Track gitlab_runner_concurrent against gitlab_runner_limit to see how close you are to hitting configured maximums.

prometheus-alerts.yml
groups:
  - name: gitlab_runner
    interval: 30s
    rules:
      - alert: RunnerQueueDepth
        expr: gitlab_runner_jobs{state="queued"} > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High job queue depth on {{ $labels.runner }}"
      - alert: RunnerJobFailureRate
        expr: |
          rate(gitlab_runner_jobs{state="failed"}[5m])
            / rate(gitlab_runner_jobs[5m]) > 0.2
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Runner {{ $labels.runner }} has >20% failure rate"
      - alert: RunnerStaleInstances
        expr: gitlab_runner_zombie_jobs > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} zombie jobs detected on {{ $labels.runner }}"

The gitlab_runner_errors_total counter broken down by type label exposes systemic issues. A spike in executor_provider errors indicates infrastructure problems—AWS API throttling, spot instance interruptions, or Kubernetes node pressure. Rising prepare_script_failure counts suggest image registry issues or network problems.

Debugging Stuck Jobs and Zombie Runners

Jobs that show “running” in GitLab but consume no resources are the most common fleet pathology. This happens when the executor loses connection to the coordinator but the job state doesn’t update.

Check runner logs for context canceled or connection refused errors. These indicate network partitions or coordinator restarts. The runner retries registration every 3 seconds by default, but jobs already assigned get orphaned.

Terminal window
# Find orphaned job containers on a runner host
docker ps -a | grep -E "runner-[a-f0-9]+-project-[0-9]+-concurrent-[0-9]+"

# Deregister runners that can no longer reach the coordinator
gitlab-runner verify --delete

For Docker Machine executors, instances that fail to terminate leave zombie VMs consuming costs without processing jobs. Set MachineOptions.engine-install-url to a reliable mirror—the default Docker install script URL has caused thousands of stuck machines when it experiences downtime.

💡 Pro Tip: Set IdleTime to 1200 (20 minutes) rather than the default 600. Shorter idle times cause more frequent machine cycling, which increases the likelihood of hitting cloud provider API rate limits during scale-down storms.

Runner Registration Health

A runner that appears online in GitLab but never receives jobs has registration token or tag mismatches. Use gitlab-runner list on the host to verify registration against the coordinator. The output shows tokens, descriptions, and executor types—compare against what GitLab reports in the admin panel.

Tag mismatches are silent killers. A job with tags: [docker, production] won’t run on a runner registered with only [docker]. Note that tags live on the GitLab side, not in config.toml—after editing tags in the runner settings, confirm in the admin panel that jobs are actually being picked up before assuming the routing is fixed.

With monitoring in place and debugging workflows established, the next question becomes: what is this infrastructure actually costing, and where can you optimize?

Cost Optimization: Real Numbers from Production Fleets

Understanding the cost implications of different runner architectures separates theoretical scaling from production viability. Here’s what three years of fleet management across multiple organizations reveals about the actual economics.

Visual: Cost comparison between dedicated, auto-scaled, and Kubernetes runner deployments

Dedicated vs Auto-Scaled: The Math

A dedicated c5.xlarge instance running 24/7 costs approximately $1,460/year on AWS. This runner sits idle during off-hours, delivering roughly 30% utilization in typical development workflows. The same workload on an auto-scaled fleet with IdleTime=600 (10 minutes) drops to $520/year—a 64% reduction—while maintaining sub-minute job start times during business hours.

The break-even point sits around 40% sustained utilization. If your CI/CD runs continuously—perhaps supporting multiple time zones or automated testing—dedicated instances become cost-competitive. Below that threshold, auto-scaling wins decisively.
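One way to see where a figure like 40% can come from: a dedicated runner is only cheaper than auto-scaling if it runs at a discounted committed rate. The rates below are assumptions for illustration—the on-demand price echoes the article’s c5.xlarge figure, and the reserved rate is a typical deep-commitment discount, not a quoted price:

```python
ON_DEMAND_HOURLY = 0.17   # c5.xlarge on-demand (article's figure)
RESERVED_HOURLY = 0.068   # assumed committed/reserved rate, ~60% off
HOURS_PER_YEAR = 8760

def yearly_cost_dedicated() -> float:
    # Runs 24/7 at the discounted committed rate
    return RESERVED_HOURLY * HOURS_PER_YEAR

def yearly_cost_autoscaled(utilization: float) -> float:
    # Pays on-demand rates, but only for the hours jobs actually run
    return ON_DEMAND_HOURLY * HOURS_PER_YEAR * utilization

# Dedicated wins once busy-hours exceed this fraction of the year
break_even_utilization = RESERVED_HOURLY / ON_DEMAND_HOURLY
```

Under these assumptions the break-even lands at 40% utilization: below it the auto-scaled fleet is cheaper, above it the committed instance wins.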

Kubernetes runners introduce different economics. A three-node cluster with t3.medium instances ($876/year baseline) supports 10-15 concurrent jobs through pod density, making it cost-effective beyond 20-30 jobs daily. However, cluster overhead (control plane, monitoring, networking) adds $200-300/year that single-runner deployments avoid.

Tuning IdleTime for Your Workflow

IdleTime controls how long runners persist after completing jobs. The default 600 seconds balances responsiveness with waste, but production demands tuning.

Setting IdleTime=300 reduces costs by 15-20% in bursty workloads where commits cluster around specific hours. Developers tolerate 30-second delays for cold starts when jobs arrive sporadically. Conversely, IdleTime=1200 makes sense for teams running continuous integration where the next job typically arrives within minutes—the extra $50/year prevents the frustration of waiting for instance provisioning.

Monitor your job arrival patterns. If 80% of jobs arrive within five minutes of the previous job, increase IdleTime to match. If gaps exceed 20 minutes regularly, reduce it.
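A sketch of that analysis, given a list of inter-arrival gaps pulled from your pipeline history (illustrative data and a hypothetical helper name):

```python
def fraction_within(gaps_seconds: list[float], idle_time: int) -> float:
    """Share of jobs arriving while a machine is still warm, i.e.
    whose gap since the previous job is at most IdleTime."""
    hits = sum(1 for g in gaps_seconds if g <= idle_time)
    return hits / len(gaps_seconds)

# gaps between consecutive jobs, in seconds (illustrative sample)
gaps = [45, 120, 300, 900, 60, 30, 1500, 200, 90, 600]
warm_hit_rate = fraction_within(gaps, idle_time=600)
```

Here 80% of jobs land inside a 600-second idle window, which argues for keeping or raising IdleTime rather than cutting it.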

Cache Strategy Economics

Distributed caching transforms both job duration and infrastructure costs. A well-configured S3 cache bucket reduces average job time by 40-60% in dependency-heavy projects, translating directly to compute savings.

The tradeoff: internet and cross-region egress costs $0.09/GB on AWS. A 500MB cache pulled 100 times daily runs about $4.50 per day ($135/month) in transfer fees if it crosses a region boundary, while saving 15-20 minutes of compute per job. At $0.17/hour for c5.xlarge instances, that saving is worth roughly $150/month, barely ahead of the transfer bill. Keep the cache bucket in the same region as your runners, where S3-to-EC2 transfer is free, and the compute savings become pure gain.
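Plugging those inputs into a quick model shows how thin the margin is when every pull pays egress. All constants are the article’s figures or midpoints of its ranges:

```python
CACHE_GB = 0.5              # 500MB cache
PULLS_PER_DAY = 100
EGRESS_PER_GB = 0.09        # cross-region / internet egress
MINUTES_SAVED_PER_JOB = 17.5  # midpoint of 15-20 minutes
INSTANCE_HOURLY = 0.17      # c5.xlarge

# Daily transfer bill if every pull crosses a region boundary
egress_per_day = CACHE_GB * PULLS_PER_DAY * EGRESS_PER_GB

# Daily compute saved across all jobs that hit the cache
compute_saved_per_day = (
    PULLS_PER_DAY * (MINUTES_SAVED_PER_JOB / 60) * INSTANCE_HOURLY
)
```

Cross-region, the two figures nearly cancel ($4.50 vs roughly $5 per day); same-region, the egress term drops to zero and the full compute saving remains.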

Cache invalidation strategy matters. Aggressive caching with weekly expiration cuts costs but risks stale dependencies. Daily invalidation increases network costs by 30% while ensuring reproducibility.

With cost optimization established, the next critical consideration is securing your runner fleet against the unique threats of multi-tenant CI/CD environments.

Security Hardening for Multi-Tenant Runner Fleets

Running GitLab Runners across multiple teams or projects introduces attack vectors that don’t exist in single-tenant setups. A compromised job can leak secrets, access other projects’ data, or pivot into your infrastructure. Here’s how to lock down your fleet.

Isolating Runners by Sensitivity Level

Not all workloads deserve the same trust boundary. Segregate runners into tiers based on what they can access:

Public tier: Runs untrusted code from external contributors or open-source projects. Deploy these with zero access to internal networks, no Docker socket mounting, and ephemeral instances that terminate after each job. Tag these runners with public-untrusted.

Internal tier: Handles standard development workloads. These runners access internal package repositories and staging environments but never production secrets. Use network policies to restrict egress to specific CIDR ranges. Tag with internal-dev.

Production tier: Executes deployment pipelines with access to production credentials. Limit these runners to specific projects using GitLab’s protected runners feature (available in Premium/Ultimate). Enable audit logging for all job executions and store logs in immutable storage.

Configure runner registration to enforce this separation:

[[runners]]
  name = "production-runner"
  limit = 10
  run_untagged = false
  locked = true

The locked = true setting prevents other projects from attaching to the runner simply by matching its tags; access must be granted explicitly per project.

Preventing Cache Poisoning

GitLab’s distributed cache is a common vulnerability. By default, caches are scoped by key alone, meaning Project A can inject malicious dependencies into Project B’s cache if they use the same key.

Always scope caches to the project level in your runner configuration:

[runners.cache]
  Type = "s3"
  Shared = false

Setting Shared = false isolates each project’s cache to its own S3 prefix. For multi-tenant Kubernetes runners, use separate cache buckets per sensitivity tier.

Audit your .gitlab-ci.yml files for cache keys that include user-controlled variables like $CI_COMMIT_REF_NAME. An attacker can create a branch named ../../other-project to traverse cache paths.
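If you generate cache keys in tooling of your own, a defensive check like this (a hypothetical helper, not part of GitLab) rejects traversal attempts before they reach a cache path:

```python
import re

# Allow only characters that cannot form path separators
SAFE_KEY = re.compile(r"^[A-Za-z0-9_.-]+$")

def safe_cache_key(*parts: str) -> str:
    """Join key components, rejecting anything containing path
    separators or traversal sequences like '..'."""
    for p in parts:
        if p in {".", ".."} or not SAFE_KEY.fullmatch(p):
            raise ValueError(f"unsafe cache key component: {p!r}")
    return "-".join(parts)

assert safe_cache_key("myproject", "main") == "myproject-main"
# a branch named "../../other-project" fails validation (contains '/')
```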

Network Segmentation

Restrict runner network access to the minimum required services. In AWS, place runners in dedicated VPCs with security groups that deny all inbound traffic and whitelist only:

  • GitLab instance HTTPS endpoint
  • Package registry domains (npm, PyPI, Docker Hub)
  • Cloud provider APIs (if using auto-scaling)

Deny access to AWS metadata endpoints (169.254.169.254) to prevent credential theft. For Kubernetes runners, implement NetworkPolicies that block pod-to-pod communication and restrict egress to known-good IP ranges.

With these controls in place, your runner fleet can safely handle workloads of varying trust levels without creating lateral movement opportunities for attackers. Next, we’ll examine the operational playbooks that keep this infrastructure running smoothly.

Key Takeaways

  • Start with Docker executor and Docker Machine auto-scaling on spot instances to reduce costs by 60-70% compared to dedicated runners
  • Monitor queue depth and job wait times to tune IdleCount and IdleTime parameters for your team’s usage patterns
  • Migrate to Kubernetes executor when you need pod-level resource control or exceed Docker Machine’s ~100 concurrent job limit
  • Implement runner tagging strategy from day one to route jobs based on resource needs and security requirements