
Building Production-Ready ECR Registries with Terraform


You’ve pushed your first Docker image to ECR manually through the AWS console, but now you need to replicate this setup across dev, staging, and prod—with proper access controls, image scanning, and lifecycle policies. Clicking through the console 30 times isn’t the answer.

Manual ECR configuration creates three immediate problems. First, you can’t audit changes. When someone modifies a lifecycle policy or grants cross-account access, there’s no paper trail beyond CloudTrail logs that require forensic analysis to understand. Second, environments drift. Your staging registry has different scanning settings than production because someone made a quick fix last month and forgot to document it. Third, replication is painful. Standing up a new environment means either screenshots of console settings or a hastily written runbook that’s already out of date.

These problems compound as your infrastructure grows. A single misconfigured registry exposes vulnerable images to production. Inconsistent lifecycle policies fill your storage with stale images that cost hundreds of dollars per month. Missing cross-account permissions break CI/CD pipelines at 2 AM. You need these configurations defined in code, version-controlled, and automatically applied.

Terraform solves this by treating your ECR registries as declarative infrastructure. Every setting—from scan-on-push to repository policies—exists in HCL files that you review, version, and deploy atomically. Changes go through pull requests. Configurations stay consistent across environments. And when you need to audit who changed what, you check git history instead of parsing JSON logs.

The foundation of production-ready ECR infrastructure starts with understanding how Terraform models registry resources and why declarative configuration prevents the drift that manual changes guarantee.

Why Terraform for ECR Management

Managing Amazon Elastic Container Registry (ECR) repositories through the AWS Console works fine for a proof of concept or a single development environment. But when you’re responsible for deploying containerized applications across dev, staging, and production—potentially spanning multiple AWS accounts and regions—manual setup becomes a reliability bottleneck.

Visual: Infrastructure as Code workflow for ECR management

The Manual Configuration Problem

Every time you create an ECR repository through the Console, you’re making a series of decisions: encryption settings, scan-on-push configuration, image tag mutability, lifecycle policies, and cross-account access permissions. Repeat this process across three environments and you’ve already introduced opportunities for configuration drift. Maybe production has scan-on-push enabled, but staging doesn’t. Perhaps one environment allows mutable tags while another enforces immutability. These inconsistencies aren’t just aesthetic issues—they create security gaps and make troubleshooting significantly harder when behavior differs between environments.

The real pain emerges during incident response. When a vulnerability scanner flags an issue at 2 AM, you need to know exactly how your registries are configured. With manual setups, that knowledge lives in tribal memory or scattered documentation that’s inevitably out of date. There’s no audit trail showing when encryption was enabled, who modified the lifecycle policy, or why certain IAM principals have pull access.

Infrastructure as Code: The Terraform Advantage

Terraform transforms ECR management from a series of point-and-click operations into version-controlled, peer-reviewed code. Your repository configuration becomes a Terraform file that declares exactly what resources exist and how they’re configured. This single source of truth eliminates drift because you can verify actual infrastructure state against your intended configuration at any time.

The auditability benefit extends beyond simple configuration tracking. Every change to your ECR setup flows through your standard code review process. When someone proposes enabling cross-region replication or modifying retention policies, the pull request captures the context, the reviewers provide oversight, and Git preserves the complete history. This creates a paper trail that satisfies compliance requirements while making it trivial to understand why specific decisions were made.

Reproducibility is where Terraform truly shines for ECR management. Creating an identical registry in a new region or AWS account becomes a matter of changing variable values and running terraform apply. The same configuration that governs your production registry in us-east-1 can instantiate a disaster recovery copy in eu-west-1, complete with matching encryption, scanning policies, and lifecycle rules. This consistency isn’t just convenient—it’s essential for maintaining security posture across your entire container infrastructure.
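Beyond re-applying the same configuration in another region, ECR can also replicate pushed images natively. As a hedged sketch (the destination region and the account lookup are illustrative assumptions, not part of the configurations shown later):

```hcl
# Registry-level resource: one per account/region; its rules apply to
# every repository in the registry, not to a single repository.
data "aws_caller_identity" "current" {}

resource "aws_ecr_replication_configuration" "this" {
  replication_configuration {
    rule {
      destination {
        # Copy every pushed image to the same account in eu-west-1.
        region      = "eu-west-1"
        registry_id = data.aws_caller_identity.current.account_id
      }
    }
  }
}
```

Replication copies image data only; lifecycle policies and repository policies in the destination region still need their own Terraform definitions.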

With this foundation established, let’s examine how to implement a basic ECR repository using Terraform.

Basic ECR Repository with Terraform

Let’s create a production-ready ECR repository with Terraform. This section focuses on the foundational configuration that serves as the building block for more advanced patterns.

Provider and Backend Configuration

Start by configuring the AWS provider and remote state backend. Store your Terraform state in S3 with DynamoDB locking to enable team collaboration and prevent concurrent modifications.

provider.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "ecr/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = "production"
      Project     = "container-infrastructure"
    }
  }
}

The default_tags block automatically applies tags to all resources, ensuring consistent labeling across your infrastructure without repeating tags in every resource definition. This becomes particularly valuable when managing dozens of ECR repositories—tags propagate to all resources created by this provider configuration, including repositories, lifecycle policies, and replication configurations.

Before running terraform init, ensure your S3 bucket and DynamoDB table exist. The state bucket must have versioning enabled to recover from accidental state corruption, while the DynamoDB table requires a primary key named LockID (string type) to coordinate lock acquisition across team members.
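Because the backend cannot store state for the resources it depends on, these two resources are usually created in a separate one-off bootstrap run (or via the CLI). A sketch of what that bootstrap might declare — the bucket and table names must match the backend block above:

```hcl
# One-off bootstrap configuration for the state backend itself.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state-bucket"
}

# Versioning lets you recover earlier state files after corruption.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table; the hash key must be named LockID with string type.
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```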

Core ECR Repository Resource

The aws_ecr_repository resource defines your container registry with critical security and operational settings:

ecr.tf
resource "aws_ecr_repository" "app" {
  name                 = "my-application"
  image_tag_mutability = "IMMUTABLE"

  encryption_configuration {
    encryption_type = "AES256"
  }

  image_scanning_configuration {
    scan_on_push = true
  }

  tags = {
    Application = "my-application"
    Team        = "platform"
  }
}

output "repository_url" {
  description = "ECR repository URL for docker push commands"
  value       = aws_ecr_repository.app.repository_url
}

output "repository_arn" {
  description = "ECR repository ARN for IAM policies"
  value       = aws_ecr_repository.app.arn
}

Image Tag Mutability determines whether tags can be overwritten. Set this to IMMUTABLE to prevent accidental overwrites of existing image tags. When immutability is enabled, pushing my-app:v1.0 twice results in an error on the second push, protecting you from silently replacing production images. Use MUTABLE only in development environments where you need to repeatedly push the same tag during rapid iteration. Note that immutability enforcement happens at the tag level—you can still delete tags and recreate them, but you cannot overwrite an existing tag reference.

Encryption Configuration uses AES256 encryption by default, which AWS manages without additional setup or key rotation concerns. For enhanced security in regulated environments, specify encryption_type = "KMS" and provide a kms_key ARN to use customer-managed keys with full audit trails through CloudTrail. KMS encryption enables you to implement key policies that restrict who can decrypt repository images, adding an extra layer of access control beyond IAM policies. Be aware that KMS-encrypted repositories incur additional API call costs for every image layer operation.

Image Scanning with scan_on_push = true automatically runs vulnerability scans when you push images. With enhanced scanning enabled, ECR uses Amazon Inspector to detect software vulnerabilities in both operating system packages and programming language libraries, generating findings you can query via the AWS CLI or integrate with security tools like AWS Security Hub. Scans run asynchronously, so pushes complete quickly without waiting for results. Findings typically appear within 15 minutes for most images, though large multi-layer images may take longer. You can configure EventBridge rules to trigger notifications when scans detect critical vulnerabilities, enabling automated response workflows.

The outputs expose essential repository attributes for downstream automation. The repository_url follows the format {account_id}.dkr.ecr.{region}.amazonaws.com/{repository_name} and serves as the target for docker push commands in CI/CD pipelines. The repository_arn provides the unique resource identifier required for IAM policy statements and cross-account access configurations.

Deploying Your Configuration

Deploy this configuration with terraform init to download the AWS provider and initialize the backend, followed by terraform plan to preview changes, and finally terraform apply to create the repository. Terraform displays the planned changes before execution, showing exactly what resources will be created, modified, or destroyed. After successful deployment, Terraform outputs the repository URL and ARN, which you can reference in other configurations using terraform output repository_url.
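Other Terraform configurations can consume these outputs through the state backend. A sketch using the terraform_remote_state data source — the bucket and key values assume the backend configured earlier, and the local value is purely illustrative:

```hcl
data "terraform_remote_state" "ecr" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state-bucket"
    key    = "ecr/terraform.tfstate"
    region = "us-east-1"
  }
}

# Example: wire the repository URL into an image reference elsewhere,
# e.g. an ECS task definition or Kubernetes manifest template.
locals {
  app_image = "${data.terraform_remote_state.ecr.outputs.repository_url}:v1.0"
}
```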

Now that you have a functional repository, the next critical step is implementing lifecycle policies to manage image retention and prevent storage costs from spiraling out of control.

Image Lifecycle Policies and Retention

ECR storage costs accumulate fast in active development environments. A single repository can balloon to hundreds of gigabytes when teams push multiple images daily without cleanup. Lifecycle policies automate image removal based on age, count, or tag status, preventing storage bloat while preserving images needed for rollbacks.

Priority-Based Rule Evaluation

ECR evaluates lifecycle rules in priority order, processing lower numbers first. Once an image matches a rule, ECR excludes it from subsequent rules. This priority system lets you protect critical images while aggressively pruning everything else.

Understanding this exclusion mechanism is crucial for policy design. If rule 1 matches an image, rules 2-10 never evaluate that image—even if their criteria would also match. This means your most protective rules should have the lowest priority numbers, with increasingly aggressive cleanup rules following behind.

Here’s a comprehensive lifecycle policy that handles common retention scenarios:

lifecycle_policy.tf
resource "aws_ecr_repository" "app" {
  name = "production-api"
}

resource "aws_ecr_lifecycle_policy" "app" {
  repository = aws_ecr_repository.app.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep last 10 production images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["prod-"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 2
        description  = "Keep last 5 staging images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["staging-"]
          countType     = "imageCountMoreThan"
          countNumber   = 5
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 3
        description  = "Expire untagged images after 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 4
        description  = "Keep only last 3 feature branch images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["feature-"]
          countType     = "imageCountMoreThan"
          countNumber   = 3
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}

Rule 1 processes first, protecting the 10 most recent production images. Rule 2 then evaluates remaining images, keeping 5 staging versions. Rule 3 targets untagged images—typically intermediate builds from multi-stage Dockerfiles or failed CI runs that never received proper tags. These untagged layers serve as cache during builds but have no long-term value once a build completes.

The countType parameter determines how ECR selects images for removal. Use imageCountMoreThan to retain a fixed number of recent images regardless of age—ideal for production and staging where you need the last N deployable versions. Use sinceImagePushed with a time unit to expire images older than a threshold—better for cleanup jobs where age matters more than count.

💡 Pro Tip: Set untagged image retention to 3-7 days. Shorter windows reduce storage costs but can break builds if CI pipelines reference recent layer caches.

Balancing Cost and Rollback Safety

Production rollback requirements dictate your retention count. Most teams need 5-10 production images—enough to roll back through several releases without rebuilding from source. This range covers typical deployment cadences: teams shipping daily might need 10 images to cover two weeks of releases, while weekly deployers can retain fewer versions.

For staging environments, 3-5 images provide sufficient testing history while minimizing costs. Staging serves as a pre-production validation step, not a long-term archive. Once changes promote to production, staging images lose their value.

Feature branch images create the biggest storage challenge. Developers may push dozens of iterations during development, but you only need the latest few versions. Setting countNumber = 3 for feature branches strikes a balance: recent enough for debugging but aggressive enough to prevent accumulation. When branches merge, their images become orphaned, so cleanup happens naturally.

Consider your Mean Time to Recovery (MTTR) when setting retention counts. If deploying a fix typically takes 2 hours, you only need enough image history to cover that window plus buffer. Keeping 20+ production images suggests either overly cautious policies or lack of confidence in forward fixes.

Testing Lifecycle Policies

ECR provides a dry-run capability through the AWS CLI. Before applying a lifecycle policy in Terraform, validate which images would be removed:

Terminal window
aws ecr start-lifecycle-policy-preview \
  --repository-name production-api \
  --lifecycle-policy-text file://policy.json

aws ecr get-lifecycle-policy-preview \
  --repository-name production-api

The preview shows exactly which images match each rule, preventing accidental deletion of critical versions. Run this preview in a development repository first to validate your rule logic. Pay special attention to the imagePushedAt timestamps and imageTags in the output—these reveal whether your tag prefix patterns work as intended.

Lifecycle policies run asynchronously, typically within 24 hours of images matching a rule. You can’t trigger immediate deletion, so plan for a buffer period when cleaning up large repositories. To track what a policy actually removes, monitor ECR’s image deletion events in EventBridge and review the repository’s image count after each run.

After implementing a new lifecycle policy, check your repository’s image count daily for the first week. If counts drop too aggressively or not at all, adjust your rule priorities and count thresholds accordingly.
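One way to monitor those deletions is to route ECR image-action events to an SNS topic. A hedged sketch — the topic name is an assumption, and the pattern follows ECR’s “ECR Image Action” EventBridge event shape:

```hcl
resource "aws_sns_topic" "ecr_deletions" {
  name = "ecr-lifecycle-deletions"
}

# Match successful image deletions in this account's registry.
resource "aws_cloudwatch_event_rule" "ecr_image_deleted" {
  name = "ecr-image-deleted"

  event_pattern = jsonencode({
    source        = ["aws.ecr"]
    "detail-type" = ["ECR Image Action"]
    detail = {
      "action-type" = ["DELETE"]
      result        = ["SUCCESS"]
    }
  })
}

# An aws_sns_topic_policy granting events.amazonaws.com publish
# rights on the topic is also required; omitted here for brevity.
resource "aws_cloudwatch_event_target" "ecr_deletions_sns" {
  rule = aws_cloudwatch_event_rule.ecr_image_deleted.name
  arn  = aws_sns_topic.ecr_deletions.arn
}
```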

With automated cleanup handling retention, the next critical concern is security—ensuring only vulnerability-free images reach production through scanning and access controls.

Security: Scanning and Access Control

Container security begins at the registry. ECR provides built-in vulnerability scanning and fine-grained access controls that, when configured properly through Terraform, create a secure foundation for your container supply chain. These security measures protect against vulnerabilities in your images and ensure only authorized services can access your registries.

Automated Vulnerability Scanning

ECR’s image scanning detects software vulnerabilities in your container images using the Common Vulnerabilities and Exposures (CVEs) database. The service analyzes the packages and libraries in your images, comparing them against known vulnerability databases to identify security risks before they reach production environments.

Enable scanning on push to catch vulnerabilities immediately after image uploads:

ecr.tf
resource "aws_ecr_repository" "app" {
  name                 = "production/api-service"
  image_tag_mutability = "IMMUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
    kms_key         = aws_kms_key.ecr.arn
  }

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_kms_key" "ecr" {
  description             = "ECR encryption key"
  deletion_window_in_days = 10
  enable_key_rotation     = true
}

Scan results appear in the ECR console and through the AWS CLI within minutes of pushing an image. Each finding includes the CVE identifier, affected package, severity level (Critical, High, Medium, Low, Informational), and links to detailed vulnerability information. Critical and high-severity findings trigger EventBridge events, which you can route to SNS topics or Lambda functions for automated alerting and integration with incident management systems.
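To act on those events, an EventBridge rule can match scan completions that report critical findings and forward them to SNS. A sketch — the “ECR Image Scan” detail type applies to basic scanning, and the severity threshold shown is an assumption about your policy:

```hcl
resource "aws_sns_topic" "scan_alerts" {
  name = "ecr-scan-alerts"
}

# Fire when a completed scan reports at least one CRITICAL finding.
resource "aws_cloudwatch_event_rule" "critical_findings" {
  name = "ecr-critical-scan-findings"

  event_pattern = jsonencode({
    source        = ["aws.ecr"]
    "detail-type" = ["ECR Image Scan"]
    detail = {
      "scan-status" = ["COMPLETE"]
      "finding-severity-counts" = {
        CRITICAL = [{ numeric = [">", 0] }]
      }
    }
  })
}

# The topic also needs an aws_sns_topic_policy allowing
# events.amazonaws.com to publish; omitted for brevity.
resource "aws_cloudwatch_event_target" "critical_findings_sns" {
  rule = aws_cloudwatch_event_rule.critical_findings.name
  arn  = aws_sns_topic.scan_alerts.arn
}
```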

ECR supports two scanning types: basic scanning using the open-source Clair project, and enhanced scanning powered by Amazon Inspector. Enhanced scanning provides continuous monitoring, broader OS coverage, and deeper analysis of application dependencies. For production workloads, enhanced scanning detects vulnerabilities in both operating system packages and language-specific libraries (Java, Python, Node.js, .NET, Go, and Ruby).

Lifecycle policies can’t select images by scan results directly, but short retention windows ensure images with unpatched vulnerabilities don’t linger in your registry long after they’ve been superseded. Combine scanning with immutable tags to prevent overwriting images that have passed security review.

💡 Pro Tip: Use scan_on_push = true everywhere, and enable enhanced scanning with continuous rescans in production to catch newly discovered vulnerabilities in existing images. The CVE database updates continuously, so yesterday’s clean image might have new vulnerabilities today.
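Enhanced scanning is enabled at the registry level rather than per repository. A sketch using aws_ecr_registry_scanning_configuration — the wildcard filter covering all repositories is an assumption; narrow it to specific prefixes if you prefer:

```hcl
resource "aws_ecr_registry_scanning_configuration" "this" {
  scan_type = "ENHANCED"

  rule {
    # CONTINUOUS_SCAN rescans existing images as new CVEs are
    # published; SCAN_ON_PUSH limits enhanced scanning to push time.
    scan_frequency = "CONTINUOUS_SCAN"

    repository_filter {
      filter      = "*"
      filter_type = "WILDCARD"
    }
  }
}
```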

Cross-Account Repository Access

Platform teams often need to share container images across AWS accounts—development teams pulling base images from a central registry, or production accounts accessing images built in a dedicated CI/CD account. Repository policies grant this access using resource-based permissions that define who can perform specific actions on your ECR repositories.

Unlike IAM policies that attach to users and roles, repository policies attach directly to ECR repositories and specify which AWS principals can access them. This approach works well for cross-account scenarios where you control the repository but not the accessing principals.

ecr_policy.tf
data "aws_iam_policy_document" "ecr_cross_account" {
  statement {
    sid    = "AllowPullFromProduction"
    effect = "Allow"

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::987654321098:root",
        "arn:aws:iam::876543210987:root"
      ]
    }

    actions = [
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:BatchCheckLayerAvailability"
    ]
  }

  statement {
    sid    = "AllowPushFromCICD"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::123456789012:role/github-actions-ecr-push"]
    }

    actions = [
      "ecr:PutImage",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload"
    ]
  }
}

resource "aws_ecr_repository_policy" "app" {
  repository = aws_ecr_repository.app.name
  policy     = data.aws_iam_policy_document.ecr_cross_account.json
}

This policy allows two production accounts to pull images while restricting push access to a dedicated CI/CD role. Separate statements with distinct SIDs make policies easier to audit and modify. The aws_iam_policy_document data source generates valid JSON policy documents, handling proper escaping and formatting automatically.

When granting access to entire accounts using the root ARN (arn:aws:iam::ACCOUNT-ID:root), principals in those accounts still need corresponding IAM permissions to perform the actions. This dual-permission model (repository policy allowing the action, IAM policy granting the permission) provides defense in depth. For tighter control, specify individual role ARNs instead of account-level access.

IAM Permissions for CI/CD Pipelines

CI/CD pipelines need precise permissions to push images without over-privileged access. Create a dedicated IAM role with least-privilege permissions scoped to specific repositories and actions. Modern CI/CD platforms support OIDC federation, which eliminates long-lived AWS credentials in favor of short-lived tokens.

iam_cicd.tf
resource "aws_iam_role" "github_actions_ecr" {
  name = "github-actions-ecr-push"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
          "token.actions.githubusercontent.com:sub" = "repo:myorg/myrepo:ref:refs/heads/main"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy" "ecr_push" {
  name = "ecr-push-permissions"
  role = aws_iam_role.github_actions_ecr.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["ecr:GetAuthorizationToken"]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "ecr:BatchCheckLayerAvailability",
          "ecr:PutImage",
          "ecr:InitiateLayerUpload",
          "ecr:UploadLayerPart",
          "ecr:CompleteLayerUpload"
        ]
        Resource = aws_ecr_repository.app.arn
      }
    ]
  })
}

This role uses OIDC federation with GitHub Actions, eliminating long-lived AWS credentials from your CI/CD environment. The trust policy restricts assumption to workflows running on the main branch of a specific repository. Adjust the sub condition to match your repository structure—use wildcards like repo:myorg/myrepo:* to allow all branches, or specify environment-based conditions for deployment workflows.

The policy grants ecr:GetAuthorizationToken globally because ECR’s authorization endpoint operates at the account level, not per-repository. All other permissions scope to the specific repository ARN, preventing the pipeline from pushing to unintended registries. For pipelines managing multiple repositories, use wildcards in the resource ARN or attach multiple statements covering different repositories.

Add ecr:DescribeImages and ecr:ListImages if your CI/CD pipeline needs to check for existing tags before pushing, preventing duplicate builds. For pipelines that also deploy images, grant ecr:GetDownloadUrlForLayer, ecr:BatchGetImage, and ecr:BatchCheckLayerAvailability to enable pulling images during deployment verification.

With scanning enabled and access controls defined, your ECR repositories enforce security at the registry level. The next step is packaging these configurations into reusable Terraform modules that maintain consistency across environments.

Multi-Environment Patterns with Modules

Managing ECR repositories across multiple environments requires a strategic approach that balances consistency with flexibility. Terraform modules provide the ideal abstraction layer, enabling you to define repository configurations once and deploy them consistently across dev, staging, and production. This modular approach reduces duplication, enforces organizational standards, and simplifies maintenance as your infrastructure scales.

Building a Reusable ECR Module

A well-designed ECR module encapsulates repository creation, lifecycle policies, and security controls into a single reusable component. The module should expose configuration parameters as variables while maintaining sensible defaults that work across environments. Here’s a production-ready module structure:

modules/ecr/main.tf
variable "repository_name" {
  description = "Name of the ECR repository"
  type        = string
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

variable "image_tag_mutability" {
  description = "Image tag mutability setting"
  type        = string
  default     = "MUTABLE"
}

variable "retention_count" {
  description = "Number of images to retain"
  type        = number
  default     = 30
}

variable "scan_on_push" {
  description = "Enable image scanning on push"
  type        = bool
  default     = true
}

variable "additional_tags" {
  description = "Additional tags to apply to the repository"
  type        = map(string)
  default     = {}
}

resource "aws_ecr_repository" "this" {
  name                 = "${var.environment}-${var.repository_name}"
  image_tag_mutability = var.image_tag_mutability

  image_scanning_configuration {
    scan_on_push = var.scan_on_push
  }

  encryption_configuration {
    encryption_type = "KMS"
  }

  tags = merge(
    {
      Environment = var.environment
      ManagedBy   = "terraform"
      Repository  = var.repository_name
    },
    var.additional_tags
  )
}

resource "aws_ecr_lifecycle_policy" "this" {
  repository = aws_ecr_repository.this.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last ${var.retention_count} images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = var.retention_count
      }
      action = {
        type = "expire"
      }
    }]
  })
}

output "repository_url" {
  value       = aws_ecr_repository.this.repository_url
  description = "Full URL of the ECR repository"
}

output "repository_arn" {
  value       = aws_ecr_repository.this.arn
  description = "ARN of the ECR repository"
}

output "repository_name" {
  value       = aws_ecr_repository.this.name
  description = "Name of the ECR repository"
}

This module structure provides flexibility through input variables while maintaining consistency in encryption, tagging, and lifecycle management. The additional_tags variable allows environment-specific metadata without modifying the core module.

Environment-Specific Configurations

Use separate root modules for each environment, passing environment-specific variables to your ECR module. This approach provides clear separation while maintaining consistency. Each environment maintains its own state file, preventing accidental cross-environment modifications:

environments/production/main.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "ecr/production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = "production"
      ManagedBy   = "terraform"
      Project     = "container-registry"
    }
  }
}

module "api_repository" {
  source = "../../modules/ecr"

  repository_name      = "api-service"
  environment          = "prod"
  image_tag_mutability = "IMMUTABLE"
  retention_count      = 50
  scan_on_push         = true

  additional_tags = {
    CostCenter = "engineering"
    Team       = "platform"
  }
}

module "worker_repository" {
  source = "../../modules/ecr"

  repository_name      = "background-worker"
  environment          = "prod"
  image_tag_mutability = "IMMUTABLE"
  retention_count      = 50
  scan_on_push         = true

  additional_tags = {
    CostCenter = "engineering"
    Team       = "platform"
  }
}
environments/dev/main.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "ecr/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = "development"
      ManagedBy   = "terraform"
      Project     = "container-registry"
    }
  }
}

module "api_repository" {
  source = "../../modules/ecr"

  repository_name      = "api-service"
  environment          = "dev"
  image_tag_mutability = "MUTABLE"
  retention_count      = 10
  scan_on_push         = false

  additional_tags = {
    AutoShutdown = "true"
  }
}

Workspaces vs. Separate State Files

Terraform workspaces offer a tempting way to manage multiple environments from a single configuration, but they introduce significant operational risks. A single mistake with workspace selection can deploy production changes to development or vice versa. Separate state files provide explicit isolation and enable environment-specific IAM policies.

With separate state files, you can restrict production state modifications to CI/CD systems and senior engineers while allowing broader access to development states. This blast radius reduction is crucial for production stability. The additional overhead of maintaining separate directories is minimal compared to the safety benefits.

If you do choose workspaces for simpler use cases, implement workspace-aware naming (${terraform.workspace}-${var.repository_name}) and add validation to prevent operations in the wrong workspace. However, for production systems, separate state files remain the recommended approach.
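As a sketch of that validation, a resource precondition can refuse to run in an unexpected workspace (requires Terraform 1.2+; the allowed workspace names and the repository_name variable are assumptions for illustration):

```hcl
variable "repository_name" {
  type = string
}

resource "aws_ecr_repository" "this" {
  # Workspace-aware naming keeps repositories distinct per environment.
  name = "${terraform.workspace}-${var.repository_name}"

  lifecycle {
    precondition {
      condition     = contains(["dev", "staging", "prod"], terraform.workspace)
      error_message = "Workspace must be one of: dev, staging, prod."
    }
  }
}
```

With this guard, an apply from the `default` workspace fails at plan time instead of silently creating a misnamed repository.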

Naming Conventions and Tagging

Consistent naming prevents collisions and improves discoverability. The ${environment}-${repository_name} pattern ensures unique repository names while maintaining clarity about ownership and purpose. For organizations with multiple AWS accounts, consider prefixing with organization or team identifiers: ${org}-${environment}-${repository_name}.

Production repositories benefit from immutable tags to prevent accidental overwrites of deployed images. Development environments use mutable tags for rapid iteration, allowing developers to push updates without generating new tag names. This divergence in mutability settings reflects the different operational requirements of each environment.

Comprehensive tagging enables cost allocation, resource filtering, and automated governance. Every repository should include Environment, ManagedBy, and Repository tags as minimum metadata. Add CostCenter, Team, Application, or Owner tags based on your organizational needs. Use the AWS provider’s default_tags feature to apply common tags automatically, reducing duplication and ensuring consistency.

💡 Pro Tip: Use separate state files per environment rather than workspaces. This isolation prevents accidental cross-environment modifications and allows different IAM permissions for production versus non-production infrastructure. Implement state locking with DynamoDB to prevent concurrent modifications.

With your multi-environment module structure in place, the next step is integrating these repositories into your CI/CD pipelines for automated image builds and deployments.

CI/CD Integration: GitHub Actions to ECR

Once your ECR repositories are provisioned with Terraform, the next step is connecting your build pipelines. GitHub Actions has become the standard CI/CD platform for many teams, and integrating it with ECR requires careful attention to authentication and security practices.

OIDC Authentication: The Modern Approach

The traditional method of storing AWS access keys as GitHub secrets creates security risks—keys are long-lived, difficult to rotate, and can be leaked. Instead, use OpenID Connect (OIDC) to establish a trust relationship between GitHub and AWS, allowing temporary credentials without stored secrets.

OIDC works by having GitHub Actions generate a short-lived JSON Web Token (JWT) for each workflow run. This token contains claims about the repository, branch, and workflow. AWS validates the token against your configured trust policy and issues temporary credentials that expire after the workflow completes—typically within an hour. This eliminates the need to manage static credentials entirely.

First, configure the OIDC provider in Terraform:

oidc.tf
resource "aws_iam_openid_connect_provider" "github_actions" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

resource "aws_iam_role" "github_actions_ecr" {
  name = "github-actions-ecr-push"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github_actions.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:your-org/your-repo:*"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "github_actions_ecr" {
  role       = aws_iam_role.github_actions_ecr.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser"
}

The StringLike condition restricts the role to specific repositories, preventing unauthorized access from other GitHub Actions workflows. You can further tighten this by specifying exact branches or tags: "repo:your-org/your-repo:ref:refs/heads/main" limits access to only the main branch.
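The tightened condition would look like this (org and repository names are placeholders). With an exact ref there is no wildcard, so the sub claim moves from StringLike into StringEquals:

# Only workflows running on main in this repository can assume the role.
Condition = {
  StringEquals = {
    "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
    "token.actions.githubusercontent.com:sub" = "repo:your-org/your-repo:ref:refs/heads/main"
  }
}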

The AmazonEC2ContainerRegistryPowerUser managed policy provides push and pull permissions but not repository deletion—appropriate for most CI/CD workflows. For read-only scenarios like deployment pipelines that only pull images, use AmazonEC2ContainerRegistryReadOnly instead.

Building and Pushing Images

With OIDC configured, your GitHub Actions workflow authenticates using temporary credentials and pushes images to ECR:

.github/workflows/build-push.yml
name: Build and Push to ECR

on:
  push:
    branches: [main]

permissions:
  id-token: write
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-ecr-push
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build, tag, and push image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: production-api
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest

The permissions block grants the workflow access to OIDC tokens. The configure-aws-credentials action exchanges the GitHub token for temporary AWS credentials valid for the workflow duration.

💡 Pro Tip: Tag images with both the commit SHA and latest. The SHA provides immutability and rollback capability, while latest simplifies development workflows.

Note that ECR authorization tokens are valid for 12 hours, but in practice they're used immediately and discarded when the workflow completes. The amazon-ecr-login action handles the docker login command automatically, extracting the registry URL and credentials from the assumed role.

Multi-Environment Deployments

For teams managing multiple environments, parameterize the ECR repository and AWS role based on the target environment:

.github/workflows/deploy.yml
- name: Set environment variables
  run: |
    if [ "${{ github.ref }}" == "refs/heads/main" ]; then
      echo "ENVIRONMENT=production" >> $GITHUB_ENV
      echo "ECR_REPO=production-api" >> $GITHUB_ENV
      echo "ROLE_ARN=arn:aws:iam::123456789012:role/github-actions-ecr-push" >> $GITHUB_ENV
    else
      echo "ENVIRONMENT=staging" >> $GITHUB_ENV
      echo "ECR_REPO=staging-api" >> $GITHUB_ENV
      echo "ROLE_ARN=arn:aws:iam::987654321098:role/github-actions-ecr-push" >> $GITHUB_ENV
    fi

This pattern allows separate AWS accounts per environment, reinforcing blast radius isolation. Production and staging OIDC roles can have identical trust policies but reside in different accounts, ensuring that a compromised staging workflow cannot affect production resources.

For more sophisticated workflows, consider using GitHub Environments to require manual approvals before production deployments. Combine this with environment-specific protection rules and deployment branches to ensure only tested, reviewed code reaches production ECR repositories.
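A deployment job gated by a GitHub Environment might be sketched like this (the environment name is an assumption; the approval gate itself is configured in the repository's Environments settings, not in the workflow file):

# If the "production" environment requires reviewers, this job pauses
# for approval before it can assume the AWS role and push.
deploy:
  runs-on: ubuntu-latest
  environment: production
  permissions:
    id-token: write
    contents: read
  steps:
    - uses: actions/checkout@v4
    # ...configure credentials, build, and push as in the earlier workflow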

Optimizing Build Performance

GitHub Actions runners start fresh for each workflow run, meaning Docker layer caching doesn’t persist by default. For large images or frequent builds, enable Docker layer caching using GitHub’s cache action or BuildKit’s built-in cache exporters:

- name: Build with cache
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ${{ steps.login-ecr.outputs.registry }}/production-api:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

The type=gha cache backend stores layers in GitHub’s cache, reducing build times from minutes to seconds for unchanged dependencies. The mode=max setting caches all layers, not just the final result, maximizing reuse across builds.

With your CI/CD pipeline pushing images to ECR automatically, the next operational concern becomes visibility: monitoring registry health, tracking image usage, and responding to security findings.

Monitoring and Operations

Deploying ECR repositories is just the beginning. Production container registries demand continuous monitoring to track costs, detect security issues, and ensure operational stability. Terraform makes infrastructure observable by design—but you need to configure the right metrics, alerts, and recovery mechanisms.

Visual: CloudWatch dashboard showing ECR metrics and alerts

CloudWatch Metrics for Cost and Usage Visibility

ECR automatically publishes metrics to CloudWatch, but the default namespace only tracks repository-level data. For production environments, you need visibility into storage consumption and data transfer costs, which drive your ECR bill.

Track these critical metrics:

  • RepositoryPullCount: Identifies frequently accessed images and validates cache hit rates in your CI/CD pipelines
  • RepositoryImageCount: Monitors lifecycle policy effectiveness—spikes indicate retention rules aren’t working as expected
  • Storage: ECR doesn't publish a per-repository storage metric directly—calculate it from the imageSizeInBytes values in DescribeImages responses, or query CloudTrail logs with CloudWatch Logs Insights

Set up CloudWatch dashboards to correlate image pushes with deployment events. When storage costs jump unexpectedly, cross-reference with your lifecycle policies to identify orphaned images or misconfigured expiration rules.

Alerts for Security and Operational Failures

Image scan failures represent critical security gaps in your deployment pipeline. Configure CloudWatch alarms to trigger when vulnerability scans detect HIGH or CRITICAL findings, then route alerts to SNS topics that notify your security team and block deployments.

Create alarms for:

  • Scan findings threshold exceeded: Alert when any image has more than 10 critical vulnerabilities
  • Failed scans: Trigger when scanOnPush fails due to service limits or timeouts
  • Unauthorized access attempts: Monitor CloudTrail for denied GetAuthorizationToken or BatchGetImage calls
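One way to wire up the scan-findings alert in Terraform is an EventBridge rule on ECR's scan-completion events, routed to SNS. The topic and rule names below are placeholders, and severity thresholds would be applied by whatever subscribes to the topic (or via a more specific event pattern):

# Sketch: notify a security topic whenever an image scan completes.
resource "aws_sns_topic" "ecr_scan_alerts" {
  name = "ecr-scan-alerts"
}

resource "aws_cloudwatch_event_rule" "ecr_scan_complete" {
  name = "ecr-scan-complete"

  event_pattern = jsonencode({
    source      = ["aws.ecr"]
    detail-type = ["ECR Image Scan"]
    detail = {
      scan-status = ["COMPLETE"]
    }
  })
}

resource "aws_cloudwatch_event_target" "notify_security" {
  rule = aws_cloudwatch_event_rule.ecr_scan_complete.name
  arn  = aws_sns_topic.ecr_scan_alerts.arn
}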

Integrate these alarms with your incident response workflow. For automated remediation, connect SNS topics to Lambda functions that automatically quarantine vulnerable images by updating repository policies to deny pulls.

Backup and Disaster Recovery

ECR stores images in S3 with built-in redundancy, but cross-region replication isn’t automatic. For mission-critical registries, implement replication using ECR’s native cross-region replication rules or export images to S3 buckets in different regions.

Your disaster recovery strategy should include:

  • Cross-region replication rules: Define in Terraform using aws_ecr_replication_configuration to replicate tagged images to secondary regions
  • Image manifest exports: Periodically export image manifests to S3 for audit trails and rollback scenarios
  • Registry access logs: Enable CloudTrail logging for all ECR API calls to track who pushed or pulled images
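A minimal replication rule might be sketched as follows; the destination region and repository prefix are assumptions to adapt to your own setup:

# Replicate repositories whose names start with "production" to us-west-2
# within the same account. Replication config is registry-wide: one
# resource per account/region.
data "aws_caller_identity" "current" {}

resource "aws_ecr_replication_configuration" "dr" {
  replication_configuration {
    rule {
      destination {
        region      = "us-west-2"
        registry_id = data.aws_caller_identity.current.account_id
      }
      repository_filter {
        filter      = "production"
        filter_type = "PREFIX_MATCH"
      }
    }
  }
}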

For multi-account setups, maintain a centralized disaster recovery account with read-only cross-account access to all production registries. This ensures your team can pull critical images even if primary accounts are compromised.

With monitoring and recovery mechanisms in place, your ECR infrastructure becomes truly production-ready. These operational safeguards work in concert with the lifecycle policies, security controls, and CI/CD integrations you’ve already built.

Key Takeaways

  • Start with basic ECR Terraform resources and add lifecycle policies immediately to avoid runaway storage costs
  • Always enable image scanning and use repository policies for cross-account access instead of sharing credentials
  • Use Terraform modules and separate state files to manage ECR across environments without duplicating code
  • Integrate ECR with GitHub Actions using OIDC authentication for secure, keyless CI/CD pipelines