Running EKS Anywhere on Bare Metal: A Production-Ready Setup Guide
Your data sovereignty requirements just killed your cloud migration plans. The security team says sensitive workloads must stay on-premises, but your developers are already building for Kubernetes and love the EKS developer experience. You’re now tasked with bringing AWS-grade Kubernetes to your own data center without sacrificing the tooling your team depends on.
This scenario plays out constantly in healthcare, finance, government, and any industry where compliance officers have veto power over architecture decisions. You’ve invested in EKS expertise, built CI/CD pipelines around eksctl and the AWS CLI, and trained your team on IAM roles for service accounts. Now someone’s asking you to throw all that away and start fresh with a different Kubernetes distribution—or worse, roll your own cluster from scratch.
The gap between “Kubernetes in the cloud” and “Kubernetes on your own hardware” has historically been massive. Cloud-managed Kubernetes handles the hard parts: control plane availability, etcd backup and restore, certificate rotation, version upgrades that don’t crater your workloads at 2 AM. Bringing those capabilities to bare metal means either building significant operational expertise or accepting a distribution that looks nothing like what your developers already know.
EKS Anywhere exists to close that gap. It’s not just another Kubernetes distribution—it’s the actual EKS experience running on infrastructure you control. Same APIs, same tooling, same upgrade mechanisms. Your developers don’t need to learn a new platform, and your security team gets the on-premises deployment they’re demanding.
But making this work in production requires more than following the quickstart guide. You need to understand what EKS Anywhere actually provides, where it fits against alternatives, and how to build a deployment that survives contact with real-world operational requirements.
Why EKS Anywhere Exists: Bridging the Hybrid Gap
Running Kubernetes consistently across cloud and on-premises environments remains one of the most persistent challenges facing platform teams. Your developers want a unified experience. Your operations team wants standardized tooling. Your security team wants consistent policies. Meanwhile, you’re juggling EKS clusters in AWS alongside self-managed Kubernetes in your data centers, with different upgrade procedures, different monitoring stacks, and different failure modes.

EKS Anywhere addresses this fragmentation directly. It brings the same Kubernetes distribution that powers Amazon EKS—identical control plane components, identical container runtime, identical networking defaults—to your bare-metal servers, VMware vSphere clusters, or other supported infrastructure. The clusters you run on-premises become operationally similar to those in AWS, not just conceptually compatible.
How EKS Anywhere Differs from Vanilla Kubernetes
Installing upstream Kubernetes using kubeadm gives you a functional cluster, but you’re immediately responsible for every decision: which CNI plugin, which ingress controller, which certificate rotation strategy. EKS Anywhere makes these decisions for you, providing Cilium for networking, Bottlerocket or Ubuntu for node operating systems, and pre-configured security defaults that match AWS best practices.
Compared to other enterprise distributions, EKS Anywhere occupies a specific niche. OpenShift provides a comprehensive platform with its own opinions about CI/CD, service mesh, and developer experience—you adopt the Red Hat ecosystem wholesale. Rancher emphasizes multi-cluster management and supports any upstream Kubernetes distribution. EKS Anywhere focuses narrowly on AWS operational compatibility: if your organization already standardizes on EKS in the cloud, extending that consistency to on-premises becomes straightforward.
Licensing and Enterprise Support
EKS Anywhere itself is open source and free to run. You can deploy clusters, upgrade them, and operate them indefinitely without paying AWS anything. However, enterprise support subscriptions unlock 24/7 access to AWS support engineers, guaranteed response times for production issues, and extended support for older Kubernetes versions beyond the standard community window.
💡 Pro Tip: Even without a paid subscription, you can use EKS Connector to register your EKS Anywhere clusters with the AWS console. This gives you visibility into cluster health and workload status through a single pane of glass, though troubleshooting support still requires a subscription.
When EKS Anywhere Makes Sense
Choose EKS Anywhere when your organization already invests heavily in AWS and wants to minimize operational divergence between cloud and on-premises environments. It works particularly well for regulated industries requiring data residency, edge deployments needing local compute, or manufacturing environments where latency to the cloud proves unacceptable.
If you’re not already an AWS shop, or if you need advanced multi-cluster federation across different providers, evaluate Rancher or OpenShift instead. EKS Anywhere optimizes for AWS consistency, not provider-agnostic flexibility.
With the strategic context established, let’s examine the specific components and infrastructure requirements you’ll need to deploy EKS Anywhere on bare metal.
Architecture Deep Dive: Components and Infrastructure Requirements
Before provisioning hardware or writing configuration files, you need a clear understanding of how EKS Anywhere organizes infrastructure and what resources your deployment demands. This section establishes the architectural foundation that determines whether your bare-metal investment succeeds or becomes a costly lesson in inadequate planning.

Management Cluster vs. Workload Cluster Topology
EKS Anywhere follows a separation-of-concerns model borrowed from the Kubernetes Cluster API project. The management cluster serves as your control plane for cluster lifecycle operations—it creates, upgrades, and deletes workload clusters but runs no application workloads itself. The workload clusters host your actual applications and services.
This separation provides operational isolation. A misbehaving application can’t destabilize the infrastructure responsible for managing cluster health. For production deployments, AWS recommends dedicating physical hardware to the management cluster rather than co-locating it with workload resources.
In smaller environments, you can run a “self-managed” topology where a single cluster manages itself. This reduces hardware requirements but sacrifices the resilience benefits of separation. For anything beyond development or proof-of-concept work, maintain the split architecture.
Cluster API: Kubernetes-Native Lifecycle Management
EKS Anywhere leverages Cluster API (CAPI) to treat clusters as declarative Kubernetes resources. Rather than imperative scripts that execute once and leave systems in unknown states, CAPI controllers continuously reconcile your declared cluster specification against reality.
This approach enables GitOps workflows from day one. Your cluster definitions live in version control, and changes flow through the same review and deployment pipelines as application code. The Tinkerbell provider handles bare-metal specifics—PXE booting, OS provisioning, and hardware inventory management—while CAPI orchestrates the higher-level cluster operations.
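Because cluster definitions are ordinary Kubernetes objects on the management cluster, you can inspect them with standard tooling. A quick sketch, assuming the management cluster kubeconfig is active—resource names follow EKS Anywhere and Cluster API conventions and may vary slightly by release:

```bash
# List EKS Anywhere cluster objects and the Cluster API machines backing them
kubectl get clusters.anywhere.eks.amazonaws.com -A
kubectl get machines.cluster.x-k8s.io -n eksa-system
kubectl get tinkerbellmachines -n eksa-system   # Tinkerbell provider's per-machine objects
```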
Hardware Requirements for Bare-Metal Deployments
Minimum specifications for a production-capable control plane node:
- CPU: 4 cores (AMD64 architecture)
- Memory: 16 GB RAM
- Storage: 100 GB SSD for etcd and system components
- Network: Dual NICs recommended (management and workload traffic separation)
Worker nodes scale based on workload demands, but start with comparable specifications. Each machine requires BMC (Baseboard Management Controller) access—IPMI, iDRAC, or iLO—for out-of-band provisioning and power management.
💡 Pro Tip: Maintain at least 20% overhead capacity on control plane nodes. Etcd performance degrades precipitously when memory pressure triggers swap usage, and recovery from a degraded etcd cluster consumes significant engineering time.
Networking Prerequisites
Your network infrastructure must provide:
- DHCP server with reservations for all cluster nodes (Tinkerbell can run its own, but integrating with existing infrastructure reduces complexity)
- DNS resolution for cluster endpoints and node hostnames
- Load balancer for the Kubernetes API server—HAProxy or MetalLB for on-premises deployments
- Layer 2 adjacency between provisioning infrastructure and target hardware during initial bootstrap
Allocate dedicated IP ranges for pod networking (default /16) and services (default /12). These must not overlap with existing infrastructure subnets.
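In the generated cluster specification, these ranges live under clusterNetwork. The snippet below shows the documented defaults; override the blocks if they collide with your data-center addressing:

```yaml
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
        - 192.168.0.0/16   # default pod CIDR (/16)
    services:
      cidrBlocks:
        - 10.96.0.0/12     # default service CIDR (/12)
```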
Storage Options and CSI Compatibility
EKS Anywhere supports standard CSI drivers, giving you flexibility in storage backends. Common options include:
- Rook-Ceph for hyperconverged storage using local disks
- OpenEBS for containerized storage with LocalPV or Mayastor
- NetApp Trident or Pure Storage for enterprise SAN integration
- NFS-based provisioners for simpler deployments with existing NAS infrastructure
Choose storage that matches your performance and availability requirements. Distributed storage systems like Ceph add operational complexity but provide resilience against node failures.
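Whichever backend you pick, designate one StorageClass as the cluster default so PersistentVolumeClaims without an explicit class bind predictably. A minimal sketch using the OpenEBS LocalPV hostpath provisioner as a stand-in—swap in the provisioner string for your chosen CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # make this the default class
provisioner: openebs.io/local          # replace with your CSI driver's provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```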
With infrastructure requirements mapped, you’re ready to translate this architecture into a running cluster. The next section walks through the bootstrap process using eksctl anywhere, transforming bare hardware into a functional Kubernetes environment.
Bootstrapping Your First Cluster with eksctl anywhere
With your hardware infrastructure prepared and network prerequisites in place, you’re ready to provision your first EKS Anywhere cluster. This section walks through the complete bootstrap process, from installing the tooling to watching your bare-metal nodes transform into a production Kubernetes cluster.
Installing eksctl and the EKS Anywhere Plugin
The eksctl CLI serves as your primary interface for cluster lifecycle management. Install it alongside the EKS Anywhere plugin on your administrative workstation:
```bash
# Install eksctl
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

# Install the EKS Anywhere plugin
RELEASE_VERSION=$(curl https://anywhere-assets.eks.amazonaws.com/releases/eks-a/manifest.yaml | yq ".spec.latestVersion")
EKS_ANYWHERE_TARBALL="eksctl-anywhere-${RELEASE_VERSION}-$(uname -s)-amd64.tar.gz"
curl --silent --location "https://anywhere-assets.eks.amazonaws.com/releases/eks-a/downloads/artifacts/eks-anywhere/${RELEASE_VERSION}/${EKS_ANYWHERE_TARBALL}" | tar xz -C /tmp
sudo mv /tmp/eksctl-anywhere /usr/local/bin

# Verify installation
eksctl anywhere version
```

Your admin machine needs Docker running since eksctl anywhere creates a temporary bootstrap cluster in containers before pivoting to your actual hardware. Ensure Docker has at least 4GB of memory allocated, as the bootstrap cluster runs multiple control plane components simultaneously. Additionally, verify that your administrative workstation has network connectivity to both the bare-metal nodes and the provisioning network segment.
Generating the Cluster Configuration
EKS Anywhere uses declarative YAML manifests to define cluster topology. Generate a starter configuration for the bare-metal (Tinkerbell) provider:
```bash
export CLUSTER_NAME="prod-eks-cluster"
eksctl anywhere generate clusterconfig $CLUSTER_NAME --provider tinkerbell > cluster-config.yaml
```

This produces a multi-document YAML file containing your Cluster, TinkerbellDatacenterConfig, TinkerbellMachineConfig, and TinkerbellTemplateConfig resources. Open cluster-config.yaml and customize the critical fields:
```yaml
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: prod-eks-cluster
spec:
  kubernetesVersion: "1.29"
  controlPlaneConfiguration:
    count: 3
    endpoint:
      host: "192.168.1.100"  # Virtual IP for the control plane
    machineGroupRef:
      name: prod-eks-cluster-control-plane
  workerNodeGroupConfigurations:
    - count: 5
      machineGroupRef:
        name: prod-eks-cluster-worker
      name: md-0
  datacenterRef:
    kind: TinkerbellDatacenterConfig
    name: prod-eks-cluster
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
  name: prod-eks-cluster
spec:
  tinkerbellIP: "192.168.1.10"  # IP for the Tinkerbell provisioner
```

Pay careful attention to the endpoint.host value—this virtual IP must be reserved and routable from all nodes but not assigned to any physical interface. The control plane components use this VIP with kube-vip to provide high availability across the three control plane nodes.
Understanding the Tinkerbell Provisioning Stack
Tinkerbell orchestrates the bare-metal provisioning lifecycle. When you initiate cluster creation, EKS Anywhere deploys Tinkerbell components that handle:
- Boots: A DHCP and iPXE server that network-boots your hardware
- Hegel: A metadata service providing machine-specific configuration
- Tink Server: The workflow engine executing provisioning actions
- Rufio: A BMC controller for out-of-band machine management via IPMI or Redfish
The Tinkerbell stack operates as a cohesive unit during provisioning. Boots responds to DHCP requests from bare-metal nodes and directs them to download an iPXE script. This script instructs the machine to fetch a lightweight operating system image that runs entirely in memory. Once booted, the node contacts Hegel to retrieve its unique configuration and then executes the workflow defined in Tink Server. The workflow typically partitions disks, installs the target operating system (Bottlerocket or Ubuntu), configures networking, and prepares the node for Kubernetes bootstrap.
You must create a hardware inventory file describing each physical machine:
```yaml
apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: node-01
  namespace: eksa-system
spec:
  bmcRef:
    kind: Machine
    name: node-01-bmc
  disks:
    - device: /dev/sda
  interfaces:
    - dhcp:
        hostname: node-01
        ip:
          address: 192.168.1.20
          gateway: 192.168.1.1
          netmask: 255.255.255.0
        mac: "00:50:56:ab:cd:01"
        nameservers:
          - 8.8.8.8
```

💡 Pro Tip: Collect MAC addresses and BMC credentials for all target machines before starting. Missing or incorrect hardware definitions cause the majority of first-time bootstrap failures.
For environments with many nodes, consider automating hardware inventory collection. Many organizations use Redfish APIs or existing CMDB systems to generate the hardware manifest programmatically rather than manually transcribing MAC addresses.
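If you take the CSV route (the create command later in this section passes --hardware-csv), the file carries one row per machine. An illustrative sketch following the documented column layout—every value below is a placeholder:

```
hostname,bmc_ip,bmc_username,bmc_password,mac,ip_address,netmask,gateway,nameservers,labels,disk
node-01,192.168.2.20,admin,REDACTED,00:50:56:ab:cd:01,192.168.1.20,255.255.255.0,192.168.1.1,8.8.8.8,type=cp,/dev/sda
node-04,192.168.2.23,admin,REDACTED,00:50:56:ab:cd:04,192.168.1.23,255.255.255.0,192.168.1.1,8.8.8.8,type=worker,/dev/sda
```

The labels column ties each row to a machine group via the hardwareSelector in your TinkerbellMachineConfig.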
Executing the Cluster Creation
With your configuration validated, trigger the bootstrap:
```bash
eksctl anywhere create cluster \
  --filename cluster-config.yaml \
  --hardware-csv hardware.csv \
  --tinkerbell-bootstrap-ip 192.168.1.11
```

The process unfolds in distinct phases. First, a Kind-based bootstrap cluster spins up locally on your administrative workstation. Next, Tinkerbell components deploy into this temporary cluster and begin listening for incoming provisioning requests. Your bare-metal machines then PXE boot when powered on (either manually or via BMC automation) and receive their operating system through the Tinkerbell workflow. Finally, the Kubernetes control plane initializes on the provisioned nodes, and the cluster pivots from the local bootstrap environment to the actual hardware.
Expect 30-45 minutes for a three-node control plane with five workers. During this time, monitor progress through the eksctl output, which provides detailed status updates for each phase. The control plane nodes provision first, followed by worker nodes in parallel batches.
Troubleshooting Common Bootstrap Failures
When machines fail to provision, start with these diagnostics:
```bash
# Check Tinkerbell workflow status
kubectl get workflows -n eksa-system

# Inspect Boots DHCP logs
kubectl logs -n eksa-system deployment/boots

# Verify hardware registration
kubectl get hardware -n eksa-system -o wide
```

The most frequent issues stem from network misconfiguration: VLANs blocking DHCP broadcasts, incorrect MAC addresses in hardware definitions, or firewalls dropping TFTP traffic. Ensure your provisioning network allows UDP ports 67-69 and TCP port 50061 between the Tinkerbell stack and target hardware.
BMC connectivity problems represent another common failure mode. Verify that the administrative network can reach each node’s BMC interface and that credentials in your configuration match those configured on the hardware. Some BMC implementations require specific Redfish or IPMI versions—consult your hardware vendor’s documentation if power management commands fail silently.
If workflows stall in a pending state, examine the Tink Server logs for action failures. Disk naming mismatches (/dev/sda versus /dev/nvme0n1) frequently cause partition steps to fail on heterogeneous hardware. Standardize disk naming in your hardware definitions or use persistent device paths where available.
Once eksctl anywhere completes successfully, it writes a kubeconfig to your working directory. Validate cluster health by checking node status and core component pods before proceeding to production hardening.
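A quick validation pass might look like the following, assuming the default kubeconfig path convention of <cluster-name>/<cluster-name>-eks-a-cluster.kubeconfig:

```bash
# Point kubectl at the new cluster and confirm nodes and system pods are healthy
export KUBECONFIG=./prod-eks-cluster/prod-eks-cluster-eks-a-cluster.kubeconfig
kubectl get nodes -o wide
kubectl get pods -n kube-system
kubectl get pods -n eksa-system
```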
With a running cluster in hand, the next step involves applying security configurations and production-grade settings that transform this foundation into an enterprise-ready platform.
Cluster Configuration: Security and Production Hardening
A running EKS Anywhere cluster is just the starting point. Before workloads hit production, you need authentication, authorization, encryption, and audit capabilities that satisfy enterprise security requirements. This section walks through the essential hardening configurations that bring your cluster up to production standards.
OIDC Authentication Integration
EKS Anywhere supports OIDC authentication out of the box, allowing you to integrate with existing identity providers like Okta, Azure AD, or Keycloak. This eliminates the need to manage separate credentials for cluster access and enables centralized identity governance. Configure this in your cluster specification:
```yaml
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  identityProviderRefs:
    - kind: OIDCConfig
      name: corporate-idp
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: OIDCConfig
metadata:
  name: corporate-idp
spec:
  clientId: eks-anywhere-prod
  issuerUrl: https://sso.example.com/oauth2/default
  usernameClaim: email
  usernamePrefix: "oidc:"
  groupsClaim: groups
  groupsPrefix: "oidc:"
  requiredClaims:
    - claim: department
      value: engineering
```

The requiredClaims field restricts cluster access to users with specific token claims—useful for limiting access to particular teams or departments. Combined with Kubernetes RBAC, you can map IdP groups directly to cluster roles for fine-grained authorization control.
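For example, a ClusterRoleBinding can grant an IdP group (carrying the oidc: prefix configured above) an existing cluster role. The group name here is hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-platform-admins
subjects:
  - kind: Group
    name: "oidc:platform-admins"   # group claim value from your IdP, with the configured prefix
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```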
Pod Security Standards
Kubernetes Pod Security Standards replace the deprecated PodSecurityPolicy. Enforce them at the namespace level using labels:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production-workloads
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.28
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

For baseline enforcement across all namespaces, configure the admission controller in your cluster spec:

```yaml
spec:
  podSecurityAdmission:
    defaultPolicy: baseline
    exemptions:
      namespaces:
        - kube-system
        - eksa-system
```

Network Policies
Pod Security Standards control what pods can do; network policies control where they can communicate. Without explicit policies, all pod-to-pod traffic is permitted by default. Implement a default-deny policy and explicitly allow required traffic:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production-workloads
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Layer application-specific policies on top of this baseline to permit only the traffic flows your workloads require. This microsegmentation limits lateral movement in the event of a compromise.
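As an illustration of such a layered policy—label names and the port are hypothetical—the following allows only frontend pods to reach the API pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production-workloads
spec:
  podSelector:
    matchLabels:
      app: api            # policy applies to the API pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Keep in mind that the default-deny policy above also blocks egress, so most namespaces need an additional rule permitting DNS traffic to kube-dns.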
Audit Logging and SIEM Integration
Enable comprehensive audit logging by specifying an audit policy. EKS Anywhere writes audit logs to the control plane nodes at /var/log/kubernetes/audit/.
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods", "services"]
      - group: "apps"
        resources: ["deployments", "daemonsets"]
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
```

Deploy a Fluent Bit DaemonSet to ship these logs to your SIEM. Most enterprises route them to Splunk, Elastic, or AWS CloudWatch (via hybrid connectivity). Configure log retention policies that align with your compliance requirements—many regulations mandate 90 days or more of audit log retention.
💡 Pro Tip: Set level: Metadata for secrets rather than RequestResponse to avoid logging sensitive data while still capturing access patterns.
etcd Encryption at Rest
Protect sensitive data stored in etcd by enabling encryption at rest. Create an encryption configuration:
```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: c2VjdXJlLWVuY3J5cHRpb24ta2V5LTMyLWJ5dGVz
      - identity: {}
```

Reference this configuration in your cluster specification under controlPlaneConfiguration. Rotate keys by adding a new key at position 0, re-encrypting all secrets with kubectl get secrets --all-namespaces -o json | kubectl replace -f -, then removing the old key. Schedule key rotation quarterly or according to your security policy requirements.
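The aescbc provider expects a 32-byte key; one simple way to generate one (any cryptographically secure source works) is:

```bash
# Generate a random 32-byte key and base64-encode it for the EncryptionConfiguration
head -c 32 /dev/urandom | base64
```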
Certificate Management and Rotation
EKS Anywhere handles control plane certificate rotation during cluster upgrades. For custom certificates (ingress, service mesh, mutual TLS between services), implement cert-manager with a ClusterIssuer tied to your internal CA:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca-keypair
```

Set certificate renewal thresholds to 30 days before expiration and configure alerts in your monitoring stack for certificates approaching renewal. For workloads requiring mutual TLS, cert-manager can automatically provision and rotate certificates, eliminating manual certificate management overhead.
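A sketch of a Certificate resource wired to that issuer with the 30-day renewal window described above—the DNS name and secret name are placeholders:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-gateway-tls
  namespace: production-workloads
spec:
  secretName: api-gateway-tls        # cert-manager stores the issued keypair here
  duration: 2160h                    # 90-day certificate lifetime
  renewBefore: 720h                  # renew 30 days before expiry
  dnsNames:
    - api.example.internal
  issuerRef:
    kind: ClusterIssuer
    name: internal-ca
```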
With authentication, authorization, encryption, network segmentation, and audit logging in place, your cluster meets the baseline for production workloads. Next, we turn to Day 2 operations—upgrades, scaling, and integrating GitOps workflows that keep your clusters consistent and auditable over time.
Day 2 Operations: Upgrades, Scaling, and GitOps Integration
Deploying an EKS Anywhere cluster is the starting point. The real challenge lies in maintaining that cluster over months and years—handling Kubernetes version upgrades, scaling infrastructure to meet demand, and establishing repeatable, auditable change management through GitOps. This section provides operational playbooks for each of these critical lifecycle tasks.
Cluster Upgrades and Version Management
EKS Anywhere follows the same Kubernetes version support policy as Amazon EKS, maintaining compatibility with three minor versions at any time. Before initiating an upgrade, verify your target version against the EKS Anywhere release notes and ensure all workloads are compatible. Review the version compatibility matrix carefully—certain CNI plugins, CSI drivers, and admission controllers may require updates before the cluster upgrade can proceed.
The upgrade process uses a rolling replacement strategy for control plane nodes, followed by worker nodes. Update your cluster specification with the new Kubernetes version:
```yaml
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  kubernetesVersion: "1.29"
  controlPlaneConfiguration:
    count: 3
    upgradeRolloutStrategy:
      rollingUpdate:
        maxSurge: 1
  workerNodeGroupConfigurations:
    - name: worker-group-1
      count: 5
      upgradeRolloutStrategy:
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
```

Apply the upgrade with eksctl anywhere upgrade cluster -f cluster-upgrade.yaml. The maxUnavailable: 0 setting ensures zero capacity loss during worker node upgrades—essential for production workloads. For control plane upgrades, the process creates a new node with the target version, waits for it to join the cluster and become healthy, then cordons and drains the old node before removal.
💡 Pro Tip: Always run eksctl anywhere upgrade plan first to preview changes and validate hardware availability before committing to an upgrade. This dry-run validates that sufficient spare capacity exists in your hardware inventory to support the rolling update strategy.
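In practice the plan-then-apply sequence looks like this, assuming the management cluster kubeconfig is active in your shell:

```bash
# Dry-run the upgrade, then apply it against the same spec file
eksctl anywhere upgrade plan cluster -f cluster-upgrade.yaml
eksctl anywhere upgrade cluster -f cluster-upgrade.yaml
```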
Scaling Node Groups
Horizontal scaling involves modifying the count field in your worker node configuration and reapplying the cluster spec. For bare metal deployments, ensure your hardware inventory in Tinkerbell has sufficient available machines before scaling up. Monitor your hardware pool regularly—machines that fail health checks or become unreachable reduce your effective scaling headroom.
Adding new node groups for specialized workloads—GPU nodes or high-memory instances—requires defining additional worker configurations:
```yaml
workerNodeGroupConfigurations:
  - name: gpu-workers
    count: 2
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: gpu-machine-config
    labels:
      node.kubernetes.io/instance-type: gpu
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
```

When decommissioning nodes, always cordon and drain before removing from the cluster specification. This ensures workloads migrate gracefully and persistent volume attachments detach cleanly.
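A minimal drain sequence before editing the spec—the node name is hypothetical:

```bash
# Stop new scheduling, then evict workloads gracefully before removing the node
kubectl cordon gpu-workers-node-03
kubectl drain gpu-workers-node-03 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=120
```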
GitOps Integration with Flux
EKS Anywhere has native Flux integration, enabling declarative cluster management through Git repositories. Enable GitOps during cluster creation or add it to existing clusters:
```yaml
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: FluxConfig
metadata:
  name: prod-flux
spec:
  systemNamespace: flux-system
  github:
    owner: my-org
    repository: eks-anywhere-config
    branch: main
    personal: false
  clusterConfigPath: clusters/prod-cluster
```

Once enabled, all cluster configuration changes flow through pull requests, providing audit trails and enabling peer review for infrastructure modifications. Flux continuously reconciles the cluster state against your Git repository, automatically correcting drift. For teams preferring ArgoCD, you can install it manually post-cluster creation and point it at your configuration repository—both tools support the declarative YAML manifests EKS Anywhere generates.
Backup and Disaster Recovery
Velero provides cluster backup capabilities for both Kubernetes resources and persistent volumes. Install Velero with your preferred storage backend:
```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - velero
    storageLocation: default
    ttl: 720h
```

For bare metal deployments, configure Velero to use an S3-compatible object store (MinIO works well for air-gapped environments) or NFS-backed storage. Consider implementing a tiered retention policy: hourly snapshots retained for 24 hours, daily backups for 30 days, and weekly backups for compliance requirements.
Test your restore procedures quarterly. A backup strategy is only as good as your last successful restore. Document the complete recovery runbook, including secrets restoration and DNS cutover procedures.
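A restore drill might look like the following sketch—the backup and namespace names are illustrative, and the namespace mapping keeps production objects untouched:

```bash
# Pick a recent backup produced by the daily schedule
velero backup get

# Restore it into a scratch namespace rather than over live workloads
velero restore create restore-drill-$(date +%Y%m%d) \
  --from-backup daily-backup-20240601020000 \
  --namespace-mappings production-workloads:restore-drill

# Confirm the restore completed and spot-check the restored objects
velero restore describe restore-drill-$(date +%Y%m%d)
```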
With operational processes established, the next consideration is connectivity—how your on-premises EKS Anywhere cluster integrates with AWS services and your broader network infrastructure.
Connecting to AWS: Hybrid Networking and Service Integration
Running EKS Anywhere on bare metal delivers Kubernetes consistency, but the real power emerges when you connect your on-premises clusters to AWS services. This hybrid architecture gives you local compute with cloud-native observability, security, and management capabilities. The integration patterns described here enable seamless workflows between your data center and AWS, treating both environments as a unified platform rather than isolated silos.
Unified Visibility with EKS Connector
EKS Connector registers your on-premises cluster with the AWS Console, providing a single pane of glass across all your Kubernetes environments. The connector runs as an agent in your cluster and maintains an outbound connection to AWS, requiring no inbound firewall rules or complex network configuration.
```bash
# Register the EKS Anywhere cluster with AWS
eksctl register cluster \
  --name eks-anywhere-prod \
  --provider EKS_ANYWHERE \
  --region us-east-1

# Apply the generated manifests to your cluster
kubectl apply -f eks-connector.yaml
kubectl apply -f eks-connector-clusterrolebinding.yaml
kubectl apply -f eks-connector-console-dashboard-full-access-group.yaml
```

After registration, your cluster appears in the EKS Console alongside your cloud-native clusters. You can view workloads, nodes, and cluster health without maintaining separate dashboards. The connector also enables AWS SSO users to access your cluster through the console, simplifying access management for teams already using AWS identity federation.
Secure Credential Management with IAM Roles Anywhere
IAM Roles Anywhere eliminates long-lived credentials by letting your on-premises workloads assume IAM roles using X.509 certificates. This approach integrates with your existing PKI infrastructure and follows the same temporary credential model used by EC2 instance profiles.
```bash
# Create a trust anchor using your private CA
aws rolesanywhere create-trust-anchor \
  --name eks-anywhere-anchor \
  --source "sourceType=CERTIFICATE_BUNDLE,sourceData={x509CertificateData=$(cat ca-cert.pem)}" \
  --region us-east-1

# Create a profile linking to your IAM role
aws rolesanywhere create-profile \
  --name eks-anywhere-profile \
  --role-arns arn:aws:iam::123456789012:role/EKSAnywhereWorkloadRole \
  --region us-east-1
```

Workloads then use the AWS signing helper to obtain temporary credentials, rotating automatically without manual intervention. Session duration defaults to one hour but can be configured up to twelve hours based on your security requirements.
💡 Pro Tip: Deploy the credential helper as a DaemonSet to provide credentials via a local endpoint, simplifying application configuration across your cluster.
Hybrid Network Connectivity
For production deployments, establish AWS Direct Connect or Site-to-Site VPN between your data center and AWS VPCs. This private connectivity enables secure access to AWS services without traversing the public internet. Direct Connect provides consistent network performance with dedicated bandwidth, while Site-to-Site VPN offers a faster deployment path for initial testing or lower-throughput scenarios.
PrivateLink extends this further by exposing specific AWS services—like Amazon S3, Secrets Manager, or ECR—through private endpoints accessible from your on-premises network. Configure your cluster’s DNS to resolve AWS service endpoints to these private addresses, ensuring all traffic stays within your private network path.
Observability with Amazon Managed Prometheus
Ship metrics from your bare-metal cluster to Amazon Managed Prometheus for unified observability. Configure the Prometheus remote write endpoint in your cluster’s monitoring stack:
```bash
# Get your AMP workspace endpoint
AMP_ENDPOINT=$(aws amp describe-workspace \
  --workspace-id ws-abc123def456 \
  --query 'workspace.prometheusEndpoint' \
  --output text \
  --region us-east-1)

# The remote write URL follows this pattern
echo "Remote write URL: ${AMP_ENDPOINT}api/v1/remote_write"
```

Authentication to Amazon Managed Prometheus uses SigV4, which integrates naturally with the IAM Roles Anywhere setup described earlier. Your Prometheus server signs requests using the temporary credentials, maintaining the same security posture as cloud-native workloads.
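The corresponding Prometheus configuration is a short remote_write stanza with SigV4 signing enabled; the workspace URL below reuses the placeholder workspace ID from the command above:

```yaml
remote_write:
  - url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-abc123def456/api/v1/remote_write"
    sigv4:
      region: us-east-1          # credentials resolved from the standard AWS provider chain
    queue_config:
      max_samples_per_send: 1000
      max_shards: 30
```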
Pair this with Amazon Managed Grafana for dashboards that span both cloud and on-premises infrastructure, providing consistent alerting and visualization regardless of where workloads run. Teams can correlate metrics across environments, making it easier to diagnose issues that span the hybrid boundary.
These integrations transform EKS Anywhere from an isolated on-premises solution into a true hybrid platform. However, getting the configuration right requires understanding the common pitfalls teams encounter—and the performance optimizations that make bare-metal Kubernetes shine.
Lessons from the Field: Common Pitfalls and Performance Tuning
Production EKS Anywhere deployments surface problems that documentation rarely covers. After running bare-metal clusters through months of production traffic, patterns emerge that save teams weeks of troubleshooting.
Network Performance: Beyond Default CNI Settings
Cilium ships as the default CNI for EKS Anywhere, but its out-of-box configuration prioritizes compatibility over performance. On bare metal, this leaves significant throughput on the table.
The first issue teams encounter: eBPF host routing sits disabled by default. Enabling it bypasses iptables entirely for pod-to-pod traffic, reducing latency by 15-20% in high-throughput scenarios. The second common miss involves MTU configuration—Cilium defaults to 1500, but bare-metal networks often support jumbo frames. Mismatched MTUs between nodes create silent packet fragmentation that manifests as intermittent connection timeouts under load.
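As a rough sketch of the relevant knobs—expressed here as Cilium Helm-style values, since the exact override mechanism EKS Anywhere exposes for its bundled Cilium varies by release—the two changes discussed above map to:

```yaml
# Illustrative Cilium values; verify the override path supported by your
# EKS Anywhere version before applying anything like this in production.
bpf:
  hostLegacyRouting: false   # prefer eBPF host routing over the iptables path
MTU: 9000                    # match jumbo frames end to end, or keep 1500 if the fabric does not support them
```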
💡 Pro Tip: Run cilium connectivity test after any CNI configuration change. It catches subtle issues that only appear under specific traffic patterns.
Storage I/O: Diagnosing the Invisible Bottleneck
Local NVMe storage seems straightforward until provisioner overhead becomes the constraint. The default local-path-provisioner creates volumes synchronously, and its single-threaded design creates queuing delays during pod scheduling storms.
Monitor iowait on nodes during deployments. Values above 5% indicate the provisioner—not the underlying storage—as your bottleneck. For production workloads, OpenEBS with cStor provides async provisioning and handles the scheduling pressure gracefully.
Control Plane Resource Reservation
Bare-metal nodes lack the resource isolation that hypervisors provide. Without explicit reservations, kubelet happily schedules workloads that starve etcd of I/O bandwidth.
Reserve at minimum: 2 CPU cores, 4GB memory, and 100 IOPS for system components on control plane nodes. These numbers come from observing etcd leader election failures during cluster upgrades—the exact scenario where stability matters most.
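One way to encode those reservations is a KubeletConfiguration fragment; how you deliver it (node bootstrap templates or the cluster spec's kubelet settings) depends on your EKS Anywhere version, and the even split between system and kube reservations below is an assumption:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  cpu: "1"          # half of the 2-core reservation for OS daemons
  memory: 2Gi
kubeReserved:
  cpu: "1"          # remaining core for kubelet, containerd, and friends
  memory: 2Gi
evictionHard:
  memory.available: 500Mi   # evict workloads before etcd feels memory pressure
```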
Curated Packages vs. DIY: The Real Tradeoff
AWS curated packages for EKS Anywhere include tested versions of components like Harbor, Prometheus, and cert-manager. The appeal of DIY installations—newer versions, custom configurations—fades quickly when upgrade compatibility breaks.
Use curated packages for infrastructure components that interact with the cluster lifecycle. Reserve custom installations for application-layer tooling where version flexibility provides genuine value.
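Installing a curated package follows a generate-then-create pattern; a sketch for Harbor, carrying over the cluster name from earlier examples (flags can vary slightly between releases):

```bash
# Render a package manifest for Harbor, then install and verify it
eksctl anywhere generate package harbor --cluster prod-eks-cluster > harbor.yaml
eksctl anywhere create packages -f harbor.yaml
eksctl anywhere get packages --cluster prod-eks-cluster
```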
These operational patterns compound over time. A cluster tuned across all dimensions handles failure scenarios that break default configurations—exactly the resilience that justifies running Kubernetes on your own hardware in the first place.
Key Takeaways
- Start with a separate management cluster to enable zero-downtime upgrades and maintain operational independence for your workload clusters
- Implement GitOps from day one using the EKS Anywhere curated packages for Flux to ensure reproducible cluster configurations
- Use IAM Roles Anywhere instead of static credentials when integrating on-premises clusters with AWS services
- Plan your networking topology before deployment—changing CNI plugins or IP ranges post-installation requires cluster recreation