Building Custom cert-manager Issuers: From Internal PKI to Zero-Touch Certificate Automation
Your security team just mandated that all internal services use certificates from the corporate PKI, but cert-manager’s built-in issuers only cover ACME providers such as Let’s Encrypt, Vault, Venafi, and simple CA or self-signed setups. You’re staring at hundreds of microservices that need certificates rotated every 90 days, and the manual process that worked for five services won’t scale to five hundred.
The math is brutal. Five hundred services times four rotations per year equals two thousand certificate operations annually. Each rotation requires generating a CSR, submitting it to your internal CA, waiting for approval, downloading the certificate, and updating the Kubernetes secret. Even at fifteen minutes per operation—an optimistic estimate when you factor in ticket queues and approval workflows—you’re looking at five hundred hours of annual toil. That’s three months of an engineer’s time spent copying certificates around.
Meanwhile, your developers expect the same seamless experience they get with public certificates. They want to annotate an Ingress, watch the certificate appear, and forget about it until the next security audit. They don’t care that your corporate CA speaks a proprietary API instead of ACME. They shouldn’t have to.
This is where cert-manager’s extensibility model becomes essential. The same controller architecture that handles Let’s Encrypt challenges can integrate with any certificate authority—Microsoft AD CS, EJBCA, internal REST APIs, or that legacy PKI system running on hardware from 2008. The pattern is consistent: watch for CertificateRequest resources, translate them into your CA’s native format, and write the signed certificate back to the cluster.
The gap between cert-manager’s built-in issuers and enterprise PKI requirements isn’t a limitation—it’s an integration opportunity. Building a custom issuer transforms certificate management from a recurring operational burden into a one-time infrastructure investment.
The Certificate Lifecycle Problem at Scale
Certificate management follows a predictable trajectory in every growing organization. What begins as a handful of manually provisioned TLS certificates quickly spirals into an operational nightmare as microservices multiply and deployment frequency increases.

The Manual Management Breaking Point
Consider a platform running fifty services across three environments. Each service requires certificates for mutual TLS, ingress termination, and internal API authentication. With ninety-day certificate lifetimes—a security best practice—the operations team faces a constant stream of renewal work. Miss a single expiration, and production traffic drops. The cognitive overhead alone makes manual processes unsustainable.
The math is unforgiving. Fifty services across three environments means 150 deployments, and with three certificates each (mutual TLS, ingress, and internal API authentication) that is 450 certificates. With ninety-day lifetimes, that translates to roughly five certificate renewals every single day. No team can sustain this manually while also shipping features.
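The renewal rate is easy to sanity-check. The sketch below assumes each certificate is renewed once per lifetime and that renewals are spread evenly; the function name `renewalsPerDay` is illustrative. Counting one certificate per deployment gives a lower rate, while three per deployment (mutual TLS, ingress, and internal API authentication) gives the five-a-day figure.

```go
package main

import "fmt"

// renewalsPerDay estimates the steady-state renewal rate, assuming every
// certificate is renewed exactly once per lifetime and renewals are spread
// evenly across that lifetime.
func renewalsPerDay(certs int, lifetimeDays float64) float64 {
	return float64(certs) / lifetimeDays
}

func main() {
	services, environments, certsPerDeployment := 50, 3, 3
	deployments := services * environments

	fmt.Printf("one certificate per deployment:    %.1f renewals/day\n",
		renewalsPerDay(deployments, 90))
	fmt.Printf("three certificates per deployment: %.1f renewals/day\n",
		renewalsPerDay(deployments*certsPerDeployment, 90))
}
```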
cert-manager solves this automation problem elegantly for public certificate authorities. Deploy an ACME issuer pointed at Let’s Encrypt, annotate your Ingress resources, and certificates materialize automatically. The controller handles issuance, renewal, and secret management without human intervention.
Where Built-in Issuers Fall Short
Enterprise environments rarely operate with public CAs alone. Internal PKI systems—HashiCorp Vault, Microsoft Active Directory Certificate Services, custom certificate authorities built on EJBCA or similar platforms—remain the backbone of zero-trust architectures. These systems enforce organizational policies around key escrow, certificate templates, approval workflows, and audit logging that public ACME providers cannot satisfy.
cert-manager ships with issuers for Vault and Venafi, but the certificate authority landscape is fragmented. Many organizations run proprietary PKI solutions, legacy systems with custom APIs, or cloud provider certificate services that lack native cert-manager integration. The gap between what cert-manager supports out of the box and what enterprise PKI demands creates a significant operational burden.
💡 Pro Tip: Before building a custom issuer, verify that no existing issuer or external-issuer project covers your CA. The cert-manager ecosystem includes community-maintained issuers for AWS Private CA, Google Cloud CAS, and several other platforms.
Bridging the Automation Gap
Custom issuers extend cert-manager’s controller pattern to communicate with any certificate authority. They translate Kubernetes-native CertificateRequest resources into API calls against internal PKI systems, then write the resulting certificates back as Kubernetes Secrets. Application teams interact exclusively with standard cert-manager resources while the custom issuer handles protocol translation, authentication, and error handling behind the scenes.
This approach delivers the automation benefits of cert-manager—declarative configuration, automatic renewal, GitOps compatibility—while preserving the security policies and audit trails that enterprise PKI systems provide.
Understanding how cert-manager’s extension architecture enables this integration requires examining the issuer abstraction itself.
Anatomy of a cert-manager Issuer: Understanding the Extension Points
Before building a custom issuer, you need a solid understanding of how cert-manager orchestrates certificate lifecycle operations. The architecture follows a controller pattern that cleanly separates certificate requests from the issuance logic, creating well-defined extension points for custom PKI integration.

The CertificateRequest Flow
Every certificate operation in cert-manager ultimately flows through the CertificateRequest resource. When a Certificate resource is created or needs renewal, the cert-manager controller generates a CertificateRequest containing the CSR (Certificate Signing Request) and references to the target issuer.
The flow proceeds as follows:
- A Certificate resource specifies the desired certificate properties and references an issuer
- The cert-manager controller creates a CertificateRequest with the generated CSR
- An issuer controller watches for CertificateRequests that reference its issuer type
- The issuer controller processes the request, communicates with the PKI backend, and updates the CertificateRequest status with the signed certificate
- cert-manager extracts the certificate and stores it in the specified Secret
This decoupled design means your custom issuer only needs to implement the issuer controller's part of this flow: watching CertificateRequests that reference it and populating them with signed certificates. The cert-manager core handles CSR generation, Secret management, and renewal scheduling.
Cluster-Scoped vs Namespace-Scoped Issuers
cert-manager provides two issuer scope levels with distinct multi-tenancy implications:
Issuer resources are namespace-scoped. A Certificate in namespace team-alpha can only reference an Issuer in the same namespace. This provides strong isolation—each team manages their own issuer configuration and credentials without visibility into other namespaces.
ClusterIssuer resources are cluster-scoped. Any Certificate in any namespace can reference a ClusterIssuer. This centralization simplifies management but requires careful consideration of who can create Certificates referencing shared issuers.
For enterprise environments with internal PKI, you typically want ClusterIssuers managed by the platform team, combined with RBAC policies and admission controllers that restrict which namespaces or teams can reference specific issuers. A development ClusterIssuer might issue certificates from an internal CA with short validity periods, while a production ClusterIssuer connects to your enterprise PKI with stricter issuance policies.
💡 Pro Tip: Custom issuers should support both Issuer and ClusterIssuer variants from the start. The implementation difference is minimal—primarily how you resolve the issuer reference and locate its credentials—but retrofitting namespace-scoped support later creates unnecessary API versioning complexity.
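To make the "minimal difference" concrete, here is a sketch of how a controller might decide where to look up credentials for each scope. The kind names `InternalIssuer` and `InternalClusterIssuer` and the `clusterResourceNamespace` setting are assumptions made for illustration, not part of cert-manager's API.

```go
package main

import (
	"errors"
	"fmt"
)

// IssuerRef mirrors the name/kind/group triple carried by a CertificateRequest.
type IssuerRef struct {
	Name, Kind, Group string
}

// secretNamespace returns the namespace to read issuer credentials from.
// Namespaced issuers use the request's own namespace; cluster-scoped issuers
// use a namespace configured on the controller (a hypothetical setting here).
func secretNamespace(ref IssuerRef, requestNamespace, clusterResourceNamespace string) (string, error) {
	switch ref.Kind {
	case "InternalIssuer":
		return requestNamespace, nil
	case "InternalClusterIssuer":
		if clusterResourceNamespace == "" {
			return "", errors.New("cluster resource namespace is not configured")
		}
		return clusterResourceNamespace, nil
	default:
		return "", fmt.Errorf("unsupported issuer kind %q", ref.Kind)
	}
}

func main() {
	for _, kind := range []string{"InternalIssuer", "InternalClusterIssuer"} {
		ns, err := secretNamespace(IssuerRef{Name: "internal-pki-issuer", Kind: kind}, "team-alpha", "cert-manager")
		fmt.Println(kind, "->", ns, err)
	}
}
```

Everything after this lookup (building the client, signing, writing status) can be shared between the two scopes.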
The External Issuer Controller Pattern
Custom issuers follow the external issuer pattern: a separate controller deployment that watches for CertificateRequests referencing your custom issuer types. Your controller needs to:
- Define Custom Resource Definitions (CRDs) for your issuer types (both namespaced and cluster-scoped variants)
- Implement a controller that reconciles CertificateRequest resources
- Check the issuer reference to determine if the request targets your issuer type
- Validate the request against your issuer’s policy
- Communicate with your PKI backend to obtain the signed certificate
- Update the CertificateRequest status with the certificate chain or failure reason
The webhook architecture comes into play for validation. Your issuer CRDs should include validating admission webhooks that prevent invalid configurations—checking that referenced credentials exist, CA endpoints are reachable, and policy constraints are syntactically correct. Catching configuration errors at admission time provides immediate feedback rather than cryptic reconciliation failures.
cert-manager also gates issuance on the CertificateRequest’s Approved and Denied conditions. An external issuer should only sign requests that carry Approved=True and should never sign one marked Denied. The approval decision itself comes from a separate approver that evaluates requests against organizational policies, adding a policy enforcement layer between request and fulfillment.
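One way to honor the approval condition inside an issuer is a small gate that runs before any call to the PKI. The sketch below uses a stand-in `Condition` type rather than cert-manager's own types, and the `Decision` values are illustrative names.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a CertificateRequest status condition.
type Condition struct {
	Type   string // "Approved", "Denied", or "Ready"
	Status string // "True", "False", or "Unknown"
}

// Decision is what the reconciler should do with a request.
type Decision int

const (
	Wait Decision = iota // no approval decision yet: do nothing and wait for an update
	Sign                 // approved and not yet issued
	Skip                 // denied or already issued: never sign
)

func hasTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType && c.Status == "True" {
			return true
		}
	}
	return false
}

// decide gates signing on the approval conditions. Denied and Ready are
// checked first so that neither can be overridden by a later approval.
func decide(conds []Condition) Decision {
	switch {
	case hasTrue(conds, "Denied"), hasTrue(conds, "Ready"):
		return Skip
	case hasTrue(conds, "Approved"):
		return Sign
	default:
		return Wait
	}
}

func main() {
	fmt.Println(decide(nil) == Wait)
	fmt.Println(decide([]Condition{{Type: "Approved", Status: "True"}}) == Sign)
	fmt.Println(decide([]Condition{{Type: "Denied", Status: "True"}}) == Skip)
}
```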
With this architectural foundation in place, we can move to implementation. The next section walks through building a custom issuer controller in Go that integrates with an internal PKI system.
Implementing a Custom Issuer Controller in Go
Building a custom cert-manager issuer requires implementing a Kubernetes controller that watches for CertificateRequest resources and fulfills them through your internal PKI. This section walks through the complete implementation, from project scaffolding to production-ready reconciliation logic. By the end, you’ll have a fully functional issuer that integrates seamlessly with cert-manager’s certificate lifecycle management.
Scaffolding with Kubebuilder
Kubebuilder provides the foundation for building Kubernetes controllers with proper scaffolding, RBAC generation, and webhook support. It generates the boilerplate code for custom resource definitions, controller logic, and manager setup—allowing you to focus on the business logic of certificate signing. Initialize your project and create the necessary API types:
kubebuilder init --domain mycompany.io --repo github.com/mycompany/internal-pki-issuer
kubebuilder create api --group pki --version v1alpha1 --kind InternalIssuer
kubebuilder create api --group pki --version v1alpha1 --kind InternalClusterIssuer --namespaced=false

The first command initializes the project structure with the domain suffix used for your CRD API groups. The subsequent commands create both a namespace-scoped issuer and a cluster-scoped variant (hence --namespaced=false on the second), mirroring cert-manager’s own Issuer and ClusterIssuer pattern. This dual-scope approach provides flexibility: namespace-scoped issuers restrict certificate issuance to specific namespaces, while cluster-scoped issuers serve certificates across the entire cluster.
The custom issuer CRD defines configuration for connecting to your internal PKI system. Define the spec to capture authentication credentials and PKI endpoint details:
type InternalIssuerSpec struct {
	// URL is the base URL of the internal PKI API
	URL string `json:"url"`

	// AuthSecretName references a Secret containing PKI credentials
	AuthSecretName string `json:"authSecretName"`

	// CABundle contains PEM-encoded CA certificates for PKI API TLS
	CABundle []byte `json:"caBundle,omitempty"`

	// TemplateName specifies the certificate template in the PKI system
	TemplateName string `json:"templateName"`
}

type InternalIssuerStatus struct {
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

The AuthSecretName field references a Kubernetes Secret containing credentials for authenticating to your PKI API, typically an API token, client certificate, or service account credentials. Storing credentials in a Secret rather than directly in the issuer spec follows Kubernetes security best practices and enables credential rotation without modifying the issuer resource.
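A spec like this benefits from a validation pass before the issuer is marked ready. The following is a minimal sketch that works on a plain copy of the struct; requiring an https URL is an assumption about your PKI, and `validateSpec` is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// InternalIssuerSpec is a plain copy of the spec fields, without JSON tags.
type InternalIssuerSpec struct {
	URL            string
	AuthSecretName string
	CABundle       []byte
	TemplateName   string
}

// validateSpec collects every problem it finds instead of stopping at the
// first, so a user can fix the whole resource in one pass.
func validateSpec(spec InternalIssuerSpec) error {
	var errs []error

	u, err := url.Parse(spec.URL)
	if err != nil || u.Scheme != "https" || u.Host == "" {
		errs = append(errs, fmt.Errorf("url %q must be an absolute https URL", spec.URL))
	}
	if spec.AuthSecretName == "" {
		errs = append(errs, errors.New("authSecretName is required"))
	}
	if spec.TemplateName == "" {
		errs = append(errs, errors.New("templateName is required"))
	}

	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	fmt.Println(validateSpec(InternalIssuerSpec{
		URL:            "https://pki.internal.example/api",
		AuthSecretName: "pki-credentials",
		TemplateName:   "server-tls",
	}))
	fmt.Println(validateSpec(InternalIssuerSpec{URL: "http://pki.internal.example"}))
}
```

The same function can back both a validating webhook and the issuer's own Ready condition.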
The Reconcile Loop
The controller watches CertificateRequest resources and processes those targeting your issuer type. The reconciliation logic handles the complete lifecycle from CSR extraction to certificate delivery. Kubernetes controllers follow an eventual consistency model: the Reconcile function is called whenever a watched resource changes, and your code must compare the current state against the desired state, taking action to converge them.
func (r *CertificateRequestReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Fetch the CertificateRequest
	cr := &cmapi.CertificateRequest{}
	if err := r.Get(ctx, req.NamespacedName, cr); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Verify this request targets our issuer type
	if cr.Spec.IssuerRef.Group != "pki.mycompany.io" {
		return ctrl.Result{}, nil
	}

	// Skip if already processed
	if cmutil.CertificateRequestHasCondition(cr, cmapi.CertificateRequestCondition{
		Type:   cmapi.CertificateRequestConditionReady,
		Status: cmmeta.ConditionTrue,
	}) {
		return ctrl.Result{}, nil
	}

	// Never sign a denied request, and wait until the request is approved
	if cmutil.CertificateRequestIsDenied(cr) || !cmutil.CertificateRequestIsApproved(cr) {
		return ctrl.Result{}, nil
	}

	// Fetch the issuer configuration
	issuer, err := r.getIssuer(ctx, cr)
	if err != nil {
		logger.Error(err, "failed to retrieve issuer")
		return ctrl.Result{}, r.setStatus(ctx, cr, cmmeta.ConditionFalse, "IssuerNotFound", err.Error())
	}

	// Sign the certificate
	signedCert, ca, err := r.signCertificate(ctx, issuer, cr.Spec.Request)
	if err != nil {
		return r.handleSigningError(ctx, cr, err)
	}

	// Update the CertificateRequest with signed certificate
	cr.Status.Certificate = signedCert
	cr.Status.CA = ca

	return ctrl.Result{}, r.setStatus(ctx, cr, cmmeta.ConditionTrue, "Issued", "Certificate issued successfully")
}

The early return pattern for already-processed requests is critical for idempotency. Without this guard, the controller would resubmit signing requests to your PKI on every reconciliation, potentially triggered by unrelated status updates or periodic resyncs. The condition check keeps an already-issued CertificateRequest from being signed again.
Integrating with Your Internal PKI
The signing function bridges cert-manager’s CSR format to your PKI API. Extract the CSR, submit it to your PKI system, and return the signed certificate chain. This separation of concerns keeps the reconciler focused on Kubernetes resource management while delegating cryptographic operations to a dedicated client:
func (r *CertificateRequestReconciler) signCertificate(
	ctx context.Context,
	issuer *pkiv1alpha1.InternalIssuer,
	csrPEM []byte,
) ([]byte, []byte, error) {
	// Parse and validate the CSR before sending it anywhere
	if _, err := parseCSR(csrPEM); err != nil {
		return nil, nil, fmt.Errorf("invalid CSR: %w", err)
	}

	// Build the PKI client with credentials from Secret
	pkiClient, err := r.buildPKIClient(ctx, issuer)
	if err != nil {
		return nil, nil, fmt.Errorf("failed to create PKI client: %w", err)
	}

	// Submit signing request to internal PKI
	resp, err := pkiClient.SignCertificate(ctx, &pki.SignRequest{
		CSR:          csrPEM,
		TemplateName: issuer.Spec.TemplateName,
		// Hard-coded for brevity; a production issuer would derive this
		// from the CertificateRequest's spec.duration.
		ValidityDays: 365,
	})
	if err != nil {
		return nil, nil, fmt.Errorf("PKI signing failed: %w", err)
	}

	return resp.Certificate, resp.CAChain, nil
}

The buildPKIClient function should fetch the referenced Secret, extract credentials, and configure an HTTP client with appropriate TLS settings using the issuer’s CA bundle. Consider caching the client per issuer to avoid repeated Secret lookups on high-volume clusters.
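The `parseCSR` helper is referenced but not shown. A minimal version using only the Go standard library might look like the sketch below; the `newTestCSR` function exists only to give the example something to parse and is not part of the issuer.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
)

// parseCSR decodes a PEM-encoded PKCS#10 request and verifies its
// self-signature, which proves the requester holds the private key.
func parseCSR(csrPEM []byte) (*x509.CertificateRequest, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil || block.Type != "CERTIFICATE REQUEST" {
		return nil, errors.New("no PEM block of type CERTIFICATE REQUEST found")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parsing CSR: %w", err)
	}
	if err := csr.CheckSignature(); err != nil {
		return nil, fmt.Errorf("CSR signature check failed: %w", err)
	}
	return csr, nil
}

// newTestCSR builds a throwaway CSR for the demonstration below.
func newTestCSR(dnsName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{
		Subject:  pkix.Name{CommonName: dnsName},
		DNSNames: []string{dnsName},
	}, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der}), nil
}

func main() {
	csrPEM, err := newTestCSR("api.payments.internal")
	if err != nil {
		panic(err)
	}
	csr, err := parseCSR(csrPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println("requested DNS names:", csr.DNSNames)
}
```

Returning the parsed request, rather than just an error, lets the caller log or policy-check the requested names before contacting the PKI.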
Error Handling and Retry Logic
Transient failures from network issues or PKI system unavailability require careful handling. Distinguish between permanent failures that should mark the request as failed versus temporary issues that warrant retry. This distinction directly impacts user experience: permanent failures surface immediately in Certificate status, while transient failures are silently retried until resolution or timeout:
func (r *CertificateRequestReconciler) handleSigningError(
	ctx context.Context,
	cr *cmapi.CertificateRequest,
	err error,
) (ctrl.Result, error) {
	var pkiErr *pki.Error
	if errors.As(err, &pkiErr) {
		switch {
		case pkiErr.IsRateLimited():
			// Retry after the delay the PKI asks for
			return ctrl.Result{RequeueAfter: pkiErr.RetryAfter()}, nil

		case pkiErr.IsTransient():
			// Network or temporary PKI issues - retry in 30 seconds
			return ctrl.Result{RequeueAfter: 30 * time.Second}, nil

		case pkiErr.IsPolicyViolation():
			// Permanent failure - CSR violates PKI policy
			return ctrl.Result{}, r.setStatus(ctx, cr, cmmeta.ConditionFalse, "PolicyViolation", pkiErr.Message)
		}
	}

	// Unknown errors get exponential backoff via controller-runtime
	return ctrl.Result{}, err
}

💡 Pro Tip: Implement circuit breaker patterns when your PKI system experiences extended outages. This prevents overwhelming the PKI with retry storms and allows graceful degradation while maintaining visibility into the backlog of pending requests.
Register the controller with appropriate watches to trigger reconciliation on CertificateRequest changes and issuer updates. The watch on InternalIssuer resources ensures that changes to issuer configuration—such as rotated credentials or updated endpoints—trigger reprocessing of pending certificate requests:
func (r *CertificateRequestReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cmapi.CertificateRequest{}).
		Watches(
			&pkiv1alpha1.InternalIssuer{},
			handler.EnqueueRequestsFromMapFunc(r.findRequestsForIssuer),
		).
		WithOptions(controller.Options{
			MaxConcurrentReconciles: 5,
		}).
		Complete(r)
}

The MaxConcurrentReconciles setting controls parallelism within your controller. For high-volume environments with hundreds of certificate requests, increase this value to improve throughput, but monitor your PKI system’s capacity to avoid overwhelming it with concurrent signing requests.
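The `findRequestsForIssuer` map function is not shown. Its core is a filter from "this issuer changed" to "these pending requests should be retried". The sketch below shows that filter on plain structs; the `Request` type and `pendingFor` name are hypothetical, and a real implementation would list CertificateRequests through the controller's client and return reconcile requests.

```go
package main

import "fmt"

// Request is a simplified view of a CertificateRequest.
type Request struct {
	Namespace, Name        string
	IssuerName, IssuerKind string
	Ready                  bool
}

// pendingFor returns the namespace/name keys of unfinished requests that
// reference the given namespaced issuer. Requests that are already Ready are
// left alone so an issuer update never triggers re-signing.
func pendingFor(issuerNamespace, issuerName string, all []Request) []string {
	var keys []string
	for _, r := range all {
		if r.Ready || r.IssuerKind != "InternalIssuer" {
			continue
		}
		if r.Namespace != issuerNamespace || r.IssuerName != issuerName {
			continue
		}
		keys = append(keys, r.Namespace+"/"+r.Name)
	}
	return keys
}

func main() {
	all := []Request{
		{Namespace: "payments", Name: "api-1", IssuerName: "pki", IssuerKind: "InternalIssuer"},
		{Namespace: "payments", Name: "api-2", IssuerName: "pki", IssuerKind: "InternalIssuer", Ready: true},
		{Namespace: "billing", Name: "web-1", IssuerName: "pki", IssuerKind: "InternalIssuer"},
	}
	fmt.Println(pendingFor("payments", "pki", all))
}
```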
With the controller implemented, you have a functional custom issuer that integrates your internal PKI with cert-manager’s automation. The next step is establishing policies that govern which namespaces and workloads can request certificates from specific issuers—a critical requirement for multi-tenant environments.
Multi-Tenant Certificate Policies with ClusterIssuer Constraints
Shared Kubernetes clusters introduce a governance challenge: development teams need certificate autonomy while security teams require enforcement of organizational policies. Without constraints, a single namespace can request certificates with excessive durations, weak key sizes, or unauthorized domain patterns—creating compliance gaps and security vulnerabilities across the entire cluster.
cert-manager’s policy framework addresses this through CertificateRequestPolicy resources and the approver-policy controller, enabling fine-grained control over what certificates each tenant can request. This separation of concerns allows platform teams to define guardrails once while developers retain self-service certificate provisioning within those boundaries.
Namespace-Scoped Certificate Policies
The approver-policy controller intercepts CertificateRequest resources and evaluates them against defined policies before approval. Unlike admission webhooks that reject requests outright, this approach leaves denied requests visible in the cluster with clear status messages explaining the violation. Install the controller alongside cert-manager:
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager-approver-policy
  namespace: cert-manager
spec:
  interval: 1h
  chart:
    spec:
      chart: cert-manager-approver-policy
      version: v0.14.1
      sourceRef:
        kind: HelmRepository
        name: jetstack
  values:
    app:
      metrics:
        enabled: true

Define a policy that restricts certificate parameters per namespace. This example enforces production-grade requirements for the payments namespace:
apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: payments-namespace-policy
spec:
  allowed:
    commonName:
      value: "*.payments.internal.acme.corp"
    dnsNames:
      values:
        - "*.payments.internal.acme.corp"
        - "*.payments.acme.corp"
    usages:
      - digital signature
      - key encipherment
      - server auth
  constraints:
    minDuration: 168h # Minimum 7 days
    maxDuration: 2160h # Maximum 90 days
    privateKey:
      algorithm: ECDSA
      minSize: 256
  selector:
    issuerRef:
      # Matches the custom issuer's own kind and API group
      name: internal-pki-issuer
      kind: InternalClusterIssuer
      group: pki.mycompany.io
    namespace:
      matchNames:
        - payments

The constraints block enforces key size minimums and duration bounds, while the allowed block restricts which DNS names and usages teams can request. Any CertificateRequest violating these policies remains in a Denied state with a clear status message. The selector ensures this policy only applies to requests targeting the specified cluster-scoped issuer from the payments namespace. Other namespaces require their own policies or fall back to a default-deny posture.
Multiple policies can select the same namespace, but approver-policy treats them as alternatives rather than layers: a request is approved as soon as any one applicable policy permits it, and denied only when none do. Organization-wide minimums (2048-bit RSA keys, maximum 90-day validity) therefore need to be repeated in every team-specific policy alongside that team's domain restrictions, rather than kept in a separate base policy.
Implementing Approval Workflows
High-sensitivity certificates—those for production APIs, mTLS roots, or external-facing services—warrant human approval before issuance. Combine CertificateRequestPolicy with RBAC to create approval workflows that pause certificate issuance until a security engineer reviews the request:
apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: external-certificates-require-approval
spec:
  allowed:
    dnsNames:
      values:
        - "*.acme.corp"
        - "api.acme.corp"
  constraints:
    maxDuration: 720h # 30 days max for external certs
    privateKey:
      algorithm: RSA
      minSize: 4096
  selector:
    issuerRef:
      name: public-issuer
      kind: ClusterIssuer
    namespace:
      matchNames:
        - production
        - staging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: certificate-approver
rules:
  - apiGroups: ["cert-manager.io"]
    resources: ["certificaterequests/status"]
    verbs: ["update"]
  - apiGroups: ["cert-manager.io"]
    resources: ["certificaterequests"]
    verbs: ["get", "list", "watch"]
  # cert-manager only accepts an approval from identities that hold the
  # "approve" verb on the signer the request targets
  - apiGroups: ["cert-manager.io"]
    resources: ["signers"]
    verbs: ["approve"]
    resourceNames: ["clusterissuers.cert-manager.io/public-issuer"]

Security engineers bound to this role can approve pending requests from the command line:

kubectl cert-manager approve payments-api-tls-abc123 \
  --reason "Reviewed and approved per ticket SEC-4521" \
  -n production

Newer cert-manager releases provide the same subcommand through the standalone cmctl binary (cmctl approve). The approval reason becomes part of the CertificateRequest’s status, providing an audit trail for compliance reviews. Requests awaiting approval remain pending indefinitely until explicitly approved or denied, ensuring no certificate issues without proper authorization.
💡 Pro Tip: Integrate approval workflows with your ticketing system using a small approver controller. The controller can automatically approve CertificateRequests that include an annotation referencing a valid, approved change ticket, reducing manual toil while maintaining governance.
Policy Violation Visibility
Teams need visibility into why certificate requests fail. Without proper observability, developers waste time debugging configuration issues that stem from policy violations. Configure alerts on denied requests to surface policy violations early:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-policy-violations
  namespace: cert-manager
spec:
  groups:
    - name: cert-manager-policy
      rules:
        - alert: CertificateRequestDenied
          # Confirm this metric name against the /metrics output of your
          # approver-policy version before relying on the alert
          expr: |
            increase(certmanager_certificaterequest_denied_total[5m]) > 0
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "Certificate request denied by policy"
            description: "Namespace {{ $labels.namespace }} has denied certificate requests. Check CertificateRequest status for policy violation details."

Beyond alerting, consider publishing policy documentation alongside your ClusterIssuers. A ConfigMap in each namespace summarizing allowed domains, key requirements, and duration limits helps developers craft compliant Certificate resources on the first attempt. This feedback loop ensures developers understand policy boundaries without requiring security team intervention for every rejected request.
With governance policies in place, the next challenge is automating certificate delivery to workloads. The following section explores injection patterns for Ingress controllers and service mesh sidecars.
Automated Certificate Injection for Ingress and Service Mesh
With your custom issuer controller deployed and tenant policies in place, the next step is eliminating manual certificate configuration from application deployments. cert-manager’s annotation-based provisioning transforms certificate management from an explicit task into an implicit infrastructure concern, reducing operational overhead while maintaining security guarantees.
Ingress Certificate Automation
For built-in issuers, the cert-manager.io/cluster-issuer annotation triggers automatic Certificate resource creation when you deploy an Ingress. An external issuer is referenced with cert-manager.io/issuer together with cert-manager.io/issuer-kind and cert-manager.io/issuer-group:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  namespace: payments
  annotations:
    cert-manager.io/issuer: "internal-pki-issuer"
    cert-manager.io/issuer-kind: "InternalClusterIssuer"
    cert-manager.io/issuer-group: "pki.mycompany.io"
    cert-manager.io/common-name: "api.payments.internal"
    cert-manager.io/duration: "720h"
    cert-manager.io/renew-before: "168h"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.payments.internal
        - api.payments.svc.cluster.local
      secretName: api-gateway-tls
  rules:
    - host: api.payments.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 8080

cert-manager watches for Ingress resources with these annotations and synthesizes a Certificate resource automatically. The generated certificate inherits the hosts from the tls block and stores the resulting key pair in the specified secret. This approach eliminates the need for separate Certificate manifests, keeping your deployment artifacts focused on application concerns rather than infrastructure plumbing.
💡 Pro Tip: Set cert-manager.io/revision-history-limit: "3" to keep the three most recent CertificateRequests for each Certificate. That preserves the issuance history you need for troubleshooting when a newly issued certificate causes TLS handshake failures.
Istio mTLS Integration
Istio’s service mesh requires certificates at the sidecar level. While Istio includes its own CA (istiod), enterprises often need certificates from their internal PKI for compliance. The istio-csr project bridges this gap:
app:
  certmanager:
    issuer:
      group: pki.mycompany.io
      kind: InternalClusterIssuer
      name: internal-pki-issuer
  tls:
    certificateDuration: 24h
    renewBefore: 8h
    rootCAFile: /var/run/secrets/istio-csr/ca.pem

  server:
    clusterID: production-east
    maxCertificateDuration: 48h

Deploy istio-csr to intercept certificate signing requests from Istio sidecars and forward them to your custom issuer:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-valid-cert
  namespace: payments
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/payments/sa/*

This configuration enforces strict mTLS across the mesh while ensuring all workload certificates originate from your internal PKI.
Linkerd mTLS Integration
Linkerd takes a different approach to certificate management. Rather than intercepting CSRs, Linkerd expects you to provide a trust anchor and issuer certificate directly. Configure cert-manager to manage the Linkerd identity issuer:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h
  renewBefore: 12h
  issuerRef:
    name: internal-pki-issuer
    kind: InternalClusterIssuer
    group: pki.mycompany.io
  commonName: identity.linkerd.cluster.local
  isCA: true
  # Linkerd expects an ECDSA P-256 key for its identity issuer
  privateKey:
    algorithm: ECDSA
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth

Linkerd’s control plane automatically detects certificate rotation and propagates new credentials to proxies without requiring pod restarts, providing seamless certificate lifecycle management.
Managing Certificate Dependencies in CI/CD
Certificate provisioning introduces ordering dependencies. A pod mounting a TLS secret fails if that secret doesn’t exist. Handle this with init containers that wait for certificate readiness:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: payments
spec:
  template:
    spec:
      initContainers:
        - name: wait-for-cert
          # The image must ship both kubectl and openssl, and the pod's
          # ServiceAccount needs permission to get this Secret
          image: bitnami/kubectl:1.29
          command:
            - /bin/sh
            - -c
            - |
              until kubectl get secret payment-processor-tls \
                -n payments \
                -o jsonpath='{.data.tls\.crt}' | base64 -d | \
                openssl x509 -noout -checkend 3600; do
                echo "Waiting for valid certificate..."
                sleep 5
              done
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
      containers:
        - name: processor
          volumeMounts:
            - name: tls
              mountPath: /etc/tls
              readOnly: true
      volumes:
        - name: tls
          secret:
            secretName: payment-processor-tls

For GitOps workflows with Argo CD or Flux, declare Certificate resources as sync-wave predecessors to dependent workloads, ensuring certificates exist before pods attempt to mount them. In Argo CD, use the argocd.argoproj.io/sync-wave: "-1" annotation on Certificate resources to guarantee they reconcile before application deployments. Flux users can leverage dependsOn in Kustomization resources to establish explicit ordering between certificate provisioning and workload deployment.
Consider implementing health checks that validate certificate chains during deployment pipelines. This catches configuration errors—such as incorrect issuer references or invalid subject alternative names—before they propagate to production environments.
With certificates flowing automatically to ingress controllers and service mesh sidecars, your infrastructure gains observability requirements. The next section covers monitoring certificate health and building alerting systems that catch renewal failures before they cause outages.
Monitoring, Alerting, and Renewal Observability
Certificate expiration remains one of the most preventable causes of production outages. A single expired certificate can cascade into authentication failures, broken service mesh communication, and customer-facing downtime. Robust observability transforms certificate management from a ticking time bomb into a predictable, automated process.
Essential Prometheus Metrics
cert-manager exposes metrics on port 9402 by default. The most critical metrics for operational health include certificate expiration timestamps, request durations, and controller reconciliation errors.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
    - port: tcp-prometheus-servicemonitor
      interval: 30s
      path: /metrics

The certmanager_certificate_expiration_timestamp_seconds metric provides the Unix timestamp when each certificate expires. Combined with certmanager_certificate_ready_status, you gain visibility into both current health and future risk. Additional metrics worth tracking include certmanager_controller_sync_call_count for understanding controller throughput and certmanager_http_acme_client_request_duration_seconds for identifying latency issues with ACME providers.
For custom issuers, expose metrics that track signing latency, queue depth, and error rates by issuer type. These custom metrics provide granular visibility into issuer-specific bottlenecks that generic cert-manager metrics cannot capture.
Proactive Alerting Rules
Alerts should fire well before certificates expire, giving teams time to investigate and remediate. Configure multiple severity tiers based on time remaining, and ensure your alerting strategy accounts for both individual certificate failures and systemic issues affecting multiple certificates simultaneously.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-alerts
  namespace: monitoring
spec:
  groups:
    - name: certificates
      rules:
        - alert: CertificateExpiringSoon
          # 604800 seconds = 7 days. The ready_status series carries an extra
          # `condition` label, so the two sides are matched on namespace and name.
          expr: |
            (certmanager_certificate_expiration_timestamp_seconds - time()) < 604800
            and on (namespace, name)
            certmanager_certificate_ready_status{condition="True"} == 1
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.name }} expires in less than 7 days"

        - alert: CertificateRenewalFailed
          expr: |
            certmanager_certificate_ready_status{condition="False"} == 1
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "Certificate {{ $labels.name }} is not ready"

        - alert: CertificateRequestStuck
          # Fires when the sync counter has not moved over the last 10 minutes.
          expr: |
            certmanager_controller_sync_call_count{controller="certificaterequests-issuer-custom-issuer"}
              - certmanager_controller_sync_call_count{controller="certificaterequests-issuer-custom-issuer"} offset 10m
              == 0
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Custom issuer controller appears stuck"
```

💡 Pro Tip: Set your warning threshold inside your renewal window, at roughly half of it. If certificates renew 30 days before expiration, alert at 14 days remaining: by then renewal has been failing for more than two weeks, yet there is still time to fix it before the failure becomes urgent.
Consider adding aggregate alerts that trigger when a percentage of certificates in a namespace or cluster enter a degraded state. A single failed certificate might indicate an isolated issue, but multiple simultaneous failures often signal infrastructure problems like network partitions or issuer unavailability.
Debugging Failed Requests
When alerts fire, kubectl provides the fastest path to root cause analysis. Certificate resources maintain conditions that reveal the current state and any errors encountered during issuance.
```shell
# Check certificate status and conditions
kubectl describe certificate api-gateway-tls -n production

# Examine the underlying CertificateRequest
kubectl get certificaterequest -n production -o wide

# View controller events for the certificate
kubectl get events -n production --field-selector involvedObject.name=api-gateway-tls

# Check custom issuer controller logs
kubectl logs -n cert-manager -l app=custom-issuer-controller --tail=100
```

Common failure patterns include RBAC misconfigurations preventing secret creation, network policies blocking communication with internal PKI endpoints, and rate limiting from certificate authorities. The CertificateRequest’s status.conditions and status.failureTime fields pinpoint exactly where the issuance pipeline stalled.
For custom issuers, instrument your controller with structured logging that correlates request IDs across the signing workflow. This correlation becomes invaluable when tracing why a specific certificate failed to issue while others succeeded. Include contextual information such as the issuer configuration, requested DNS names, and any upstream error responses from your PKI infrastructure.
Building runbooks that map specific error conditions to remediation steps accelerates incident response. Document the most frequent failure modes your team encounters, along with the kubectl commands and log queries that expose root causes.
With comprehensive observability in place, the final consideration is ensuring your certificate infrastructure survives failure scenarios and scales with your organization’s growth.
Production Hardening and Disaster Recovery
Certificate infrastructure failures cascade quickly. When your CA becomes unavailable, renewals stall and certificates start expiring; when certificate secrets disappear, every workload that depends on them loses its mTLS connections at once. Building resilience into your custom issuer deployment requires deliberate backup strategies, tested rotation procedures, and staging environments that mirror production failure modes.
Backup Strategies for Certificate State
Your custom issuer manages two categories of critical data: the issuer configuration (CRDs, controller deployments, RBAC) and the certificate secrets themselves. GitOps handles the first category naturally—store your issuer manifests in version control and let ArgoCD or Flux maintain desired state.
Certificate secrets require different treatment. While cert-manager regenerates certificates automatically, the private keys backing your CA hierarchy cannot be recreated. Use Velero with the --include-cluster-scoped-resources flag to snapshot ClusterIssuer resources alongside their backing secrets. Schedule backups to coincide with your CA signing key rotation windows, ensuring each backup contains a complete, consistent CA state.
For secrets containing intermediate CA certificates, enable encryption at rest using a KMS provider external to the cluster. If your etcd backup becomes compromised, encrypted secrets remain protected by keys stored in your HSM or cloud KMS.
CA Rotation Without Disruption
Rotating a CA certificate while maintaining service continuity requires overlapping validity periods. Configure your custom issuer to support multiple active signing certificates during the transition window. When rotation begins, the issuer signs new certificates with the replacement CA while workloads continue trusting both the old and new chains.
Implement a three-phase rotation: first, distribute the new CA certificate to trust stores across all workloads. Second, switch signing to the new CA. Third, after all certificates issued by the old CA expire, remove it from trust stores. Your controller should expose the current rotation phase as a status condition, enabling external tooling to gate deployments during sensitive transition periods.
Staging Environment Validation
Never test certificate renewal in production for the first time. Create a staging environment with accelerated certificate lifetimes—issue certificates valid for hours rather than months. This compression exposes renewal logic bugs that would otherwise take weeks to manifest.
Inject failure scenarios deliberately: revoke the CA certificate, corrupt the signing key secret, simulate HSM unavailability. Your issuer should emit events and metrics that clearly indicate the failure mode, enabling operators to diagnose issues without parsing controller logs.
💡 Pro Tip: Use chaos engineering tools to randomly delete certificate secrets in staging. Verify that your issuer recreates them within your SLA before the dependent workloads notice the disruption.
With resilience patterns established, you have a complete custom issuer implementation—from controller logic through production hardening. The patterns covered throughout this guide provide a foundation for integrating any internal PKI system with cert-manager’s powerful automation capabilities.
Key Takeaways
- Start with cert-manager’s external issuer template and customize the Reconcile loop to integrate your PKI’s signing API
- Implement ClusterIssuer policies early to enforce certificate standards before developers start requesting certificates directly
- Set up Prometheus alerts on certmanager_certificate_expiration_timestamp_seconds with a 14-day threshold to catch renewal failures before they cause outages
- Test CA rotation procedures in staging quarterly—the first time you rotate a CA shouldn’t be during an incident