Building Custom cert-manager Issuers: From Internal PKI to Zero-Touch Certificate Automation


Your security team just mandated that all internal services use certificates from the corporate PKI, but none of cert-manager’s built-in issuers speak your CA’s API. You’re staring at hundreds of microservices that need certificates rotated every 90 days, and the manual process that worked for five services won’t scale to five hundred.

The math is brutal. Five hundred services times four rotations per year equals two thousand certificate operations annually. Each rotation requires generating a CSR, submitting it to your internal CA, waiting for approval, downloading the certificate, and updating the Kubernetes secret. Even at fifteen minutes per operation—an optimistic estimate when you factor in ticket queues and approval workflows—you’re looking at five hundred hours of annual toil. That’s three months of an engineer’s time spent copying certificates around.

Meanwhile, your developers expect the same seamless experience they get with public certificates. They want to annotate an Ingress, watch the certificate appear, and forget about it until the next security audit. They don’t care that your corporate CA speaks a proprietary API instead of ACME. They shouldn’t have to.

This is where cert-manager’s extensibility model becomes essential. The same controller architecture that handles Let’s Encrypt challenges can integrate with any certificate authority—Microsoft AD CS, EJBCA, internal REST APIs, or that legacy PKI system running on hardware from 2008. The pattern is consistent: watch for CertificateRequest resources, translate them into your CA’s native format, and write the signed certificate back to the cluster.

The gap between cert-manager’s built-in issuers and enterprise PKI requirements isn’t a limitation—it’s an integration opportunity. Building a custom issuer transforms certificate management from a recurring operational burden into a one-time infrastructure investment.

The Certificate Lifecycle Problem at Scale

Certificate management follows a predictable trajectory in every growing organization. What begins as a handful of manually provisioned TLS certificates quickly spirals into an operational nightmare as microservices multiply and deployment frequency increases.

Visual: Certificate lifecycle complexity at scale

The Manual Management Breaking Point

Consider a platform running fifty services across three environments. Each service requires certificates for mutual TLS, ingress termination, and internal API authentication. With ninety-day certificate lifetimes—a security best practice—the operations team faces a constant stream of renewal work. Miss a single expiration, and production traffic drops. The cognitive overhead alone makes manual processes unsustainable.

The math is unforgiving. Fifty services across three environments, each needing certificates for mutual TLS, ingress termination, and internal API authentication, adds up to roughly 450 certificates. With ninety-day lifetimes, that translates to an average of five certificate renewals every single day. No team can sustain this manually while also shipping features.

cert-manager solves this automation problem elegantly for public certificate authorities. Deploy an ACME issuer pointed at Let’s Encrypt, annotate your Ingress resources, and certificates materialize automatically. The controller handles issuance, renewal, and secret management without human intervention.

Where Built-in Issuers Fall Short

Enterprise environments rarely operate with public CAs alone. Internal PKI systems—HashiCorp Vault, Microsoft Active Directory Certificate Services, custom certificate authorities built on EJBCA or similar platforms—remain the backbone of zero-trust architectures. These systems enforce organizational policies around key escrow, certificate templates, approval workflows, and audit logging that public ACME providers cannot satisfy.

cert-manager ships with built-in issuers for ACME, Vault, Venafi, self-signed certificates, and simple CA signing, but the certificate authority landscape is fragmented. Many organizations run proprietary PKI solutions, legacy systems with custom APIs, or cloud provider certificate services that lack native cert-manager integration. The gap between what cert-manager supports out of the box and what enterprise PKI demands creates a significant operational burden.

💡 Pro Tip: Before building a custom issuer, verify that no existing issuer or external-issuer project covers your CA. The cert-manager ecosystem includes community-maintained issuers for AWS Private CA, Google Cloud CAS, and several other platforms.

Bridging the Automation Gap

Custom issuers extend cert-manager’s controller pattern to communicate with any certificate authority. They translate Kubernetes-native CertificateRequest resources into API calls against internal PKI systems, then write the resulting certificates back as Kubernetes Secrets. Application teams interact exclusively with standard cert-manager resources while the custom issuer handles protocol translation, authentication, and error handling behind the scenes.

This approach delivers the automation benefits of cert-manager—declarative configuration, automatic renewal, GitOps compatibility—while preserving the security policies and audit trails that enterprise PKI systems provide.

Understanding how cert-manager’s extension architecture enables this integration requires examining the issuer abstraction itself.

Anatomy of a cert-manager Issuer: Understanding the Extension Points

Before building a custom issuer, you need a solid understanding of how cert-manager orchestrates certificate lifecycle operations. The architecture follows a controller pattern that cleanly separates certificate requests from the issuance logic, creating well-defined extension points for custom PKI integration.

Visual: cert-manager extension architecture

The CertificateRequest Flow

Every certificate operation in cert-manager ultimately flows through the CertificateRequest resource. When a Certificate resource is created or needs renewal, the cert-manager controller generates a CertificateRequest containing the CSR (Certificate Signing Request) and references to the target issuer.

The flow proceeds as follows:

  1. A Certificate resource specifies the desired certificate properties and references an issuer
  2. The cert-manager controller creates a CertificateRequest with the generated CSR
  3. An issuer controller watches for CertificateRequests that reference its issuer type
  4. The issuer controller processes the request, communicates with the PKI backend, and updates the CertificateRequest status with the signed certificate
  5. cert-manager extracts the certificate and stores it in the specified Secret

This decoupled design means your custom issuer only needs to implement step 4—watching CertificateRequests and populating them with signed certificates. The cert-manager core handles CSR generation, Secret management, and renewal scheduling.
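For concreteness, here is a sketch of step 1 — the only manifest an application team writes. The names and the issuer kind/group are illustrative placeholders for a custom external issuer:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: orders-api-tls
  namespace: team-alpha
spec:
  secretName: orders-api-tls   # Secret where cert-manager stores the key pair
  duration: 2160h              # 90 days
  renewBefore: 360h            # start renewal 15 days before expiry
  dnsNames:
    - orders.team-alpha.svc.cluster.local
  issuerRef:
    name: internal-pki-issuer
    kind: InternalClusterIssuer   # hypothetical custom issuer kind
    group: pki.mycompany.io       # hypothetical custom issuer API group
```

From here, cert-manager generates the CertificateRequest and manages the Secret; only the issuerRef tells it that a third-party controller will do the signing.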

Cluster-Scoped vs Namespace-Scoped Issuers

cert-manager provides two issuer scope levels with distinct multi-tenancy implications:

Issuer resources are namespace-scoped. A Certificate in namespace team-alpha can only reference an Issuer in the same namespace. This provides strong isolation—each team manages their own issuer configuration and credentials without visibility into other namespaces.

ClusterIssuer resources are cluster-scoped. Any Certificate in any namespace can reference a ClusterIssuer. This centralization simplifies management but requires careful consideration of who can create Certificates referencing shared issuers.

For enterprise environments with internal PKI, you typically want ClusterIssuers managed by the platform team, combined with RBAC policies and admission controllers that restrict which namespaces or teams can reference specific issuers. A development ClusterIssuer might issue certificates from an internal CA with short validity periods, while a production ClusterIssuer connects to your enterprise PKI with stricter issuance policies.

💡 Pro Tip: Custom issuers should support both Issuer and ClusterIssuer variants from the start. The implementation difference is minimal—primarily how you resolve the issuer reference and locate its credentials—but retrofitting namespace-scoped support later creates unnecessary API versioning complexity.

The External Issuer Controller Pattern

Custom issuers follow the external issuer pattern: a separate controller deployment that watches for CertificateRequests referencing your custom issuer types. Your controller needs to:

  • Define Custom Resource Definitions (CRDs) for your issuer types (both namespaced and cluster-scoped variants)
  • Implement a controller that reconciles CertificateRequest resources
  • Check the issuer reference to determine if the request targets your issuer type
  • Validate the request against your issuer’s policy
  • Communicate with your PKI backend to obtain the signed certificate
  • Update the CertificateRequest status with the certificate chain or failure reason

The webhook architecture comes into play for validation. Your issuer CRDs should include validating admission webhooks that prevent invalid configurations—checking that referenced credentials exist, CA endpoints are reachable, and policy constraints are syntactically correct. Catching configuration errors at admission time provides immediate feedback rather than cryptic reconciliation failures.

cert-manager also supports approval policies through the CertificateRequest approval condition. Your issuer can implement approval logic that evaluates requests against organizational policies before proceeding with issuance, adding a policy enforcement layer between request and fulfillment.

With this architectural foundation in place, we can move to implementation. The next section walks through building a custom issuer controller in Go that integrates with an internal PKI system.

Implementing a Custom Issuer Controller in Go

Building a custom cert-manager issuer requires implementing a Kubernetes controller that watches for CertificateRequest resources and fulfills them through your internal PKI. This section walks through the complete implementation, from project scaffolding to production-ready reconciliation logic. By the end, you’ll have a fully functional issuer that integrates seamlessly with cert-manager’s certificate lifecycle management.

Scaffolding with Kubebuilder

Kubebuilder provides the foundation for building Kubernetes controllers with proper scaffolding, RBAC generation, and webhook support. It generates the boilerplate code for custom resource definitions, controller logic, and manager setup—allowing you to focus on the business logic of certificate signing. Initialize your project and create the necessary API types:

terminal
kubebuilder init --domain mycompany.io --repo github.com/mycompany/internal-pki-issuer
kubebuilder create api --group pki --version v1alpha1 --kind InternalIssuer
kubebuilder create api --group pki --version v1alpha1 --kind InternalClusterIssuer

The first command initializes the project structure with the domain suffix used for your CRD API groups. The subsequent commands create both a namespace-scoped issuer and a cluster-scoped variant, mirroring cert-manager’s own Issuer and ClusterIssuer pattern. This dual-scope approach provides flexibility: namespace-scoped issuers restrict certificate issuance to specific namespaces, while cluster-scoped issuers serve certificates across the entire cluster.

The custom issuer CRD defines configuration for connecting to your internal PKI system. Define the spec to capture authentication credentials and PKI endpoint details:

api/v1alpha1/internalissuer_types.go
type InternalIssuerSpec struct {
    // URL is the base URL of the internal PKI API
    URL string `json:"url"`

    // AuthSecretName references a Secret containing PKI credentials
    AuthSecretName string `json:"authSecretName"`

    // CABundle contains PEM-encoded CA certificates for PKI API TLS
    CABundle []byte `json:"caBundle,omitempty"`

    // TemplateName specifies the certificate template in the PKI system
    TemplateName string `json:"templateName"`
}

type InternalIssuerStatus struct {
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

The AuthSecretName field references a Kubernetes Secret containing credentials for authenticating to your PKI API—typically an API token, client certificate, or service account credentials. Storing credentials in a Secret rather than directly in the issuer spec follows Kubernetes security best practices and enables credential rotation without modifying the issuer resource.
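As a sketch, the Secret referenced by AuthSecretName might carry a bearer token. The key name api-token and the namespace are assumptions of this example, not a cert-manager convention:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: internal-pki-credentials
  namespace: pki-issuer-system
type: Opaque
stringData:
  api-token: "replace-with-token-from-your-pki-team"
```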

The Reconcile Loop

The controller watches CertificateRequest resources and processes those targeting your issuer type. The reconciliation logic handles the complete lifecycle from CSR extraction to certificate delivery. Kubernetes controllers follow an eventual consistency model: the Reconcile function is called whenever a watched resource changes, and your code must compare the current state against the desired state, taking action to converge them.

internal/controller/certificaterequest_controller.go
func (r *CertificateRequestReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // Fetch the CertificateRequest
    cr := &cmapi.CertificateRequest{}
    if err := r.Get(ctx, req.NamespacedName, cr); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Verify this request targets our issuer type
    if cr.Spec.IssuerRef.Group != "pki.mycompany.io" {
        return ctrl.Result{}, nil
    }

    // Skip if already processed
    if cmutil.CertificateRequestHasCondition(cr, cmapi.CertificateRequestCondition{
        Type:   cmapi.CertificateRequestConditionReady,
        Status: cmmeta.ConditionTrue,
    }) {
        return ctrl.Result{}, nil
    }

    // Fetch the issuer configuration
    issuer, err := r.getIssuer(ctx, cr)
    if err != nil {
        log.Error(err, "failed to retrieve issuer")
        return ctrl.Result{}, r.setStatus(ctx, cr, cmmeta.ConditionFalse,
            "IssuerNotFound", err.Error())
    }

    // Sign the certificate
    signedCert, ca, err := r.signCertificate(ctx, issuer, cr.Spec.Request)
    if err != nil {
        return r.handleSigningError(ctx, cr, err)
    }

    // Update the CertificateRequest with the signed certificate
    cr.Status.Certificate = signedCert
    cr.Status.CA = ca
    return ctrl.Result{}, r.setStatus(ctx, cr, cmmeta.ConditionTrue,
        "Issued", "Certificate issued successfully")
}

The early return pattern for already-processed requests is critical for idempotency. Without this guard, the controller would resubmit signing requests to your PKI on every reconciliation—potentially triggered by unrelated status updates or periodic resyncs. The condition check ensures each CertificateRequest is signed exactly once.

Integrating with Your Internal PKI

The signing function bridges cert-manager’s CSR format to your PKI API. Extract the CSR, submit it to your PKI system, and return the signed certificate chain. This separation of concerns keeps the reconciler focused on Kubernetes resource management while delegating cryptographic operations to a dedicated client:

internal/controller/signer.go
func (r *CertificateRequestReconciler) signCertificate(
    ctx context.Context,
    issuer *pkiv1alpha1.InternalIssuer,
    csrPEM []byte,
) ([]byte, []byte, error) {
    // Parse and validate the CSR before sending it anywhere
    if _, err := parseCSR(csrPEM); err != nil {
        return nil, nil, fmt.Errorf("invalid CSR: %w", err)
    }

    // Build the PKI client with credentials from the referenced Secret
    client, err := r.buildPKIClient(ctx, issuer)
    if err != nil {
        return nil, nil, fmt.Errorf("failed to create PKI client: %w", err)
    }

    // Submit the signing request to the internal PKI
    resp, err := client.SignCertificate(ctx, &pki.SignRequest{
        CSR:          csrPEM,
        TemplateName: issuer.Spec.TemplateName,
        ValidityDays: 365,
    })
    if err != nil {
        return nil, nil, fmt.Errorf("PKI signing failed: %w", err)
    }
    return resp.Certificate, resp.CAChain, nil
}

The buildPKIClient function should fetch the referenced Secret, extract credentials, and configure an HTTP client with appropriate TLS settings using the issuer’s CA bundle. Consider caching the client per issuer to avoid repeated Secret lookups on high-volume clusters.
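The parseCSR helper used above can be plain standard library code. A minimal, runnable sketch that decodes the PEM block, parses the request, and verifies its self-signature (the demoCSR helper exists only to exercise it):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
)

// parseCSR decodes a PEM-encoded certificate request and verifies
// that it was signed by the private key matching its public key.
func parseCSR(csrPEM []byte) (*x509.CertificateRequest, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil || block.Type != "CERTIFICATE REQUEST" {
		return nil, errors.New("PEM block is not a CERTIFICATE REQUEST")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parsing CSR: %w", err)
	}
	if err := csr.CheckSignature(); err != nil {
		return nil, fmt.Errorf("CSR signature invalid: %w", err)
	}
	return csr, nil
}

// demoCSR generates a throwaway key and CSR for demonstration only.
func demoCSR() []byte {
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	der, _ := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{
		Subject:  pkix.Name{CommonName: "demo.payments.internal"},
		DNSNames: []string{"demo.payments.internal"},
	}, key)
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
}

func main() {
	csr, err := parseCSR(demoCSR())
	if err != nil {
		panic(err)
	}
	fmt.Println(csr.Subject.CommonName) // prints demo.payments.internal
}
```

Rejecting malformed or forged CSRs here, before any network call, keeps garbage out of your PKI's audit logs.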

Error Handling and Retry Logic

Transient failures from network issues or PKI system unavailability require careful handling. Distinguish between permanent failures that should mark the request as failed versus temporary issues that warrant retry. This distinction directly impacts user experience: permanent failures surface immediately in Certificate status, while transient failures are silently retried until resolution or timeout:

internal/controller/errors.go
func (r *CertificateRequestReconciler) handleSigningError(
    ctx context.Context,
    cr *cmapi.CertificateRequest,
    err error,
) (ctrl.Result, error) {
    var pkiErr *pki.Error
    if errors.As(err, &pkiErr) {
        switch {
        case pkiErr.IsRateLimited():
            // Retry with the backoff the PKI asked for
            return ctrl.Result{RequeueAfter: pkiErr.RetryAfter()}, nil
        case pkiErr.IsTransient():
            // Network or temporary PKI issues - retry in 30 seconds
            return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
        case pkiErr.IsPolicyViolation():
            // Permanent failure - the CSR violates PKI policy
            return ctrl.Result{}, r.setStatus(ctx, cr, cmmeta.ConditionFalse,
                "PolicyViolation", pkiErr.Message)
        }
    }
    // Unknown errors get exponential backoff via controller-runtime
    return ctrl.Result{}, err
}

💡 Pro Tip: Implement circuit breaker patterns when your PKI system experiences extended outages. This prevents overwhelming the PKI with retry storms and allows graceful degradation while maintaining visibility into the backlog of pending requests.

Register the controller with appropriate watches to trigger reconciliation on CertificateRequest changes and issuer updates. The watch on InternalIssuer resources ensures that changes to issuer configuration—such as rotated credentials or updated endpoints—trigger reprocessing of pending certificate requests:

internal/controller/setup.go
func (r *CertificateRequestReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&cmapi.CertificateRequest{}).
        Watches(
            &pkiv1alpha1.InternalIssuer{},
            handler.EnqueueRequestsFromMapFunc(r.findRequestsForIssuer),
        ).
        WithOptions(controller.Options{
            MaxConcurrentReconciles: 5,
        }).
        Complete(r)
}

The MaxConcurrentReconciles setting controls parallelism within your controller. For high-volume environments with hundreds of certificate requests, increase this value to improve throughput—but monitor your PKI system’s capacity to avoid overwhelming it with concurrent signing requests.

With the controller implemented, you have a functional custom issuer that integrates your internal PKI with cert-manager’s automation. The next step is establishing policies that govern which namespaces and workloads can request certificates from specific issuers—a critical requirement for multi-tenant environments.

Multi-Tenant Certificate Policies with ClusterIssuer Constraints

Shared Kubernetes clusters introduce a governance challenge: development teams need certificate autonomy while security teams require enforcement of organizational policies. Without constraints, a single namespace can request certificates with excessive durations, weak key sizes, or unauthorized domain patterns—creating compliance gaps and security vulnerabilities across the entire cluster.

cert-manager’s policy framework addresses this through CertificateRequestPolicy resources and the approver-policy controller, enabling fine-grained control over what certificates each tenant can request. This separation of concerns allows platform teams to define guardrails once while developers retain self-service certificate provisioning within those boundaries.

Namespace-Scoped Certificate Policies

The approver-policy controller intercepts CertificateRequest resources and evaluates them against defined policies before approval. Unlike admission webhooks that reject requests outright, this approach leaves denied requests visible in the cluster with clear status messages explaining the violation. Install the controller alongside cert-manager:

approver-policy-install.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager-approver-policy
  namespace: cert-manager
spec:
  interval: 1h
  chart:
    spec:
      chart: cert-manager-approver-policy
      version: v0.14.1
      sourceRef:
        kind: HelmRepository
        name: jetstack
  values:
    app:
      metrics:
        enabled: true

Define a policy that restricts certificate parameters per namespace. This example enforces production-grade requirements for the payments namespace:

payments-certificate-policy.yaml
apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: payments-namespace-policy
spec:
  allowed:
    commonName:
      value: "*.payments.internal.acme.corp"
    dnsNames:
      values:
        - "*.payments.internal.acme.corp"
        - "*.payments.acme.corp"
    usages:
      - digital signature
      - key encipherment
      - server auth
  constraints:
    minDuration: 168h   # minimum 7 days
    maxDuration: 2160h  # maximum 90 days
    privateKey:
      algorithm: ECDSA
      minSize: 256
  selector:
    issuerRef:
      name: internal-pki-issuer
      kind: ClusterIssuer
      group: cert-manager.io
    namespace:
      matchNames:
        - payments
The constraints block enforces key size minimums and duration bounds, while the allowed block restricts which DNS names and usages teams can request. Any CertificateRequest violating these policies remains in a Denied state with a clear status message. The selector ensures this policy only applies to requests targeting the specified ClusterIssuer from the payments namespace—other namespaces require their own policies or fall back to a default-deny posture.

Multiple policies can apply to a single namespace, allowing layered governance. A base policy might enforce organization-wide minimums (2048-bit RSA keys, maximum 90-day validity) while team-specific policies add domain restrictions. The approver-policy controller requires all matching policies to pass before approving a request.
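A base policy of that kind can be very small. This sketch (names and values illustrative) sets the organization-wide floor and applies to requests against any issuer, since an empty issuerRef selector matches everything:

```yaml
apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: org-baseline
spec:
  constraints:
    maxDuration: 2160h  # 90-day ceiling for every team
    privateKey:
      algorithm: RSA
      minSize: 2048     # organization-wide key size floor
  selector:
    issuerRef: {}       # matches requests against any issuer
```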

Implementing Approval Workflows

High-sensitivity certificates—those for production APIs, mTLS roots, or external-facing services—warrant human approval before issuance. Combine CertificateRequestPolicy with RBAC to create approval workflows that pause certificate issuance until a security engineer reviews the request:

sensitive-approval-policy.yaml
apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: external-certificates-require-approval
spec:
  allowed:
    dnsNames:
      values:
        - "*.acme.corp"
        - "api.acme.corp"
  constraints:
    maxDuration: 720h  # 30 days max for external certs
    privateKey:
      algorithm: RSA
      minSize: 4096
  selector:
    issuerRef:
      name: public-issuer
      kind: ClusterIssuer
    namespace:
      matchNames:
        - production
        - staging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: certificate-approver
rules:
  # cert-manager's webhook requires the "approve" verb on the signers
  # resource before it accepts approval or denial of a CertificateRequest
  - apiGroups: ["cert-manager.io"]
    resources: ["signers"]
    verbs: ["approve"]
    resourceNames: ["clusterissuers.cert-manager.io/public-issuer"]
  - apiGroups: ["cert-manager.io"]
    resources: ["certificaterequests/status"]
    verbs: ["update"]
  - apiGroups: ["cert-manager.io"]
    resources: ["certificaterequests"]
    verbs: ["get", "list", "watch"]

Security engineers bound to this role can approve pending requests through kubectl:

terminal
kubectl cert-manager approve payments-api-tls-abc123 \
--reason "Reviewed and approved per ticket SEC-4521" \
-n production

The approval reason becomes part of the CertificateRequest’s status, providing an audit trail for compliance reviews. Requests awaiting approval remain pending indefinitely until explicitly approved or denied, ensuring no certificate is issued without proper authorization.

💡 Pro Tip: Integrate approval workflows with your ticketing system using a small admission webhook. The webhook can automatically approve CertificateRequests that include an annotation referencing a valid, approved change ticket, reducing manual toil while maintaining governance.

Policy Violation Visibility

Teams need visibility into why certificate requests fail. Without proper observability, developers waste time debugging configuration issues that stem from policy violations. Configure alerts on denied requests to surface policy violations early:

policy-violation-prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-policy-violations
  namespace: cert-manager
spec:
  groups:
    - name: cert-manager-policy
      rules:
        - alert: CertificateRequestDenied
          expr: |
            increase(certmanager_certificaterequest_denied_total[5m]) > 0
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "Certificate request denied by policy"
            description: "Namespace {{ $labels.namespace }} has denied certificate requests. Check CertificateRequest status for policy violation details."

Beyond alerting, consider publishing policy documentation alongside your ClusterIssuers. A ConfigMap in each namespace summarizing allowed domains, key requirements, and duration limits helps developers craft compliant Certificate resources on the first attempt. This feedback loop ensures developers understand policy boundaries without requiring security team intervention for every rejected request.
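Such a summary can be as simple as a ConfigMap that platform tooling drops into each namespace; the content below is illustrative, mirroring the payments policy shown earlier:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: certificate-policy-summary
  namespace: payments
data:
  policy.md: |
    Allowed DNS names: *.payments.internal.acme.corp, *.payments.acme.corp
    Key requirements:  ECDSA, P-256 or stronger
    Duration:          7 days minimum, 90 days maximum
    Issuer:            internal-pki-issuer (ClusterIssuer)
```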

With governance policies in place, the next challenge is automating certificate delivery to workloads. The following section explores injection patterns for Ingress controllers and service mesh sidecars.

Automated Certificate Injection for Ingress and Service Mesh

With your custom issuer controller deployed and tenant policies in place, the next step is eliminating manual certificate configuration from application deployments. cert-manager’s annotation-based provisioning transforms certificate management from an explicit task into an implicit infrastructure concern, reducing operational overhead while maintaining security guarantees.

Ingress Certificate Automation

The cert-manager.io/cluster-issuer annotation triggers automatic Certificate resource creation when you deploy an Ingress:

ingress-with-auto-tls.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  namespace: payments
  annotations:
    cert-manager.io/cluster-issuer: "internal-pki-issuer"
    cert-manager.io/common-name: "api.payments.internal"
    cert-manager.io/duration: "720h"
    cert-manager.io/renew-before: "168h"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.payments.internal
        - api.payments.svc.cluster.local
      secretName: api-gateway-tls
  rules:
    - host: api.payments.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 8080

cert-manager watches for Ingress resources with these annotations and synthesizes a Certificate resource automatically. The generated certificate inherits the hosts from the tls block and stores the resulting key pair in the specified secret. This approach eliminates the need for separate Certificate manifests, keeping your deployment artifacts focused on application concerns rather than infrastructure plumbing.

💡 Pro Tip: Set cert-manager.io/revision-history-limit: "3" to retain previous certificate versions, enabling rapid rollback if a newly issued certificate causes TLS handshake failures.

Istio mTLS Integration

Istio’s service mesh requires certificates at the sidecar level. While Istio includes its own CA (istiod), enterprises often need certificates from their internal PKI for compliance. The istio-csr project bridges this gap:

istio-csr-values.yaml
app:
  certmanager:
    issuer:
      group: cert-manager.io
      kind: ClusterIssuer
      name: internal-pki-issuer
  tls:
    certificateDuration: 24h
    renewBefore: 8h
    rootCAFile: /var/run/secrets/istio-csr/ca.pem
  server:
    clusterID: production-east
    maxCertificateDuration: 48h

Deploy istio-csr to intercept certificate signing requests from Istio sidecars and forward them to your custom issuer:

mesh-certificate-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-valid-cert
  namespace: payments
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/payments/sa/*
This configuration enforces strict mTLS across the mesh while ensuring all workload certificates originate from your internal PKI.

Linkerd mTLS Integration

Linkerd takes a different approach to certificate management. Rather than intercepting CSRs, Linkerd expects you to provide a trust anchor and issuer certificate directly. Configure cert-manager to manage the Linkerd identity issuer:

linkerd-identity-issuer.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h
  renewBefore: 12h
  issuerRef:
    name: internal-pki-issuer
    kind: ClusterIssuer
  commonName: identity.linkerd.cluster.local
  isCA: true
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth

Linkerd’s control plane automatically detects certificate rotation and propagates new credentials to proxies without requiring pod restarts, providing seamless certificate lifecycle management.

Managing Certificate Dependencies in CI/CD

Certificate provisioning introduces ordering dependencies. A pod mounting a TLS secret fails if that secret doesn’t exist. Handle this with init containers that wait for certificate readiness:

deployment-with-cert-wait.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: payments
spec:
  template:
    spec:
      initContainers:
        - name: wait-for-cert
          image: bitnami/kubectl:1.29
          command:
            - /bin/sh
            - -c
            - |
              until kubectl get secret payment-processor-tls \
                -n payments \
                -o jsonpath='{.data.tls\.crt}' | base64 -d | \
                openssl x509 -noout -checkend 3600; do
                echo "Waiting for valid certificate..."
                sleep 5
              done
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
      containers:
        - name: processor
          volumeMounts:
            - name: tls
              mountPath: /etc/tls
              readOnly: true
      volumes:
        - name: tls
          secret:
            secretName: payment-processor-tls

For GitOps workflows with Argo CD or Flux, declare Certificate resources as sync-wave predecessors to dependent workloads, ensuring certificates exist before pods attempt to mount them. In Argo CD, use the argocd.argoproj.io/sync-wave: "-1" annotation on Certificate resources to guarantee they reconcile before application deployments. Flux users can leverage dependsOn in Kustomization resources to establish explicit ordering between certificate provisioning and workload deployment.

Consider implementing health checks that validate certificate chains during deployment pipelines. This catches configuration errors—such as incorrect issuer references or invalid subject alternative names—before they propagate to production environments.
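A pipeline step along these lines covers both checks. This is a sketch using openssl; it generates a throwaway CA and leaf inline for demonstration, where a real pipeline would instead pull tls.crt from the Kubernetes Secret and verify it against your published CA bundle:

```shell
#!/bin/sh
set -e

# Demo setup: a throwaway CA and a leaf certificate it signs.
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes \
  -keyout ca.key -out ca.crt -days 30 -subj "/CN=demo-ca" 2>/dev/null
openssl req -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes \
  -keyout leaf.key -out leaf.csr -subj "/CN=api.payments.internal" 2>/dev/null
openssl x509 -req -in leaf.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out leaf.crt -days 7 2>/dev/null

# Pipeline checks: the leaf chains to the expected CA and has at
# least one hour of validity remaining. set -e fails the step if not.
openssl verify -CAfile ca.crt leaf.crt
openssl x509 -in leaf.crt -noout -checkend 3600 \
  && echo "certificate valid for at least 1h"
```

Running this against the issuer's actual output also catches wrong subject alternative names early: add an `openssl x509 -noout -ext subjectAltName` check and grep for the hostnames you expect.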

With certificates flowing automatically to ingress controllers and service mesh sidecars, the remaining risk shifts to observability. The next section covers monitoring certificate health and building alerting that catches renewal failures before they cause outages.

Monitoring, Alerting, and Renewal Observability

Certificate expiration remains one of the most preventable causes of production outages. A single expired certificate can cascade into authentication failures, broken service mesh communication, and customer-facing downtime. Robust observability transforms certificate management from a ticking time bomb into a predictable, automated process.

Essential Prometheus Metrics

cert-manager exposes metrics on port 9402 by default. The most critical metrics for operational health include certificate expiration timestamps, request durations, and controller reconciliation errors.

servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
    - port: tcp-prometheus-servicemonitor
      interval: 30s
      path: /metrics

The certmanager_certificate_expiration_timestamp_seconds metric provides the Unix timestamp when each certificate expires. Combined with certmanager_certificate_ready_status, you gain visibility into both current health and future risk. Additional metrics worth tracking include certmanager_controller_sync_call_count for understanding controller throughput and certmanager_http_acme_client_request_duration_seconds for identifying latency issues with ACME providers.
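For example, a quick ad-hoc query turns the raw timestamp into days remaining per certificate:

```promql
# Days until each certificate expires (negative means already expired)
(certmanager_certificate_expiration_timestamp_seconds - time()) / 86400
```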

For custom issuers, expose metrics that track signing latency, queue depth, and error rates by issuer type. These custom metrics provide granular visibility into issuer-specific bottlenecks that generic cert-manager metrics cannot capture.

Proactive Alerting Rules

Alerts should fire well before certificates expire, giving teams time to investigate and remediate. Configure multiple severity tiers based on time remaining, and ensure your alerting strategy accounts for both individual certificate failures and systemic issues affecting multiple certificates simultaneously.

prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-alerts
  namespace: monitoring
spec:
  groups:
  - name: certificates
    rules:
    - alert: CertificateExpiringSoon
      expr: |
        (certmanager_certificate_expiration_timestamp_seconds - time()) < 604800
        and certmanager_certificate_ready_status{condition="True"} == 1
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Certificate {{ $labels.name }} expires in less than 7 days"
    - alert: CertificateRenewalFailed
      expr: |
        certmanager_certificate_ready_status{condition="False"} == 1
      for: 15m
      labels:
        severity: critical
      annotations:
        summary: "Certificate {{ $labels.name }} is not ready"
    - alert: CertificateRequestStuck
      expr: |
        certmanager_controller_sync_call_count{controller="certificaterequests-issuer-custom-issuer"}
          - certmanager_controller_sync_call_count{controller="certificaterequests-issuer-custom-issuer"} offset 10m == 0
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Custom issuer controller appears stuck"

💡 Pro Tip: Set your warning threshold to no more than half your renewal window. If certificates renew 30 days before expiration, alert at 14 days remaining to catch renewal failures before they become urgent.

Consider adding aggregate alerts that trigger when a percentage of certificates in a namespace or cluster enter a degraded state. A single failed certificate might indicate an isolated issue, but multiple simultaneous failures often signal infrastructure problems like network partitions or issuer unavailability.
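Sketched as an additional rule for the certificates group, using the ratio of not-ready certificates per namespace (the 25% threshold is arbitrary and would need tuning for your fleet size):

```yaml
- alert: NamespaceCertificateDegradation
  expr: |
    sum by (namespace) (certmanager_certificate_ready_status{condition="False"})
      /
    count by (namespace) (certmanager_certificate_ready_status{condition="True"})
      > 0.25
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "More than 25% of certificates in {{ $labels.namespace }} are not ready"
```

The denominator uses count rather than sum because every certificate exports a condition="True" series whether its value is 0 or 1, which makes it a convenient way to count certificates per namespace.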

Debugging Failed Requests

When alerts fire, kubectl provides the fastest path to root cause analysis. Certificate resources maintain conditions that reveal the current state and any errors encountered during issuance.

Terminal window
# Check certificate status and conditions
kubectl describe certificate api-gateway-tls -n production
# Examine the underlying CertificateRequest
kubectl get certificaterequest -n production -o wide
# View controller events for the certificate
kubectl get events -n production --field-selector involvedObject.name=api-gateway-tls
# Check custom issuer controller logs
kubectl logs -n cert-manager -l app=custom-issuer-controller --tail=100

Common failure patterns include RBAC misconfigurations preventing secret creation, network policies blocking communication with internal PKI endpoints, and rate limiting from certificate authorities. The CertificateRequest’s status.conditions and status.failureTime fields pinpoint exactly where the issuance pipeline stalled.

For custom issuers, instrument your controller with structured logging that correlates request IDs across the signing workflow. This correlation becomes invaluable when tracing why a specific certificate failed to issue while others succeeded. Include contextual information such as the issuer configuration, requested DNS names, and any upstream error responses from your PKI infrastructure.

Building runbooks that map specific error conditions to remediation steps accelerates incident response. Document the most frequent failure modes your team encounters, along with the kubectl commands and log queries that expose root causes.

With comprehensive observability in place, the final consideration is ensuring your certificate infrastructure survives failure scenarios and scales with your organization’s growth.

Production Hardening and Disaster Recovery

Certificate infrastructure failures cascade quickly. When your root CA becomes unavailable or certificate secrets disappear, every mTLS connection in your cluster breaks simultaneously. Building resilience into your custom issuer deployment requires deliberate backup strategies, tested rotation procedures, and staging environments that mirror production failure modes.

Backup Strategies for Certificate State

Your custom issuer manages two categories of critical data: the issuer configuration (CRDs, controller deployments, RBAC) and the certificate secrets themselves. GitOps handles the first category naturally—store your issuer manifests in version control and let Argo CD or Flux maintain desired state.

Certificate secrets require different treatment. While cert-manager regenerates certificates automatically, the private keys backing your CA hierarchy cannot be recreated. Use Velero with the --include-cluster-scoped-resources flag to snapshot ClusterIssuer resources alongside their backing secrets. Schedule backups to coincide with your CA signing key rotation windows, ensuring each backup contains a complete, consistent CA state.
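A Velero Schedule covering that state might look like the following sketch, assuming Velero 1.11+ where the cluster-scoped resource filters are available (the name, cron expression, and label usage are placeholders):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: ca-state-backup            # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * 0"            # weekly, aligned with rotation windows
  template:
    includedNamespaces:
      - cert-manager               # where the CA secrets live
    includedResources:
      - secrets
    includedClusterScopedResources:
      - clusterissuers.cert-manager.io
```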

For secrets containing intermediate CA certificates, enable encryption at rest using a KMS provider external to the cluster. If your etcd backup becomes compromised, encrypted secrets remain protected by keys stored in your HSM or cloud KMS.

CA Rotation Without Disruption

Rotating a CA certificate while maintaining service continuity requires overlapping validity periods. Configure your custom issuer to support multiple active signing certificates during the transition window. When rotation begins, the issuer signs new certificates with the replacement CA while workloads continue trusting both the old and new chains.

Implement a three-phase rotation: first, distribute the new CA certificate to trust stores across all workloads. Second, switch signing to the new CA. Third, after all certificates issued by the old CA expire, remove it from trust stores. Your controller should expose the current rotation phase as a status condition, enabling external tooling to gate deployments during sensitive transition periods.
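As an illustration, the issuer's status might surface the current phase like this (the condition type and reason values are hypothetical, not part of cert-manager's API):

```yaml
status:
  conditions:
    - type: CARotationInProgress    # hypothetical condition type
      status: "True"
      reason: SigningWithNewCA      # phase 2 of 3
      message: "old CA still trusted; trust-store removal pending until its last issued certificate expires"
```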

Staging Environment Validation

Never test certificate renewal in production for the first time. Create a staging environment with accelerated certificate lifetimes—issue certificates valid for hours rather than months. This compression exposes renewal logic bugs that would otherwise take weeks to manifest.
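cert-manager supports this compression directly through the Certificate spec's duration and renewBefore fields. A staging sketch might look like this (resource and issuer names are placeholders):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: renewal-canary            # hypothetical staging certificate
  namespace: staging
spec:
  secretName: renewal-canary-tls
  duration: 2h                    # full lifetime in hours instead of months
  renewBefore: 30m                # forces a renewal roughly every 90 minutes
  issuerRef:
    name: staging-corporate-ca    # hypothetical staging issuer
    kind: ClusterIssuer
  dnsNames:
    - canary.staging.internal
```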

Inject failure scenarios deliberately: revoke the CA certificate, corrupt the signing key secret, simulate HSM unavailability. Your issuer should emit events and metrics that clearly indicate the failure mode, enabling operators to diagnose issues without parsing controller logs.

💡 Pro Tip: Use chaos engineering tools to randomly delete certificate secrets in staging. Verify that your issuer recreates them within your SLA before the dependent workloads notice the disruption.
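A low-tech sketch of that idea is a CronJob that deletes one randomly chosen certificate secret in staging. The label selector, service account, and image here are assumptions—newer cert-manager releases label the secrets they manage, but you may need to apply your own labels instead:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cert-secret-chaos          # staging only; never deploy to production
  namespace: staging
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cert-chaos   # needs RBAC to list/delete secrets
          restartPolicy: Never
          containers:
          - name: delete-random-cert-secret
            image: bitnami/kubectl:latest
            command:
              - /bin/sh
              - -c
              - |
                kubectl get secrets -n staging \
                  -l controller.cert-manager.io/fao=true -o name \
                  | shuf -n 1 | xargs -r kubectl delete -n staging
```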

With resilience patterns established, you have a complete custom issuer implementation—from controller logic through production hardening. The patterns covered throughout this guide provide a foundation for integrating any internal PKI system with cert-manager’s powerful automation capabilities.

Key Takeaways

  • Start with cert-manager’s external issuer template and customize the Reconcile loop to integrate your PKI’s signing API
  • Implement ClusterIssuer policies early to enforce certificate standards before developers start requesting certificates directly
  • Set up Prometheus alerts on certmanager_certificate_expiration_timestamp_seconds with a 14-day threshold to catch renewal failures before they cause outages
  • Test CA rotation procedures in staging quarterly—the first time you rotate a CA shouldn’t be during an incident