
NATS JetStream: Persistent Messaging and Event Replay in Production


Your service crashed at 2am and restarted clean—but the 847 events published while it was down are gone forever. No dead-letter queue, no replay, no audit trail. Just a gap in your event stream and a support ticket waiting to happen.

This is the sharp edge of core NATS. The broker achieves sub-millisecond latency precisely because it makes no durability guarantees—publish fires, subscribers receive or they don’t, and the server moves on. For internal fan-out, cache invalidation signals, or ephemeral coordination between stable services, that tradeoff is perfectly reasonable. But the moment you introduce consumer restarts, rolling deployments, or any requirement to reconstruct state from your event history, the fire-and-forget model stops being a feature and starts being a liability.

The instinct at that point is to reach for Kafka, or bolt on Redis Streams, or stand up a separate persistence layer alongside NATS. That instinct is wrong—not because those systems are bad, but because the solution is already sitting inside the NATS server binary you’re already running.

JetStream is not a plugin, a sidecar, or a separate process. It is a persistence and streaming layer built directly into the NATS server, activated by a single configuration flag, sharing the same connection infrastructure and security model your existing services already use. It brings durable consumers, consumer groups, acknowledged delivery, and full event replay—backed by the Raft consensus protocol, which handles cluster consistency without any external coordination.

Before getting into how JetStream works and how to deploy it in Kubernetes, it’s worth being precise about where exactly core NATS breaks down—because the failure modes are specific, and understanding them shapes every architectural decision that follows.

Core NATS vs JetStream: Where Ephemeral Pub/Sub Breaks Down

Core NATS operates on a fire-and-forget model that delivers sub-millisecond latency at scale. Messages route from publisher to subscriber with minimal overhead—no disk I/O, no replication negotiation, no acknowledgment ceremony. For high-throughput, low-latency scenarios where all consumers are online and keeping pace, this model is optimal. The problem surfaces the moment that assumption breaks.

Visual: diagram comparing ephemeral pub/sub message loss versus JetStream durable stream persistence

The Persistence Gap in Core NATS

When a subscriber disconnects and reconnects—even for a few seconds during a pod restart on Kubernetes—every message published during that window is permanently lost. Core NATS has no buffer, no backlog, no recovery path. The broker delivers or discards; there is no third option.

This isn’t a design flaw. It’s a deliberate tradeoff that makes NATS exceptionally fast. But three specific failure patterns consistently push teams toward persistence:

Consumer restarts. In a Kubernetes environment, pods restart constantly—rolling deployments, OOM kills, node evictions. A consumer that goes down for 30 seconds with a publisher emitting 10,000 messages per second has a 300,000-message gap it can never recover from. Upstream systems don’t replay; they’ve already moved on.

Slow consumers. When a subscriber processes messages slower than the publisher produces them, core NATS applies backpressure and eventually drops messages rather than buffer indefinitely. There’s no concept of “catch up later.” A slow consumer is effectively a lossy consumer.

Replay requirements. Audit trails, event sourcing, and new service onboarding all share the same need: access to historical events. Core NATS provides none. A new microservice subscribing to a subject sees only messages published after it subscribes.

JetStream as a Native Extension

JetStream ships inside the same NATS server binary. There’s no separate process to deploy, no sidecar to manage, no external storage system to integrate at the messaging layer. Enabling JetStream is a configuration flag; the server handles the rest.

This architecture matters operationally. Teams evaluating Kafka frequently cite its dependency surface—ZooKeeper (or KRaft), separate broker processes, schema registry—as operational weight. JetStream uses Raft-based consensus across cluster nodes for replication and leader election, which means it achieves cluster consistency without any external coordination service. The NATS cluster is self-contained.

💡 Pro Tip: JetStream’s Raft implementation operates per-stream, not per-cluster. This means a cluster can run both JetStream streams with full replication and core NATS subjects simultaneously, letting you adopt persistence incrementally rather than migrating everything at once.

Understanding when to reach for JetStream is straightforward: if any of the three failure modes above apply to your workload, core NATS is insufficient. If none of them apply, JetStream adds complexity without benefit. The next section examines exactly how JetStream models persistent data through streams and consumers—the two primitives that layer durable state onto NATS’s otherwise stateless subject model.

Streams and Consumers: The JetStream Data Model

JetStream introduces two foundational abstractions that transform NATS from an ephemeral message bus into a durable event store: streams and consumers. Understanding how these two primitives interact is the prerequisite for building reliable event-driven systems on top of NATS.

Visual: diagram of JetStream streams as ordered logs with consumers tracking position via sequence numbers

Streams: Durable, Ordered Logs

A stream is a persistent, ordered sequence of messages bound to one or more subject filters. When a publisher sends a message to a subject captured by a stream’s filter—say orders.> or events.payments.*—JetStream writes that message to disk and assigns it a monotonically increasing sequence number. That sequence number is the bedrock of everything JetStream does: replay, deduplication, and consumer position tracking all derive from it.

Streams are configured with retention policies that govern how messages age out. LimitsPolicy enforces bounds on message count, total byte size, and per-message age. WorkQueuePolicy deletes a message as soon as a single consumer acknowledges it. InterestPolicy retains a message only until every bound consumer has acknowledged it—and if no consumer exists at publish time, the message is discarded immediately. Selecting the wrong retention policy is one of the most common early mistakes: an interest-policy stream with no consumers yet bound silently discards every message published to it, producing data loss that is easy to misdiagnose.

Consumers: Stateful Readers

A consumer is a stateful cursor into a stream. It tracks delivery position, manages acknowledgment state, and enforces retry semantics. JetStream offers two delivery modes: push and pull.

Push consumers have the server deliver messages to a fixed subject. They maximize throughput in single-subscriber scenarios but provide no built-in backpressure mechanism—a slow subscriber accumulates an unbounded pending queue on the server. Use push consumers when your subscriber is consistently faster than the ingest rate and horizontal scaling is not a requirement.

Pull consumers invert the flow: subscribers request batches of messages explicitly. This model is strictly preferable in production. Pull consumers compose naturally with worker pools, enable precise backpressure, and allow multiple competing consumers to drain a single stream without coordination overhead. When you scale a deployment horizontally, pull consumers distribute work automatically across all replicas without any additional configuration.

Consumers are further classified as durable or ephemeral. A durable consumer persists its acknowledgment state across restarts and network partitions—it has a name, and the server remembers where it left off. An ephemeral consumer exists only for the lifetime of the subscribing connection. Ephemeral consumers suit short-lived diagnostic reads or one-off replay operations; every long-running service should use a durable consumer.

Acknowledgment Policies and Delivery Guarantees

Three ack policies control delivery semantics. AckNone disables acknowledgment entirely, giving you at-most-once delivery with maximum throughput. AckAll acknowledges all messages up to the acknowledged sequence number, reducing round-trips at the cost of weaker per-message guarantees. AckExplicit requires a distinct acknowledgment per message and is the only policy that enables true at-least-once delivery with redelivery on timeout.

💡 Pro Tip: Pair AckExplicit with idempotent message handlers and NATS’s built-in deduplication window to approach exactly-once semantics without a distributed transaction coordinator.
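To make the AckAll vs. AckExplicit distinction concrete, here is a toy in-memory model of a consumer's pending set. This is purely illustrative; it is not how the server implements acknowledgment tracking.

```go
package main

import "fmt"

// pending models the un-acknowledged sequence numbers of one consumer.
type pending map[uint64]bool

func newPending(seqs ...uint64) pending {
	p := pending{}
	for _, s := range seqs {
		p[s] = true
	}
	return p
}

// ackExplicit clears exactly one sequence (AckExplicit semantics).
func (p pending) ackExplicit(seq uint64) { delete(p, seq) }

// ackAll clears every sequence up to and including seq (AckAll semantics).
func (p pending) ackAll(seq uint64) {
	for s := range p {
		if s <= seq {
			delete(p, s)
		}
	}
}

func main() {
	a := newPending(1, 2, 3, 4, 5)
	a.ackExplicit(3)
	fmt.Println(len(a)) // 4: sequences 1, 2, 4, 5 still pending

	b := newPending(1, 2, 3, 4, 5)
	b.ackAll(3)
	fmt.Println(len(b)) // 2: only 4 and 5 still pending
}
```

The tradeoff falls out of the model: AckAll needs one round-trip to clear a batch but cannot express "message 2 failed while 1 and 3 succeeded"; AckExplicit can, at the cost of one ack per message.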

With streams providing ordered, numbered storage and pull consumers providing backpressure-aware, durable reads, you have everything needed to implement deterministic replay. The next step is getting this infrastructure running—which means deploying JetStream on Kubernetes with the official Helm chart.

Setting Up JetStream on Kubernetes with Helm

Deploying NATS JetStream on Kubernetes requires more than flipping a feature flag. You need persistent volumes, a properly configured StatefulSet, and cluster-aware replication settings to survive pod restarts and node failures. The official NATS Helm chart handles the heavy lifting, but the defaults are not production-ready out of the box. This section walks through a three-node deployment with durable storage, anti-affinity scheduling, and post-deploy verification.

Prerequisites

Add the NATS Helm repository and update your index:

helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update

You will also need kubectl configured against your target cluster and the NATS CLI installed for post-deploy verification. Ensure your cluster has a storage provisioner capable of dynamically binding ReadWriteOnce PVCs before proceeding — JetStream’s file storage backend will remain unbound otherwise.

Configuring values.yaml

Create a values.yaml that enables JetStream, sets up a three-node cluster, and mounts persistent volumes for stream storage:

nats-jetstream-values.yaml
cluster:
  enabled: true
  name: "prod-nats-cluster"
  replicas: 3

nats:
  jetstream:
    enabled: true
    memStorage:
      enabled: true
      size: 2Gi
    fileStorage:
      enabled: true
      size: 20Gi
      storageClassName: "gp3"
      accessModes:
        - ReadWriteOnce

container:
  env:
    GOMEMLIMIT: 3GiB
  merge:
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"

statefulSet:
  merge:
    spec:
      podManagementPolicy: Parallel

podTemplate:
  merge:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: nats
              topologyKey: kubernetes.io/hostname

Several settings here deserve attention.

Storage class: storageClassName: "gp3" targets AWS EBS gp3 volumes. Substitute managed-csi-premium on AKS or standard-rwo on GKE. If your cluster uses a custom provisioner, confirm it supports ReadWriteOnce access mode — ReadWriteMany is not supported and will cause the StatefulSet to fail scheduling.

Pod anti-affinity: The requiredDuringSchedulingIgnoredDuringExecution rule is non-negotiable for production. It forces each NATS pod onto a distinct node, ensuring a single node failure does not eliminate the majority of your Raft quorum. With three replicas, you can lose one node and continue writing. Dropping this to preferredDuringScheduling on a smaller cluster is acceptable for staging, but never in production.
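The quorum arithmetic behind that claim is easy to verify. A stdlib-only Go sketch of the math (illustrative only, not any NATS API):

```go
package main

import "fmt"

// raftQuorum returns the votes needed for a Raft group of size n and
// how many node failures the group can tolerate while still writing.
func raftQuorum(n int) (majority, tolerated int) {
	majority = n/2 + 1
	tolerated = n - majority
	return
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		m, t := raftQuorum(n)
		fmt.Printf("replicas=%d majority=%d tolerated=%d\n", n, m, t)
	}
}
```

Note the n=4 row: four replicas tolerate one failure, exactly like three, which is why JetStream replica counts are conventionally odd — an even count adds storage and replication cost without adding resilience.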

Memory vs. file storage: The memStorage allocation provides a fast in-memory tier for temporary stream data, while fileStorage backs durable streams that survive pod restarts. JetStream defaults to file storage for any stream carrying a retention policy. Use memory storage only for ephemeral, high-throughput workloads where data loss on restart is explicitly acceptable.

GOMEMLIMIT: Setting GOMEMLIMIT slightly below the container memory limit gives the Go runtime a soft ceiling that reduces the risk of OOM kills under GC pressure. The value here (3GiB against a 4Gi limit) leaves headroom for native memory allocations outside the Go heap.
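The relationship between the two values is just a headroom ratio. A small illustrative helper — the 25% figure matches the values file above but is a judgment call, not a NATS or Go requirement:

```go
package main

import "fmt"

// goMemLimit returns a GOMEMLIMIT value that leaves headroomPct percent
// of the container memory limit for allocations outside the Go heap
// (cgo, stacks, page cache pressure).
func goMemLimit(containerLimitBytes int64, headroomPct int64) int64 {
	return containerLimitBytes * (100 - headroomPct) / 100
}

func main() {
	const gib = int64(1) << 30
	fmt.Println(goMemLimit(4*gib, 25) / gib) // 3: matches GOMEMLIMIT: 3GiB above
}
```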

Deploying the Cluster

helm install nats nats/nats \
--namespace messaging \
--create-namespace \
--values nats-jetstream-values.yaml \
--version 1.2.9

Wait for all three pods to reach Running status:

kubectl rollout status statefulset/nats -n messaging

Verifying JetStream Health

Install the NATS CLI and configure a context pointing at your cluster:

nats context add prod-nats \
--server nats://nats.messaging.svc.cluster.local:4222 \
--description "Production NATS cluster"
nats context select prod-nats

Confirm JetStream is enabled and all cluster members have elected a meta-leader:

nats server check jetstream

You should see output indicating JetStream is enabled across all three servers, with a single meta-leader and two followers. If any server reports disabled, the pod’s persistent volume claim likely failed to bind — check kubectl get pvc -n messaging to diagnose storage provisioner errors.

nats server report jetstream

This command shows per-server stream counts, memory usage, and file storage consumption. A healthy cluster shows consistent stream counts across all members, confirming Raft replication is functioning correctly.

💡 Pro Tip: Run nats server check jetstream --server nats://nats-1.nats.messaging.svc.cluster.local:4222 targeting individual pod DNS names to isolate which cluster member is lagging during a replication incident.

With a healthy three-node cluster confirmed, the next step is creating streams and publishing messages using the Go client — which exposes the JetStream API’s full flexibility around subject mapping, storage policies, and message acknowledgment modes.

Creating Streams and Publishing with the Go Client

With JetStream running on your Kubernetes cluster, the next step is wiring up your application. The nats.go client library exposes a clean API for creating streams, publishing durably, and setting up consumers — all within a few dozen lines of idiomatic Go.

Connecting to the JetStream Context

Start by establishing a NATS connection and acquiring a JetStream context. The JetStream context is the entry point for all persistence-aware operations; standard NATS connection methods handle the underlying transport.

jetstream_client.go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(
		"nats://nats.messaging.svc.cluster.local:4222",
		nats.RetryOnFailedConnect(true),
		nats.MaxReconnects(10),
		nats.ReconnectWait(2*time.Second),
	)
	if err != nil {
		log.Fatalf("failed to connect to NATS: %v", err)
	}
	defer nc.Drain()

	js, err := nc.JetStream(nats.PublishAsyncMaxPending(256))
	if err != nil {
		log.Fatalf("failed to get JetStream context: %v", err)
	}
	_ = js // proceed to stream creation
}

nats.PublishAsyncMaxPending caps the number of in-flight async publishes, which prevents unbounded memory growth under back-pressure. For synchronous publishing patterns, this option is still worth setting as a circuit-breaker default. Using nc.Drain() rather than nc.Close() is equally important: it flushes pending messages and waits for in-flight acknowledgments before tearing down the connection, avoiding silent message loss during graceful shutdown.

Creating a Stream Programmatically

Streams are idempotent to create: calling AddStream with an existing name and identical configuration is a no-op, making it safe to run at application startup. Conflicting configurations return an error, which surfaces misconfiguration early rather than at publish time.

stream.go
func ensureStream(js nats.JetStreamContext) error {
	cfg := &nats.StreamConfig{
		Name:      "ORDERS",
		Subjects:  []string{"orders.>"},
		MaxAge:    7 * 24 * time.Hour,
		MaxBytes:  10 * 1024 * 1024 * 1024, // 10 GiB
		Storage:   nats.FileStorage,
		Replicas:  3,
		Retention: nats.LimitsPolicy,
		Discard:   nats.DiscardOld,
	}
	if _, err := js.AddStream(cfg); err != nil {
		return fmt.Errorf("AddStream: %w", err)
	}
	return nil
}

Key configuration decisions here:

  • Subjects: []string{"orders.>"} — the > wildcard captures every token below orders., so orders.created, orders.shipped, and orders.us-east-1.fulfilled all land in this stream.
  • Storage: nats.FileStorage — persists messages to disk. Use nats.MemoryStorage only for ephemeral high-throughput cases where durability is not required.
  • MaxAge and MaxBytes — these two limits work in tandem. JetStream enforces whichever threshold is reached first, evicting messages according to the Discard policy. Size your retention window against both your message rate and your slowest consumer’s expected lag.
  • Replicas: 3 — matches your cluster’s quorum size; a majority of replicas must acknowledge a write before the server confirms it to the publisher.
  • Discard: nats.DiscardOld — when limits are hit, the oldest messages are evicted. DiscardNew drops incoming messages instead, which suits rate-limiting scenarios.

Publishing with Acknowledgment

Synchronous publishing blocks until the server confirms the message is durable. This is the safest default for transactional workloads and the right starting point before introducing async publish pipelines.

publish.go
func publishOrder(js nats.JetStreamContext, orderID string, payload []byte) error {
	msg := &nats.Msg{
		Subject: fmt.Sprintf("orders.created.%s", orderID),
		Data:    payload,
		Header:  nats.Header{},
	}
	msg.Header.Set("Content-Type", "application/json")
	msg.Header.Set("Order-ID", orderID)

	ack, err := js.PublishMsg(msg, nats.MsgId(orderID))
	if err != nil {
		return fmt.Errorf("publish failed for order %s: %w", orderID, err)
	}
	if ack.Duplicate {
		log.Printf("duplicate detected for order %s, skipping downstream processing", orderID)
		return nil
	}
	log.Printf("order %s stored at sequence %d in stream %s", orderID, ack.Sequence, ack.Stream)
	return nil
}

nats.MsgId(orderID) enables server-side deduplication. JetStream maintains a deduplication window (configurable via Duplicates on the stream config, defaulting to two minutes) and rejects messages with a previously seen ID, returning the original Ack with Duplicate: true. This gives you exactly-once semantics at the stream layer without application-level tracking.

Note: Always propagate correlation IDs through message headers rather than embedding them in the payload body. Headers can be read without deserializing the message payload, which matters when building consumer-side filters or audit tooling.

Creating a Durable Pull Consumer

Pull consumers give your application explicit control over fetch rate and batch size, which is essential for backpressure-aware processing. Unlike push consumers, the server never delivers messages faster than your application requests them.

consumer.go
func ensureConsumer(js nats.JetStreamContext) error {
	_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
		Durable:       "order-processor",
		AckPolicy:     nats.AckExplicitPolicy,
		AckWait:       30 * time.Second,
		MaxDeliver:    5,
		FilterSubject: "orders.created.>",
		DeliverPolicy: nats.DeliverAllPolicy,
	})
	return err
}

AckExplicitPolicy requires each message to be individually acknowledged; the server redelivers any message whose ack is not received within AckWait. MaxDeliver: 5 caps redelivery attempts before the message is considered poisoned. When that cap is hit, the server publishes an advisory on $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer>; a common dead-letter pattern is to capture those advisories in a dedicated stream and use the sequence number they reference to copy the poisoned message out for inspection. For messages you already know are unprocessable, call Term() instead of Nak() so the server stops redelivering immediately.

The Durable name is what survives server restarts and pod rescheduling: any new instance of your application that connects and references "order-processor" will resume exactly where the previous instance left off, with no sequence gaps or redelivery storms.

With streams and consumers in place, the next natural capability to unlock is replaying historical events — either from a specific sequence number or from a point in time. That is where JetStream’s persistence model starts to differentiate meaningfully from traditional message queues.

Implementing Event Replay: Sequence-Based and Time-Based

Event replay is where JetStream’s persistence model pays off in concrete operational terms. When a service crashes mid-processing, when you bootstrap a new consumer that needs historical state, or when a bug corrupts downstream data and you need to reprocess a window of events—JetStream gives you deterministic control over exactly where replay begins.

DeliverPolicy: Choosing Your Replay Entry Point

JetStream consumers expose five delivery policies that control where in the stream a consumer starts reading:

  • DeliverAllPolicy: full replay from the beginning of the stream
  • DeliverLastPolicy: only the most recent message in the stream (a separate DeliverLastPerSubjectPolicy delivers the latest message per subject)
  • DeliverNewPolicy: live messages only, no history
  • DeliverByStartSequencePolicy: replay from a specific sequence number
  • DeliverByStartTimePolicy: replay from a specific UTC timestamp

The last two are the operational workhorses. Sequence-based replay is precise and cheap—you jump directly to a known position. Time-based replay is useful when you know the incident window but not the exact sequence range.

Sequence-Based Replay for Crash Recovery

When your service shuts down cleanly, persist the last successfully acknowledged sequence number. On restart, use that sequence to resume without reprocessing events you already handled.

replay_sequence.go
package main

import (
	"context"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func resumeFromSequence(nc *nats.Conn, lastAckedSeq uint64) error {
	js, err := jetstream.New(nc)
	if err != nil {
		return err
	}
	cons, err := js.CreateOrUpdateConsumer(context.Background(), "ORDERS", jetstream.ConsumerConfig{
		Durable:       "orders-processor-v1",
		DeliverPolicy: jetstream.DeliverByStartSequencePolicy,
		OptStartSeq:   lastAckedSeq + 1,
		AckPolicy:     jetstream.AckExplicitPolicy,
		FilterSubject: "orders.>",
		MaxAckPending: 50,
	})
	if err != nil {
		return err
	}
	msgs, err := cons.Messages()
	if err != nil {
		return err
	}
	defer msgs.Stop()
	for {
		msg, err := msgs.Next()
		if err != nil {
			return err
		}
		if err := processOrder(msg.Data()); err != nil {
			msg.Nak()
			continue
		}
		meta, err := msg.Metadata()
		if err != nil {
			msg.Nak()
			return err
		}
		// Persist this sequence before acking so crash recovery is safe.
		if err := persistCheckpoint(meta.Sequence.Stream); err != nil {
			msg.Nak()
			return err
		}
		msg.Ack()
	}
}

The critical detail: write the checkpoint to durable storage before calling Ack(). If the process dies between Ack() and the checkpoint write, JetStream redelivers the message—but without a checkpoint update, you process it again. Writing the checkpoint first means a crash after the write but before Ack() causes a redeliver that your idempotent handler discards.
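This ordering argument can be simulated with in-memory stand-ins for the durable checkpoint store and the side-effect table — illustrative only, nothing here is a NATS API:

```go
package main

import "fmt"

// worker simulates the checkpoint-before-ack protocol.
type worker struct {
	checkpoint uint64          // last durably checkpointed stream sequence
	applied    map[uint64]bool // side effects already performed
}

// handle processes one delivery and reports whether the side effect ran.
func (w *worker) handle(seq uint64) bool {
	if w.applied[seq] {
		return false // redelivery of work we already did: discard
	}
	w.applied[seq] = true // 1. apply the side effect idempotently
	w.checkpoint = seq    // 2. persist the checkpoint
	// 3. Ack would happen here; crashing before it only costs a redelivery.
	return true
}

func main() {
	w := &worker{applied: map[uint64]bool{}}
	fmt.Println(w.handle(101)) // true: processed and checkpointed
	// Crash between the checkpoint write and Ack(): the server redelivers 101.
	fmt.Println(w.handle(101)) // false: the idempotent handler discards it
	fmt.Println(w.checkpoint)  // 101
}
```

Run the orderings in your head: ack-then-checkpoint plus a crash in between loses the position and double-applies the side effect; checkpoint-then-ack plus a crash only triggers a redelivery that the handler discards.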

💡 Pro Tip: Use the message’s Metadata().Sequence.Stream value (the stream sequence) as your checkpoint key, not Metadata().Sequence.Consumer (which resets if you recreate the consumer).

Time-Based Replay for Incident Reprocessing

When you need to replay everything from 14:00 UTC yesterday because a downstream bug corrupted three hours of order state, DeliverByStartTimePolicy is the right tool:

replay_time.go
func replayFromTime(nc *nats.Conn, startTime time.Time) error {
	js, err := jetstream.New(nc)
	if err != nil {
		return err
	}
	// Use an ordered (ephemeral) consumer with no durable name so it
	// doesn't conflict with the live production consumer. Ordered
	// consumers use AckNone, so messages are not acked here.
	cons, err := js.OrderedConsumer(context.Background(), "ORDERS", jetstream.OrderedConsumerConfig{
		DeliverPolicy:  jetstream.DeliverByStartTimePolicy,
		OptStartTime:   &startTime,
		FilterSubjects: []string{"orders.>"},
	})
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
	defer cancel()
	cc, err := cons.Consume(func(msg jetstream.Msg) {
		if err := reprocessOrder(msg.Data()); err != nil {
			meta, _ := msg.Metadata()
			log.Printf("reprocess failed seq=%d: %v", meta.Sequence.Stream, err)
		}
	})
	if err != nil {
		return err
	}
	defer cc.Stop()
	<-ctx.Done() // bound the replay job to 30 minutes
	return nil
}

Using an OrderedConsumer here is intentional. Ordered consumers are ephemeral, automatically reconnect on failure, and enforce message ordering—exactly what you want for a one-shot reprocessing job that runs alongside production consumers without polluting your durable consumer registry.

Building Idempotent Consumers

Replay only works safely if your consumer is idempotent. The standard pattern is a deduplication key stored in your processing database:

idempotent_handler.go
func processOrder(db *sql.DB, seq uint64, data []byte) error {
	var order Order
	if err := json.Unmarshal(data, &order); err != nil {
		return err
	}
	// The business ID is the idempotency key; replays become no-ops.
	_, err := db.Exec(`
		INSERT INTO orders (id, payload, nats_seq, processed_at)
		VALUES ($1, $2, $3, NOW())
		ON CONFLICT (id) DO NOTHING
	`, order.ID, data, seq)
	return err
}

ON CONFLICT DO NOTHING means replaying an already-processed event is a no-op. The idempotency key is the business entity ID (order.ID), not the NATS sequence, because the same logical event can arrive under different sequence numbers if streams are ever trimmed and republished.

Practical Replay Patterns

New service bootstrap: Deploy a new downstream service with a durable consumer set to DeliverAllPolicy so it builds its initial state from the full event history. Once it catches up, the same consumer simply continues with live messages; because JetStream tracks the consumer's position, the transition happens without a gap or a policy switch.

Audit log reconstruction: Use time-bounded replay to reconstruct what the system state looked like at any point in time. Pair with a separate read-only consumer so production processing is unaffected.

Bug replay in staging: Mirror production stream data to staging using NATS subject import/export, then replay specific sequence ranges to reproduce bugs deterministically without touching live consumers.

With replay mechanics in place, the next operational concern is keeping your streams healthy at scale—configuring retention policies, enforcing storage limits, and surfacing the right JetStream metrics before your streams silently drop messages or saturate disk.

Production Considerations: Retention, Limits, and Monitoring

Running JetStream in production without explicit resource bounds is how you wake up to a storage alert at 3 AM. JetStream streams grow until you tell them to stop—and that requires deliberate configuration across three dimensions: what gets retained, how much space is allowed, and how you observe it all.

Retention Policies

JetStream offers three retention models, and choosing the wrong one creates subtle operational problems.

Limits-based retention (the default) keeps messages until they hit a configured ceiling—age, byte count, or message count. This works well for audit logs and event sourcing where you want a sliding window of history regardless of consumer activity.

Interest-based retention removes messages only after all bound consumers have acknowledged them. Use this when every consumer must process every message, such as a fan-out pipeline where multiple downstream services each need a guaranteed copy.

Workqueue retention deletes a message as soon as any single consumer acknowledges it. This mirrors a traditional queue semantic and is the right model for task distribution where you want exactly-one processing across a consumer group.

Mismatching the retention policy to the use case is a common source of storage bloat. A workqueue stream with a slow consumer fills up just as fast as an unconfigured limits-based stream.

Setting Hard Limits

Every stream in production needs explicit ceilings. Set MaxAge to bound the time window—24h for operational events, 168h for anything feeding downstream analytics. Set MaxBytes to cap storage at the infrastructure level; a stream that cannot grow past 10 GB cannot take down your NATS cluster regardless of publish rate. Set MaxMsgs as a secondary guard when you care more about message count than raw byte size.

💡 Pro Tip: Set Discard: DiscardNew rather than the default DiscardOld when you are running a workqueue. Dropping the oldest unprocessed work silently can corrupt job state; rejecting new publishes makes the producer fail fast and surfaces backpressure explicitly.

Monitoring Consumer Lag and Failure Modes

nats consumer info <stream> <consumer> gives you the NumPending count—messages published but not yet delivered. In Prometheus, the nats_consumer_num_pending metric exposes this for alerting. Alert when NumPending exceeds a threshold relative to your publish rate, not an absolute number.

The two failure modes that catch teams off guard are slow consumers filling streams past their MaxBytes limit, triggering discard policies mid-flight, and AckWait timeouts causing redelivery loops. If a consumer consistently fails to ack within the configured window, redelivered messages stack on top of new work and NumPending spirals upward. The fix is either increasing AckWait to match realistic processing time or adding more consumer instances to reduce per-instance load.

When JetStream Is Not the Right Tool

JetStream does not provide total partition-level ordering across a consumer group—Kafka does. If your consumers must process events in strict per-key order at scale, or if you need a mature ecosystem of connectors, schema registry integration, and stream processing frameworks like Flink, Kafka remains the stronger choice. JetStream wins on operational simplicity and latency; Kafka wins on ordering guarantees and ecosystem depth.

With retention, limits, and observability properly configured, JetStream behaves predictably under load. The next step is validating that behavior with realistic throughput benchmarks and failure injection before your first production deployment.

Key Takeaways

  • Enable JetStream on an existing NATS cluster with a single config flag—no separate deployment—then define streams per subject namespace with explicit retention limits before publishing production traffic
  • Use pull consumers with explicit ack policy for all critical workloads; configure ack_wait and max_deliver to bound redelivery behavior and prevent infinite retry loops on poison messages
  • Design consumers to be idempotent from day one so you can safely use DeliverAll or DeliverByStartSequence to replay events during incident recovery or new service bootstrap without side effects
  • Set stream max_age and max_bytes limits in every stream definition—unbounded streams will silently fill persistent volumes until your cluster fails under load