
NATS JetStream in Production: Durable Event Streaming Without the Kafka Overhead


Your Kafka cluster costs more to operate than the services it connects. That’s not a hot take—it’s a pattern that repeats across engineering organizations of every size. ZooKeeper coordination overhead, broker replication tuning, partition count decisions you’ll regret six months later, consumer group lag monitoring, schema registry management. Before your first message flows in production, you’ve already committed to a distributed systems problem that runs parallel to your actual product.

The operational surface area is the real tax. Senior engineers who should be designing service boundaries instead spend cycles babysitting retention policies and rebalancing partitions. Teams that started with Kafka because it’s “industry standard” find themselves maintaining a second full-time infrastructure concern—one that demands deep expertise to run well and punishes you immediately when that expertise walks out the door.

NATS JetStream is a different bet. It’s not a Kafka replacement in the sense of feature parity—it’s a different philosophy about what a message broker should cost you operationally. Persistent streams, durable consumers, at-least-once delivery guarantees, push and pull consumption models: JetStream covers the semantics that matter for production microservices without dragging in the coordination complexity that makes Kafka expensive to own. A single NATS server binary, no external dependencies, cluster membership handled natively.

The tradeoffs are real, and the mental model is different enough that lifting Kafka patterns directly into NATS will cause you problems. Before looking at how a migration actually works in practice, it’s worth being precise about what NATS JetStream is—and equally important, what the core NATS layer beneath it is not.

Core NATS vs JetStream: What You Actually Get

Before committing to any architectural decision, you need a precise mental model of what NATS actually is—because the ecosystem has three distinct layers that engineers routinely conflate, often with painful consequences mid-migration.

Core NATS: Fast, Ephemeral, Intentionally Simple

Core NATS is a fire-and-forget pub/sub system. When a publisher sends a message to a subject, NATS delivers it to any active subscribers and discards it immediately. No persistence. No acknowledgment. No replay. If a subscriber is offline when a message arrives, that message is gone.

Visual: core NATS ephemeral pub/sub vs JetStream persistent stream architecture

This is not a limitation to work around—it is the design. Core NATS is optimized for low-latency, high-throughput scenarios where you control subscriber uptime and can tolerate message loss: service discovery, health checks, live telemetry fan-out, internal RPC calls. At sub-millisecond latency with millions of messages per second on modest hardware, it outperforms systems carrying far more operational weight.

The subject-based routing model is where NATS diverges most sharply from Kafka. Subjects are hierarchical strings (orders.us-east.created, payments.refunds.>) with wildcard matching built into the protocol. This enables expressive, fine-grained routing without consumer-side filtering logic. The tradeoff versus Kafka’s topic/partition model is real: subjects have no ordering guarantees across multiple publishers, and there is no native concept of partition-level parallelism in core NATS.
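To make the matching rules concrete, here is a toy matcher that mirrors the protocol's semantics—the wildcard * matches exactly one token, and > matches one or more trailing tokens. This is an illustration of the rules, not the server's implementation:

```javascript
// Toy illustration of NATS subject matching semantics -- not the server's
// implementation. "*" matches exactly one token; ">" matches one or more
// trailing tokens.
function subjectMatches(pattern, subject) {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    if (p[i] === ">") return s.length > i; // ">" needs at least one more token
    if (i >= s.length) return false;
    if (p[i] !== "*" && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

// subjectMatches("orders.*.created", "orders.us-east.created") -> true
// subjectMatches("payments.refunds.>", "payments.refunds.eu.issued") -> true
```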

JetStream: Persistence and Delivery Guarantees Without a Separate System

JetStream is NATS’s persistence layer, built directly into the NATS server binary—no separate process, no ZooKeeper, no broker coordination cluster. Enabling it is a single server configuration flag.

JetStream introduces two primitives that change the delivery contract entirely:

Streams are persistent, append-only logs bound to one or more subject patterns. A stream named ORDERS capturing orders.> stores every matching message with configurable retention policies—limits by message count, total bytes, or age.

Consumers are stateful subscription views into a stream. They track delivery state per subscriber, support acknowledgment modes (explicit, implicit, none), and enable message replay from any point in the stream history. Multiple consumers reading the same stream do so independently, each maintaining its own cursor.

This combination gives you at-least-once delivery with acknowledgment, consumer group semantics via queue groups, and replay without re-publishing—the core capabilities that push teams toward Kafka in the first place.
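As a sketch of how these primitives are declared up front—assuming a JetStreamManager handle (jsm) obtained from an established connection, with illustrative names and limits:

```javascript
// Sketch: declare the stream and a durable consumer at deploy time.
// Assumes `jsm` is a JetStreamManager from an earlier connect; the names
// and the 7-day age limit are illustrative. Policy values are shown as
// their JSON wire forms.
const ordersStream = {
  name: "ORDERS",
  subjects: ["orders.>"],
  retention: "limits",
  max_age: 7 * 24 * 60 * 60 * 1_000_000_000, // nanoseconds: 7 days
};

const ordersConsumer = {
  durable_name: "orders-processor",
  ack_policy: "explicit",
  filter_subject: "orders.created",
};

async function ensureTopology(jsm) {
  await jsm.streams.add(ordersStream);
  await jsm.consumers.add("ORDERS", ordersConsumer);
}
```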

NATS Streaming (Stan) Is Dead

NATS Streaming, commonly called Stan, was the previous attempt at durable messaging on top of core NATS. It ran as a separate process, had known limitations with cluster failover, and has been officially superseded. If you encounter Stan in existing codebases or tutorials, treat it as legacy. JetStream is the production path forward and has been the recommended approach since NATS Server 2.2.

With the ecosystem boundaries clear, the next step is translating these primitives into stream and consumer configurations that match real workload patterns—retention policies, consumer delivery modes, and the decisions that determine whether your consumers keep up or fall behind under load.

Modeling Streams and Consumers for Real Workloads

Before writing a single line of client code, the decisions you make about stream configuration and consumer topology determine whether your system delivers the semantics you need—or silently drops messages under load.

Streams: Persistent Logs with Configurable Retention

A NATS JetStream stream is a durable log that captures messages published to one or more subjects. Unlike core NATS, which is fire-and-forget, a stream persists messages across server restarts and makes them available for replay. Three retention policies cover the vast majority of production use cases:

Visual: JetStream stream retention policies and consumer topology patterns

  • Limits retains messages up to a configured threshold (count, byte size, or age). Use this for event logs where you want a rolling window of history.
  • Interest retains messages only as long as at least one consumer exists. This maps naturally to event-driven workflows where you own the full consumer lifecycle.
  • WorkQueue removes a message from the stream once any consumer acknowledges it. This is the JetStream equivalent of a traditional message queue: each message is delivered to only one consumer in the group, and acknowledgment deletes it from the stream.

Choosing the wrong retention policy is the most common misconfiguration in early JetStream deployments. If you want competing consumers without losing messages after acknowledgment, WorkQueue is what you need—not Limits with manual cleanup.
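A WorkQueue stream declaration is one line of configuration, not manual cleanup code. A minimal sketch, with an illustrative stream name and subjects:

```javascript
// Sketch: a WorkQueue stream behaves as a distributed job queue -- each
// message is removed once a consumer acknowledges it. "workqueue" is the
// JSON wire value of the retention policy; names are illustrative.
const jobQueue = {
  name: "JOBS",
  subjects: ["jobs.>"],
  retention: "workqueue",
};

// Create it with a JetStreamManager handle from an established connection:
// await jsm.streams.add(jobQueue);
```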

Durable Consumers: Surviving Restarts

A consumer is JetStream’s unit of subscription state. Ephemeral consumers exist only for the lifetime of the connection; durable consumers persist their acknowledgment cursor on the server, so a restarting service resumes exactly where it left off. Each durable consumer maintains independent progress through the stream, which means two services consuming the same stream advance at their own pace without interfering with each other.

Push vs. Pull: Matching Delivery to Processing Semantics

Push consumers deliver messages to a bound subject as fast as the stream allows—ideal for low-latency fanout scenarios where downstream services can keep up. Pull consumers require the client to explicitly request batches of messages, giving you precise control over processing rate and backpressure. For any workload where processing time is variable or downstream resources are constrained, pull consumers eliminate the backpressure management you would otherwise have to build yourself.

💡 Pro Tip: Queue groups apply to push consumers and distribute delivery across a group of subscribers sharing the same durable name. This is JetStream’s horizontal scaling primitive—add consumer instances without changing stream or subject configuration.

With your stream topology and consumer model defined at the infrastructure level, the next step is translating these semantics into working Node.js code using the nats.js client library.

Publishing and Subscribing with nats.js

The official nats.js client exposes JetStream through a clean async/await API that maps directly onto the stream and consumer model you define at the infrastructure layer. The examples below assume an ORDERS stream already exists, accepting subjects matching orders.>.

Connecting and Acquiring a JetStream Context

Install the client library (the codecs ship with it):

terminal
npm install nats

Establishing a connection is straightforward. The JetStream context is a thin wrapper around the core connection that routes publishes through the persistence layer and enables consumer operations:

client.js
import { connect, StringCodec } from "nats";

const nc = await connect({
  servers: ["nats://nats-1.internal:4222", "nats://nats-2.internal:4222"],
  reconnect: true,
  maxReconnectAttempts: -1,
  reconnectTimeWait: 1000,
});
const js = nc.jetstream();
const sc = StringCodec();

Pass multiple server addresses to give the client failover endpoints across your cluster. Setting maxReconnectAttempts: -1 keeps the client trying indefinitely—appropriate for long-running services where a temporary NATS unavailability should not terminate the process.
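With indefinite reconnects, you still want visibility into connection churn. The client exposes lifecycle events as an async iterator; a sketch of a monitor, assuming the nc connection from the snippet above:

```javascript
// Sketch: surface client connection lifecycle events for logging/alerting.
// nats.js status events include "disconnect", "reconnect", and
// "reconnecting"; `nc` is assumed to be an established NatsConnection.
function formatStatusEvent(status) {
  switch (status.type) {
    case "disconnect":
      return `NATS disconnected from ${status.data}`;
    case "reconnect":
      return `NATS reconnected to ${status.data}`;
    case "reconnecting":
      return "NATS attempting reconnect";
    default:
      return `NATS event: ${status.type}`;
  }
}

async function monitorConnection(nc) {
  for await (const status of nc.status()) {
    console.log(formatStatusEvent(status));
  }
}
```

Run monitorConnection(nc) as a fire-and-forget task at startup so reconnect storms show up in your logs rather than passing silently.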

Publishing with Acknowledgment

A JetStream publish differs fundamentally from a core NATS publish: the server returns a PubAck that confirms the message was persisted to the stream before your code continues. This is your durability guarantee.

producer.js
async function publishOrder(order) {
  const payload = sc.encode(JSON.stringify(order));
  const ack = await js.publish("orders.created", payload, {
    msgID: `order-${order.id}`, // idempotency key
    timeout: 5000,
  });
  console.log(`Persisted to stream ${ack.stream}, seq ${ack.seq}`);
  return ack.seq;
}

await publishOrder({ id: "ord-8821", customerId: "cust-4410", total: 149.99 });

The msgID option enables server-side deduplication within the stream’s duplicate window (configurable at stream creation, defaulting to two minutes). Publishing the same msgID twice within that window returns the original PubAck without storing a duplicate—critical for retry logic on network errors.

💡 Pro Tip: If the publish call throws a timeout error, the message may or may not have been persisted. Always retry with the same msgID rather than generating a new one. The server’s idempotency window makes this safe.
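That retry rule can be captured in a small wrapper. A sketch—the attempt count and timeout are illustrative, and js is the JetStream context from earlier:

```javascript
// Sketch: retry a JetStream publish with a *stable* msgID so the server's
// duplicate window deduplicates any attempt that actually persisted.
// Attempt count and timeout are illustrative defaults.
async function publishWithRetry(js, subject, payload, msgID, attempts = 3) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      // Same msgID every attempt: a timed-out publish that actually landed
      // is answered with the original PubAck, not stored twice.
      return await js.publish(subject, payload, { msgID, timeout: 5000 });
    } catch (err) {
      lastErr = err;
    }
  }
  throw lastErr;
}
```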

Creating a Durable Pull Consumer and Processing Messages

Pull consumers give you explicit flow control: your service requests batches of messages at a rate it can handle rather than receiving an unbounded push. Combine this with explicit acknowledgment and you have at-least-once delivery with backpressure.

consumer.js
const consumer = await js.consumers.get("ORDERS", "orders-processor");

async function processMessages() {
  const messages = await consumer.fetch({ max_messages: 10, expires: 5000 });
  for await (const msg of messages) {
    try {
      const order = JSON.parse(sc.decode(msg.data));
      await handleOrder(order);
      msg.ack();
    } catch (err) {
      console.error(`Processing failed for seq ${msg.seq}:`, err.message);
      // Negative-acknowledge with a redelivery delay
      msg.nak(30000);
    }
  }
}

// Poll continuously
while (true) {
  await processMessages();
}

msg.ack() removes the message from the consumer’s pending set. msg.nak(30000) signals the server to redeliver after 30 seconds, giving downstream dependencies time to recover without tight-loop retries hammering a failing service.

Handling Redelivery and Poison Messages

Unbounded redelivery loops are a production hazard. Set max_deliver when creating the consumer to cap retry attempts. Note that JetStream has no dead-letter field in consumer configuration—instead, the server emits an advisory message when a delivery limit is exhausted:

consumer-config.js
import { AckPolicy, DeliverPolicy } from "nats";

const jsm = await nc.jetstreamManager();
await jsm.consumers.add("ORDERS", {
  durable_name: "orders-processor",
  ack_policy: AckPolicy.Explicit,
  deliver_policy: DeliverPolicy.All,
  max_deliver: 5,
  ack_wait: 30_000_000_000, // nanoseconds: 30 seconds
  filter_subject: "orders.created",
});

After five failed delivery attempts the server stops redelivering and publishes a MAX_DELIVERIES advisory to $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.ORDERS.orders-processor. The advisory carries the stream sequence of the exhausted message, so a dead-letter handler can fetch the original, log it, alert on-call, or park it in object storage for forensic review. ack_wait defines how long the server holds the message in an unacknowledged state before treating it as a redelivery candidate—align this with the realistic upper bound of your processing time plus safe margin.

💡 Pro Tip: Run a lightweight subscriber on the MAX_DELIVERIES advisory subject and use jsm.streams.getMessage to retrieve the exhausted message by its stream sequence. JetStream stores headers with the message, so you get the original Nats-Msg-Id, publish timestamp, and any custom metadata without additional instrumentation.
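A sketch of such a dead-letter watcher follows. The advisory subject assumes the ORDERS stream and orders-processor consumer from the examples above; nc and jsm are assumed from an earlier connect:

```javascript
// Sketch: watch MAX_DELIVERIES advisories and fetch the exhausted message.
// Subject assumes the ORDERS stream / orders-processor consumer; `nc` and
// `jsm` come from an established connection.
function parseMaxDeliveries(data) {
  // Relevant advisory fields: stream, consumer, stream_seq, deliveries
  const adv = JSON.parse(data);
  return { stream: adv.stream, seq: adv.stream_seq, deliveries: adv.deliveries };
}

async function watchDeadLetters(nc, jsm) {
  const sub = nc.subscribe(
    "$JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.ORDERS.orders-processor",
  );
  for await (const msg of sub) {
    const { stream, seq } = parseMaxDeliveries(new TextDecoder().decode(msg.data));
    // Fetch the original message -- headers intact -- for logging or parking
    const stored = await jsm.streams.getMessage(stream, { seq });
    console.error("dead letter:", stream, seq, stored.header);
  }
}
```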

With reliable publish acknowledgment and explicit consumer control in place, the next step is lifting this pattern into a NestJS service using the framework’s built-in transport abstraction—eliminating the polling loop and wiring JetStream into NestJS’s decorator-driven message handler model.

Microservices Integration: NestJS Transport Layer

NestJS ships with a built-in NATS transport that lets you wire up microservices in minutes. It handles connection lifecycle, serialization, and pattern routing cleanly. What it does not do is touch JetStream. That distinction is the source of most durability bugs in NestJS-based event-driven systems.

What the Built-in Transport Actually Gives You

The @nestjs/microservices NATS transport operates over core NATS pub/sub. Messages are fire-and-forget at the broker level: if your consumer is offline when a message arrives, it is gone. For internal request-reply flows between services that are always running, this is acceptable. For anything requiring at-least-once delivery, replay, or consumer acknowledgment, you need to layer JetStream on top manually.

The built-in transport is configured at bootstrap time. The queue option places all instances of a service into a NATS queue group, which load-balances messages across replicas automatically—useful behavior, but still no durability:

main.ts
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(
    AppModule,
    {
      transport: Transport.NATS,
      options: {
        servers: ['nats://nats.internal:4222'],
        queue: 'orders-service',
      },
    },
  );
  await app.listen();
}
bootstrap();

Handlers use @MessagePattern for request-reply and @EventPattern for one-way events. The difference matters operationally: @MessagePattern blocks the caller until a response is returned, while @EventPattern is fully decoupled and the broker makes no delivery guarantees:

orders.controller.ts
import { Controller } from '@nestjs/common';
import { MessagePattern, EventPattern, Payload } from '@nestjs/microservices';
@Controller()
export class OrdersController {
@MessagePattern('orders.create')
async createOrder(@Payload() data: CreateOrderDto) {
// Returns a response to the caller—synchronous RPC semantics
return this.ordersService.create(data);
}
@EventPattern('inventory.updated')
async onInventoryUpdated(@Payload() data: InventoryEvent) {
// Fire-and-forget from the broker's perspective; no ack, no replay
await this.ordersService.syncInventory(data);
}
}

These handlers work well for lightweight coordination between services that maintain high availability. The moment you need guaranteed delivery after a service restart, consumer lag visibility, or controlled replay, switch to the JetStream integration described below.

Layering JetStream onto a NestJS Service

The correct pattern is to run the NestJS NATS transport for synchronous RPC while initializing a separate JetStream client for durable event consumption. Both share the same underlying NATS server, but they use independent connections—this is intentional. Reusing the transport’s internal connection is not a supported pattern and couples lifecycle management in ways that are difficult to reason about under failure.

Implement JetStream consumption as an injectable service that hooks into NestJS module lifecycle events:

jetstream.service.ts
import { Injectable, OnModuleInit, OnModuleDestroy } from '@nestjs/common';
import { connect, JetStreamClient, NatsConnection, StringCodec } from 'nats';
@Injectable()
export class JetStreamService implements OnModuleInit, OnModuleDestroy {
private nc: NatsConnection;
private js: JetStreamClient;
private readonly sc = StringCodec();
async onModuleInit() {
this.nc = await connect({ servers: 'nats://nats.internal:4222' });
this.js = this.nc.jetstream();
const sub = await this.js.subscribe('orders.placed', {
config: {
durable_name: 'orders-service-consumer',
ack_policy: 'explicit',
deliver_policy: 'all',
max_deliver: 5,
},
});
(async () => {
for await (const msg of sub) {
try {
const payload = JSON.parse(this.sc.decode(msg.data));
await this.ordersService.processPlaced(payload);
msg.ack();
} catch (err) {
msg.nak(); // Triggers redelivery immediately
}
}
})();
}
async publish(subject: string, data: unknown) {
const encoded = this.sc.encode(JSON.stringify(data));
await this.js.publish(subject, encoded);
}
async onModuleDestroy() {
await this.nc.drain();
}
}

Note: Call msg.nak() explicitly rather than letting the ack timeout expire. Explicit negative acknowledgment triggers immediate redelivery up to your max_deliver limit, rather than waiting for the full ack_wait duration. In high-throughput pipelines, this distinction routinely saves several seconds per failed message and keeps consumer lag from accumulating during transient downstream errors.

Register JetStreamService as a provider in your module alongside the standard microservice bootstrap. The NestJS transport handles @MessagePattern routing; JetStreamService owns durable consumption independently. Neither is aware of the other, which is the point—failure in one path does not cascade into the other.

Why Two Clients and Not One

A common reflex is to consolidate: subscribe to JetStream subjects using @EventPattern and avoid the separate service entirely. This does not work. The NestJS NATS transport has no mechanism for issuing ack() or nak() calls on JetStream messages—its abstraction layer sits below that concern. Attempting to intercept raw message metadata through the NestJS handler context is fragile and unsupported across minor version bumps.

The dual-client architecture keeps request-reply latency low while giving durable subscribers the delivery guarantees they need. The next section picks up from here, covering how to tune replay behavior, enforce message ordering across consumer groups, and implement dead-letter handling when max_deliver is exhausted.

Message Replay, Ordering, and Failure Recovery

JetStream’s replay capabilities are where it earns its keep over basic pub/sub. Unlike Kafka’s offset management—which requires careful coordination between consumer groups and partition assignments—JetStream exposes replay through consumer configuration, with no separate offset tracking infrastructure required.

Replaying from Sequence or Time

When a service restarts after a crash or deploys for the first time, it often needs to reprocess historical events. JetStream consumers accept a DeliverPolicy that controls the starting point:

consumer-replay.js
import { connect, AckPolicy, DeliverPolicy } from "nats";
import { StringCodec } from "nats";
const nc = await connect({ servers: "nats://nats.internal:4222" });
const js = nc.jetstream();
const sc = StringCodec();
// Bootstrap from a known timestamp (e.g. last successful checkpoint)
const consumer = await js.consumers.get("ORDERS", "order-processor");
// Or create with time-based replay for audit recovery
const jsm = await nc.jetstreamManager();
await jsm.consumers.add("ORDERS", {
durable_name: "audit-replay",
ack_policy: AckPolicy.Explicit,
deliver_policy: DeliverPolicy.StartTime,
opt_start_time: new Date("2025-01-15T00:00:00Z").toISOString(),
filter_subject: "orders.>",
});

For sequence-based replay, swap DeliverPolicy.StartTime for DeliverPolicy.StartSequence and provide opt_start_seq. This is the canonical pattern for service bootstrapping: store the last processed sequence number in your state store, then resume from that point on restart rather than reprocessing the entire stream. Time-based replay serves a different use case—audit trails, compliance investigations, and forensic reconstruction of system state across a specific window—where a wall-clock boundary matters more than a logical sequence position.

💡 Pro Tip: Pair sequence-based replay with your database transaction commits. Write the NATS sequence number into the same transaction as your domain state. On restart, read that sequence number and use it as your opt_start_seq. This gives you a recovery point that’s always consistent with your application state, with no external coordination service required.
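The checkpoint half of that pattern reduces to a small config builder. A sketch—readCheckpoint is a hypothetical accessor standing in for your own state-store query, and the filter subject is illustrative:

```javascript
// Sketch: build a consumer config that resumes one past the last sequence
// recorded in your own state store. Deliver-policy value is the JSON wire
// form; filter subject and names are illustrative.
function resumeConfig(durableName, lastProcessedSeq) {
  return {
    durable_name: durableName,
    ack_policy: "explicit",
    deliver_policy: "by_start_sequence",
    opt_start_seq: lastProcessedSeq + 1, // resume after the checkpoint
    filter_subject: "orders.>",
  };
}

// Usage against a live cluster (readCheckpoint is hypothetical):
// const seq = await readCheckpoint("orders-processor");
// await jsm.consumers.add("ORDERS", resumeConfig("orders-processor", seq));
```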

Ordered Consumers for Strict Sequencing

When processing order matters—event sourcing, ledger updates, configuration propagation—ordered consumers eliminate the coordination overhead of distributed locking:

ordered-consumer.js
// Passing options instead of a durable name yields an ordered consumer
const ordered = await js.consumers.get("ORDERS", {
  filter_subjects: ["orders.customer.cust-88291"],
});
const messages = await ordered.consume();
for await (const msg of messages) {
  await applyEvent(sc.decode(msg.data));
  // Ordered consumers use ack policy "none" -- no explicit ack is required
}

Ordered consumers are single-subscriber and self-healing: the client tracks the delivery sequence, and if it detects a gap it silently recreates the consumer at the last known-good position, so the application never observes an out-of-order message. They trade throughput for correctness—the right tool for per-entity event streams, not high-volume fan-out.

Idempotent Handlers and Flow Control

At-least-once delivery is a guarantee, not a bug. Any message can be redelivered after a consumer restart or ack timeout. Every handler must be safe to run twice:

idempotent-handler.js
async function handleOrderCreated(msg) {
const event = JSON.parse(StringCodec().decode(msg.data));
const alreadyProcessed = await db.query(
"SELECT 1 FROM processed_events WHERE nats_seq = $1",
[msg.seq]
);
if (alreadyProcessed.rowCount > 0) {
msg.ack();
return;
}
await db.transaction(async (tx) => {
await tx.query("INSERT INTO orders ...", [event]);
await tx.query(
"INSERT INTO processed_events (nats_seq) VALUES ($1)",
[msg.seq]
);
});
msg.ack();
}

Using msg.seq as the idempotency key is preferable to a business-level identifier because it’s always present, requires no domain knowledge, and is guaranteed unique within a stream. The deduplication record can be pruned on a retention schedule once your redelivery window has passed.

For push consumers receiving high-throughput streams, set flow_control: true and configure idle_heartbeat in your consumer definition. Flow control signals the server to back off when the subscriber’s processing falls behind delivery, preventing unbounded in-memory message accumulation under sustained load spikes. Without it, a slow consumer on a fast stream will eventually exhaust heap space—a failure mode that typically surfaces under production load rather than during testing.
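As a sketch of what that consumer definition looks like—stream name, durable name, and deliver subject are illustrative, and the values are shown as their JSON wire forms:

```javascript
// Sketch: a push consumer with flow control enabled. flow_control requires
// idle_heartbeat, and only applies to push consumers (those with a
// deliver_subject). Names are illustrative.
const pushConsumerConfig = {
  durable_name: "telemetry-push",
  deliver_subject: "deliver.telemetry", // presence of this makes it push-based
  ack_policy: "explicit",
  flow_control: true,
  idle_heartbeat: 5_000_000_000, // nanoseconds: 5s heartbeat when idle
};

// Create it with a JetStreamManager handle from an established connection:
// await jsm.consumers.add("TELEMETRY", pushConsumerConfig);
```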

With replay, ordering, and idempotency addressed at the application layer, the next challenge is operational: running NATS reliably across multiple nodes and maintaining visibility into cluster health. The Kubernetes deployment patterns in the next section cover exactly that.

Deploying NATS on Kubernetes: Clustering and Observability

Running NATS in production means running it clustered. A single-node deployment is fine for local development, but any serious workload requires a three-node minimum to survive node failures without losing JetStream state. The official NATS Helm chart handles the StatefulSet configuration, PVC provisioning, and cluster routing, so start there rather than hand-rolling manifests.

Helm Deployment

Add the NATS chart repository and install a production-ready cluster with JetStream enabled:

install-nats.sh
helm repo add nats https://nats-io.github.io/k8s/helm/charts
helm repo update
helm install nats nats/nats \
  --namespace messaging \
  --create-namespace \
  --values nats-values.yaml

The values file does the real work. Below is a production baseline covering clustering, JetStream file storage, resource limits, and anti-affinity scheduling:

nats-values.yaml
config:
  cluster:
    enabled: true
    replicas: 3
  jetstream:
    enabled: true
    fileStore:
      enabled: true
      dir: /data
      maxSize: 20Gi
      storageDirectory: /data/jetstream
    memStore:
      enabled: true
      maxSize: 2Gi
container:
  image:
    tag: 2.10.14-alpine
  merge:
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi
natsBox:
  enabled: true
promExporter:
  enabled: true
  port: 7777
podTemplate:
  merge:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: nats
              topologyKey: kubernetes.io/hostname
volumeClaimTemplates:
  - metadata:
      name: nats-js
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3
      resources:
        requests:
          storage: 20Gi

The podAntiAffinity rule is non-negotiable in production—without it, Kubernetes will happily schedule all three pods onto the same node and your cluster tolerates zero node failures. The requiredDuringSchedulingIgnoredDuringExecution policy enforces hard placement constraints rather than soft preferences, which is the correct choice when cluster quorum depends on physical node separation.

Storage Backend Selection

File storage is the correct default for JetStream in production. It persists stream data across pod restarts via the PVC, giving you durable replay semantics even after a rolling update or unplanned eviction. Memory storage offers lower latency but loses all data when the pod terminates. Reserve memory storage for ephemeral streams where replayability is not required—short-lived notification fans, metric aggregation windows, or canary pipelines where data loss is acceptable.

Note: Set storageClassName to your cloud provider’s SSD-backed class (gp3 on AWS, pd-ssd on GCP). Spinning disk causes write amplification under JetStream’s write-ahead log pattern and degrades throughput significantly under sustained load. Benchmark your storage class with fio before committing to a configuration in production.

Observability

The Helm chart’s promExporter sidecar—built on the prometheus-nats-exporter—exposes JetStream metrics at :7777/metrics. The exporter polls the NATS server’s internal monitoring HTTP endpoint and translates the JSON responses into Prometheus-compatible gauges and counters. A minimal ServiceMonitor wires it into your Prometheus stack:

servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nats
  namespace: messaging
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nats
  endpoints:
    - port: prom-metrics
      interval: 15s
      path: /metrics

Key metrics to alert on: nats_jetstream_streams_total, nats_jetstream_consumers_total, nats_jetstream_messages_bytes, and nats_server_slow_consumer_count. A rising slow consumer count is the earliest signal of consumer lag before your stream hits its MaxBytes limit and starts dropping messages. Pair it with an alert on nats_jetstream_api_errors_total to catch misconfigured consumer durable names and ACL denials before they surface as silent data loss.

Operational Footprint vs Kafka

A three-node NATS cluster with JetStream runs comfortably under 1.5 GB of total memory. Kafka’s equivalent—three brokers plus a three-node ZooKeeper ensemble (or KRaft quorum)—typically demands 12–20 GB before you account for heap tuning and page cache. NATS has no separate coordination plane, no topic partition leader election delays, and no need for MirrorMaker to replicate across clusters. Cluster membership uses the NATS route protocol directly; there is no external consensus system to operate or upgrade independently. The failure surface is smaller by an order of magnitude, which translates directly into reduced on-call burden for teams running lean infrastructure.

With the cluster running and metrics flowing into Prometheus, the next logical step is validating consumer behavior end-to-end—which means revisiting the replay and acknowledgment patterns covered earlier with real traffic against your production stream configuration.

Key Takeaways

  • Use JetStream (not core NATS or legacy Stan) for any workload requiring durability—configure durable named consumers and explicit ack to guarantee at-least-once delivery
  • Design your stream retention policy (limits, interest, or work-queue) before writing client code; changing it later requires stream recreation
  • Make every message handler idempotent from day one—JetStream’s at-least-once guarantee means redelivery is not an edge case but a routine operational event
  • Start with pull consumers in production; push consumers require careful flow control configuration to avoid overwhelming slow subscribers at scale