NATS JetStream in Production: Persistent Messaging Without the Kafka Tax


The alert fires at 2 AM. A payment service processed a webhook, published the confirmation event, and moved on—but the downstream notification service was mid-restart. The message is gone. No dead letter queue, no replay, no recovery. Your users never got their receipt.

This is the failure mode Core NATS was never designed to solve. It’s a pub/sub system optimized for speed and simplicity: a message arrives, subscribers receive it, done. If no subscriber is listening at that exact moment, the message vanishes. For internal fan-out, service discovery, and request-reply patterns where latency matters more than durability, that tradeoff is perfectly acceptable. But as microservice architectures mature, the edge cases multiply—deployments, restarts, consumer lag, traffic spikes—and fire-and-forget stops being a feature.

The standard answer is Kafka. And Kafka works. But it arrives with a significant operational surface: brokers, ZooKeeper or KRaft, partition tuning, consumer group management, replication factors, retention policies. Teams that reach for Kafka to solve a durability problem often find themselves maintaining a distributed system that demands dedicated expertise to run safely in production. The tool becomes a project.

JetStream, the persistence layer built directly into NATS since version 2.2, closes that gap without forcing that tradeoff. It adds durable streams, consumer acknowledgments, message replay, and at-least-once delivery on top of the same NATS infrastructure you’re already running. No additional processes, no separate cluster, no new mental model for your networking layer.

Before committing to the Kafka migration your team is scoping, it’s worth understanding exactly where Core NATS breaks down under production load—and whether JetStream already covers the ground you need.

The Messaging Gap: When Fire-and-Forget Isn’t Enough

Core NATS is purpose-built for speed. It operates as a pure publish-subscribe broker with a straightforward contract: if a subscriber is listening when a message arrives, it gets delivered. If no subscriber is connected, the message is gone. This fire-and-forget model is not a bug—it is a deliberate architectural decision that lets NATS achieve sub-millisecond latencies and handle millions of messages per second with minimal resource consumption.

Visual: Core NATS fire-and-forget vs. JetStream durable delivery model

That simplicity is genuinely useful. Service-to-service RPC calls, real-time telemetry fan-out, and live event broadcasting all work perfectly well on Core NATS. The problem surfaces the moment your architecture needs something more: a consumer that restarts after a crash, a new service that needs to replay events from before it was deployed, or a processing pipeline where every message must be acknowledged before it is considered delivered.

The Standard Answer and Its Costs

The conventional response to these requirements is Kafka. And Kafka delivers—durable, ordered, replayable event streams at massive scale. But Kafka brings an operational footprint that many teams underestimate at adoption time. A production-grade Kafka cluster means managing ZooKeeper or KRaft consensus, tuning JVM heap sizes, sizing partition counts ahead of traffic growth, and operating a separate schema registry if you need structured data contracts. Dedicated platform engineers are not optional; they are load-bearing infrastructure. For teams running lean on Kubernetes, that overhead frequently exceeds the value being extracted from the broker.

JetStream Fills the Gap

NATS JetStream is the persistence layer built directly into the NATS server. It adds streams—durable, ordered sequences of messages stored on disk—and consumers, which are named, stateful views into those streams with configurable delivery semantics. The NATS server binary gains JetStream capability through a single configuration flag. No separate process, no external consensus system, no JVM.

Critically, JetStream does not replace Core NATS. Both run in the same server. Subjects that need persistence are backed by streams; subjects that do not stay ephemeral. Teams already running Core NATS in production add JetStream incrementally, routing only the subjects that require durability into streams while leaving low-latency fire-and-forget traffic untouched.

💡 Pro Tip: JetStream and Core NATS share the same subject namespace. A Core NATS publisher requires zero code changes to write to a JetStream-backed subject—the stream captures messages transparently.

The result is a messaging system that handles durable workloads without the Kafka tax. Understanding how JetStream structures that durability—through streams, consumers, and retention policies—is the next step before writing a single line of client code.

JetStream Architecture: Streams, Consumers, and Durability Primitives

JetStream extends Core NATS with three foundational abstractions: streams, consumers, and storage backends. Understanding how these fit together determines every configuration decision you make downstream.

Visual: JetStream streams, consumers, and storage backend architecture

Streams: Append-Only Logs with Subject Scope

A stream is an ordered, append-only log that captures messages published to one or more subject filters. When you define a stream, you specify which subjects it captures—orders.> catches every subject under that hierarchy, while payments.completed captures only that exact subject. Multiple streams can overlap on subjects, giving you flexibility to fan data into separate retention contexts.

Retention policy is configured per stream, not per message. The three modes are:

  • Limits-based: discard messages once the stream hits a size or message-count ceiling
  • Interest-based: retain messages only while at least one consumer is active
  • Workqueue: delete messages after they have been acknowledged by any single consumer

This makes the stream definition your primary durability contract. Get the retention policy wrong here and no amount of consumer configuration fixes it.
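In the Go client these three modes map to RetentionPolicy constants on nats.StreamConfig. A minimal sketch, assuming js is an existing nats.JetStreamContext; the stream names are illustrative and error handling is elided:

```go
// Limits-based: discard the oldest messages once a size, count, or age cap is hit.
js.AddStream(&nats.StreamConfig{
	Name:      "EVENTS",
	Subjects:  []string{"events.>"},
	Retention: nats.LimitsPolicy,
	MaxMsgs:   1_000_000,
})

// Interest-based: retain a message only while at least one consumer is active.
js.AddStream(&nats.StreamConfig{
	Name:      "NOTIFICATIONS",
	Subjects:  []string{"notify.>"},
	Retention: nats.InterestPolicy,
})

// Workqueue: delete each message as soon as any single consumer acknowledges it.
js.AddStream(&nats.StreamConfig{
	Name:      "JOBS",
	Subjects:  []string{"jobs.>"},
	Retention: nats.WorkQueuePolicy,
})
```

Note that retention cannot be changed after creation without recreating the stream, which is why it belongs in the durability contract conversation up front.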

Consumers: Pull, Push, Durable, and Ephemeral

Consumers are views into a stream. They track delivery progress independently, so ten consumers on the same stream each maintain their own position without interfering.

The push/pull distinction matters for production deployments. Push consumers have NATS deliver messages to a subject the consumer subscribes to—low latency, but the broker controls the flow rate. Pull consumers let the application explicitly request batches, which gives you backpressure control and cleaner behavior under load spikes. For most service-to-service workloads, pull consumers are the right default.

Durable consumers persist their state on the server under a named identifier. If the client restarts, it reconnects to the same consumer and resumes from the last acknowledged message. Ephemeral consumers exist only while the connection is alive and are appropriate for transient read-through scenarios like live dashboards.

Acknowledgement policies complete the delivery guarantee picture. AckExplicit requires each message to be individually acknowledged—this is what you want for critical workloads. AckAll acknowledges all messages up to a given sequence in one call, trading granularity for throughput. AckNone disables acknowledgement entirely, equivalent to at-most-once delivery.
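In the Go client these correspond to AckPolicy constants on nats.ConsumerConfig. A sketch, assuming a stream named ORDERS; the durable names are illustrative and error handling is elided:

```go
// AckExplicit: every message must be individually acknowledged.
js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:   "billing",
	AckPolicy: nats.AckExplicitPolicy,
})

// AckAll: acking one message acknowledges everything up to its sequence.
js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:   "metrics-rollup",
	AckPolicy: nats.AckAllPolicy,
})

// AckNone: at-most-once delivery; the server never waits for an ack.
js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:   "firehose-tap",
	AckPolicy: nats.AckNonePolicy,
})
```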

Storage Backends and Replication

JetStream supports file-based and in-memory storage. File storage survives process restarts and is the correct choice for any workload requiring durability. In-memory storage is faster but ephemeral—useful for caching patterns or replay buffers where loss is acceptable.

Replication factor is set per stream, not cluster-wide. A replication factor of 3 means the stream is replicated across three NATS server instances, tolerating one node failure without data loss. For a three-node cluster, R3 is the maximum replication available: it survives a single node failure with minimal write overhead.

💡 Pro Tip: Set replication factor at stream creation. Changing it later requires stream recreation, and there is no in-place migration path.

With these abstractions mapped out, the next step is getting a JetStream-enabled NATS server running so you can verify this model against a live system.

Standing Up NATS with JetStream Enabled

JetStream ships inside the standard NATS server binary — there is no separate process to run or plugin to install. Enabling it requires a single configuration directive and a storage path. Everything else you already know about NATS stays the same.

Start the Server

Create a minimal server configuration file:

server.conf
jetstream {
  store_dir: "/data/jetstream"
  max_memory_store: 1GB
  max_file_store: 10GB
}
port: 4222
http_port: 8222

Then run the server with Docker, mounting a local directory so stream data survives container restarts:

run-nats.sh
docker run -d \
  --name nats-js \
  -p 4222:4222 \
  -p 8222:8222 \
  -v $(pwd)/data:/data \
  -v $(pwd)/server.conf:/etc/nats/server.conf \
  nats:2.10-alpine \
  -c /etc/nats/server.conf

Confirm JetStream is active by hitting the monitoring endpoint:

Terminal window
curl -s http://localhost:8222/jsz | jq '.config'

You should see "enabled": true alongside the memory and file storage limits from your config.

Create a Stream and Durable Consumer

Install the NATS CLI and run the following commands. The stream captures every message published to orders.> — a subject wildcard that matches orders.placed, orders.shipped, and any future subject in that hierarchy.

setup-stream.sh
# Create a stream that retains messages for 24 hours
nats stream add ORDERS \
  --subjects "orders.>" \
  --storage file \
  --retention limits \
  --max-age 24h \
  --replicas 1

# Create a durable pull consumer
nats consumer add ORDERS fulfillment-worker \
  --pull \
  --durable fulfillment-worker \
  --deliver all \
  --ack explicit \
  --max-deliver 5

--deliver all tells JetStream to replay messages from the beginning of the stream when this consumer first connects. --ack explicit means JetStream holds each message until your consumer sends a positive acknowledgment — unacknowledged messages are redelivered up to the --max-deliver limit.

💡 Pro Tip: Name your durable consumer after the service that owns it (fulfillment-worker, not consumer-1). JetStream tracks per-consumer delivery state using this name, so it must be stable across deployments and restarts.

Verify Persistence

This three-step sequence confirms that messages survive a complete consumer disconnect:

verify-persistence.sh
# Step 1: Publish three messages while no consumer is running
nats pub orders.placed '{"order_id": "ORD-1001", "sku": "WIDGET-42"}'
nats pub orders.placed '{"order_id": "ORD-1002", "sku": "GADGET-7"}'
nats pub orders.shipped '{"order_id": "ORD-1001", "tracking": "1Z9999W99999999999"}'

# Step 2: Inspect the stream — messages are stored
nats stream info ORDERS

# Step 3: Reconnect the consumer and pull all pending messages
nats consumer next ORDERS fulfillment-worker --count 3

The stream info output shows Messages: 3 and Bytes: ~280 B. When you run consumer next, JetStream delivers all three messages in sequence, starting from the oldest. Acknowledge them and the consumer’s sequence position advances — reconnect tomorrow and you pick up exactly where you left off.

At this point you have a fully durable message store running locally with verified replay behavior. The next section moves from CLI interaction to production code, wiring the Go client library into a service that publishes order events and processes them with proper acknowledgment handling and error recovery.

Publishing and Consuming Messages with the Go Client

With your stream configured, the next step is wiring up producers and consumers that handle failures gracefully. The Go NATS client makes this straightforward, but the details of acknowledgment handling separate production-ready code from code that loses messages under load.

Connecting to the JetStream Context

JetStream functionality lives behind a separate context layered on top of a standard NATS connection. After establishing the connection, call JetStream() to get the handle you’ll use for all durable operations.

client.go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(
		"nats://nats.internal.svc.cluster.local:4222",
		nats.RetryOnFailedConnect(true),
		nats.MaxReconnects(10),
		nats.ReconnectWait(2*time.Second),
	)
	if err != nil {
		log.Fatalf("failed to connect: %v", err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatalf("failed to get JetStream context: %v", err)
	}
	_ = js // use js for publishing and consumer management
}

Calling nc.Drain() instead of nc.Close() ensures in-flight messages finish processing before the connection tears down—an important distinction during graceful shutdowns. Without it, messages already delivered to the client but not yet processed can be silently dropped when the process exits.

Publishing with Acknowledgment

Core NATS publishing is fire-and-forget. JetStream publishing is synchronous by default: js.Publish() blocks until the server confirms the message has been persisted to the stream, giving you a durable delivery guarantee before execution continues.

publisher.go
func publishOrder(js nats.JetStreamContext, orderID string, payload []byte) error {
	ack, err := js.Publish("orders.created", payload,
		nats.MsgId(orderID), // idempotency key—safe to retry
	)
	if err != nil {
		return fmt.Errorf("publish failed: %w", err)
	}
	log.Printf("persisted to stream %s, seq %d", ack.Stream, ack.Sequence)
	return nil
}

nats.MsgId() attaches a deduplication key. If you retry a publish with the same ID within the stream’s duplicate window (configurable, default 2 minutes), the server acknowledges without storing a duplicate. This makes retries safe at the publisher without leaking duplicate messages downstream.
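The duplicate window itself is a stream-level setting: the Duplicates field on nats.StreamConfig. A sketch widening it beyond the default, with the five-minute value chosen purely for illustration and error handling elided:

```go
js.AddStream(&nats.StreamConfig{
	Name:       "ORDERS",
	Subjects:   []string{"orders.>"},
	Duplicates: 5 * time.Minute, // dedup window for nats.MsgId keys
})
```

Size the window to cover your longest plausible retry horizon; the server keeps dedup state per stream for that duration, so very large windows trade memory for safety.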

The ack.Sequence value in the response is worth logging. It gives you a monotonically increasing position in the stream that you can use to correlate events, audit gaps, or resume processing after a failure without re-scanning the entire stream.

Note: Use js.PublishAsync() when throughput matters more than per-message latency. It returns a PubAckFuture you can check in batches, letting you pipeline dozens of publishes before waiting for confirmations—useful for high-volume ingestion paths where synchronous round-trips become the bottleneck.

Pull Consumers for Controlled Processing

Push consumers deliver messages as fast as the server can send them. Pull consumers let your application fetch exactly what it can handle—critical when downstream dependencies have variable latency or you’re processing CPU-intensive workloads.

consumer.go
func startConsumer(js nats.JetStreamContext) error {
	sub, err := js.PullSubscribe(
		"orders.created",
		"order-processor", // durable consumer name
		nats.BindStream("ORDERS"),
		nats.AckExplicit(),
	)
	if err != nil {
		return fmt.Errorf("subscribe failed: %w", err)
	}
	for {
		msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
		if err != nil && err != nats.ErrTimeout {
			log.Printf("fetch error: %v", err)
			continue
		}
		for _, msg := range msgs {
			if err := processOrder(msg.Data); err != nil {
				// signal failure—redelivery after AckWait expires
				msg.Nak()
				continue
			}
			msg.Ack()
		}
	}
}

The nats.AckExplicit() option is required for pull consumers and disables implicit acknowledgment. Without it, the client library would automatically ack messages on delivery, removing your ability to signal failures back to the server. The durable name "order-processor" persists consumer state—including the delivery cursor and pending ack set—across restarts, so a redeployed instance picks up exactly where the previous one left off.

Acknowledgment Signals

JetStream defines three signals your consumer sends back to the server:

  • Acknowledge (msg.Ack()): message processing complete, remove from pending
  • Negative acknowledge (msg.Nak()): processing failed, redeliver after backoff
  • In-progress (msg.InProgress()): reset the AckWait timer, still processing

msg.InProgress() is the one teams consistently miss. When a message takes longer than the consumer’s AckWait duration, the server assumes the consumer died and redelivers. For operations that legitimately take time—calling a slow third-party API, writing a large batch to a database—call InProgress() periodically to hold the lease without faking completion.

long_running.go
func processWithHeartbeat(msg *nats.Msg) {
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()

	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-ticker.C:
				msg.InProgress()
			case <-done:
				return
			}
		}
	}()

	err := runExpensiveOperation(msg.Data)
	close(done)
	if err != nil {
		msg.Nak()
		return
	}
	msg.Ack()
}

Set the heartbeat interval to roughly half your AckWait value to give yourself a comfortable safety margin against clock skew and network jitter. If AckWait is 30 seconds, a 15-second ticker ensures the lease is renewed well before expiry even if one tick fires slightly late.

With publishing and consumption patterns in place, the next consideration is horizontal scaling. A single consumer instance is a ceiling—queue groups and competing consumers let you distribute load across replicas without duplicating work, which is where the real throughput gains come from.

Queue Groups and Consumer Scaling Patterns

Scaling message consumers horizontally introduces a coordination problem: how do multiple instances share work without processing the same message twice? Core NATS solves this with queue groups—a lightweight mechanism where the server randomly selects one subscriber from a named group to receive each message. JetStream extends this concept with competing consumers, adding durability and acknowledgment semantics on top of the same horizontal scaling model.

Core NATS Queue Groups vs. JetStream Competing Consumers

In Core NATS, queue groups are implicit. Any subscriber that joins the same queue group name competes for messages, and the server load-balances across them. There is no state: if a subscriber crashes mid-processing, that message is gone.

JetStream competing consumers operate on a named, durable consumer bound to a stream. Multiple goroutines—or multiple pods—pull from the same consumer. The server tracks which messages have been delivered and acknowledged, so an unacknowledged message redelivers automatically after the AckWait timeout expires.

consumer_scaling.go
package main

import (
	"context"
	"log"
	"time"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func startWorker(ctx context.Context, js jetstream.JetStream, workerID int) {
	cons, err := js.Consumer(ctx, "ORDERS", "order-processor")
	if err != nil {
		log.Fatalf("worker %d: failed to bind consumer: %v", workerID, err)
	}
	cc, err := cons.Consume(func(msg jetstream.Msg) {
		log.Printf("worker %d processing: %s", workerID, msg.Subject())
		// Simulate order processing
		time.Sleep(50 * time.Millisecond)
		if err := msg.Ack(); err != nil {
			log.Printf("worker %d: ack failed: %v", workerID, err)
		}
	})
	if err != nil {
		log.Fatalf("worker %d: consume failed: %v", workerID, err)
	}
	defer cc.Stop()
	<-ctx.Done()
}

func main() {
	nc, err := nats.Connect("nats://nats.prod-us-east-1.svc.cluster.local:4222")
	if err != nil {
		log.Fatalf("connect failed: %v", err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatalf("jetstream context failed: %v", err)
	}

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Spin up 4 competing workers sharing one durable consumer
	for i := 1; i <= 4; i++ {
		go startWorker(ctx, js, i)
	}
	select {}
}

Each call to js.Consumer binds to the same durable consumer named order-processor. The JetStream server distributes in-flight messages across all active workers, ensuring each message lands with exactly one worker at a time.

Avoiding Duplicate Processing

The most common scaling mistake is creating a new consumer per replica instead of binding all replicas to the same durable consumer. A unique consumer per pod means every pod receives every message—the opposite of work distribution.

💡 Pro Tip: Set MaxAckPending on the consumer to control the maximum number of unacknowledged messages in flight across all workers combined. A value of MaxAckPending: 1000 with four workers gives each worker roughly 250 messages in flight, preventing any single slow replica from stalling the entire pipeline.

When a pod restarts, JetStream redelivers any messages that worker held but never acknowledged. Set AckWait to match your realistic processing deadline—not an arbitrary large number. A 30-second AckWait on a job that completes in under two seconds means a crashing worker blocks redelivery for half a minute unnecessarily.
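With the newer jetstream package used in the scaling example, both knobs sit on the consumer configuration. A sketch with illustrative values, assuming the ORDERS stream from earlier and eliding error handling:

```go
cons, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
	Durable:       "order-processor",
	AckPolicy:     jetstream.AckExplicitPolicy,
	MaxAckPending: 1000,            // in-flight ceiling shared across all workers
	AckWait:       5 * time.Second, // realistic processing deadline, not a guess
	MaxDeliver:    5,               // stop redelivering after five attempts
})
```

CreateOrUpdateConsumer is idempotent, so every replica can run it at startup and safely converge on the same shared consumer definition.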

The combination of a single durable consumer, bounded MaxAckPending, and a realistic AckWait gives you horizontal scaling in which each message is delivered to exactly one worker at a time, with at-least-once processing guarantees, and no distributed lock or external coordination service.

With consumer scaling handled at the application layer, the next challenge is operating the NATS server itself reliably. The following section covers deploying a clustered, JetStream-enabled NATS topology on Kubernetes, including persistent volume configuration and rolling upgrades without message loss.

Deploying NATS JetStream on Kubernetes

Running NATS JetStream in production means treating it as stateful infrastructure—not a stateless deployment you can scale and replace at will. File-backed streams require persistent storage, cluster coordination requires stable network identities, and graceful shutdown requires careful pod lifecycle management. The NATS Helm chart handles most of this complexity, but you need to configure it deliberately.

Helm Chart Configuration

Add the NATS chart repository and deploy with a values file that explicitly enables JetStream and file storage:

install.sh
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update
helm install nats nats/nats -f nats-values.yaml -n messaging --create-namespace
nats-values.yaml
config:
  cluster:
    enabled: true
    replicas: 3
    name: nats-cluster
  jetstream:
    enabled: true
    fileStore:
      enabled: true
      dir: /data
      pvc:
        enabled: true
        size: 20Gi
        storageClassName: gp3
container:
  image:
    tag: 2.10.14-alpine
natsBox:
  enabled: true
promExporter:
  enabled: true
  port: 7777

This configuration provisions a three-node cluster with each node backed by a 20Gi persistent volume. The gp3 storage class on AWS delivers consistent IOPS without the burst credit model of gp2—critical for write-heavy stream workloads. Size PVCs conservatively: JetStream’s file store does not automatically reclaim space from deleted streams until compaction runs, so operational headroom matters more than the raw sum of your MaxBytes limits.

Clustering and Replication

A three-node cluster is the minimum for a functioning Raft quorum. NATS uses the Raft consensus algorithm for stream metadata and, with JetStream, for replication of stream data itself. With three nodes, the cluster tolerates one node failure. Five nodes tolerate two. For most production deployments, three nodes strike the right balance between fault tolerance and write amplification overhead.

The replication factor is set per-stream at creation time, not globally. A stream defined with Replicas: 3 writes each message to all three nodes before acknowledging the producer. Set this in your stream configuration:

stream_config.go
_, err = js.AddStream(&nats.StreamConfig{
	Name:      "ORDERS",
	Subjects:  []string{"orders.>"},
	Replicas:  3,
	Storage:   nats.FileStorage,
	Retention: nats.LimitsPolicy,
	MaxAge:    72 * time.Hour,
	MaxBytes:  10 * 1024 * 1024 * 1024, // 10GB
})

💡 Pro Tip: Set MaxBytes on every stream. Without it, a runaway producer fills your PVC and crashes the pod. How JetStream enforces the limit is governed by the stream's discard policy: DiscardOld (the default) drops the oldest messages to make room, while DiscardNew rejects incoming publishes. Choose based on whether data loss or backpressure is acceptable for the workload.
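Whether hitting the cap drops old data or rejects new publishes is controlled by the stream's Discard field; a sketch extending the ORDERS configuration above, with error handling elided:

```go
js.AddStream(&nats.StreamConfig{
	Name:     "ORDERS",
	Subjects: []string{"orders.>"},
	MaxBytes: 10 * 1024 * 1024 * 1024, // 10GB cap
	Discard:  nats.DiscardOld,         // or nats.DiscardNew to reject publishes
})
```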

Monitoring with Prometheus

The promExporter sidecar exposes JetStream metrics at :7777/metrics. The most operationally useful metrics are gnatsd_varz_jetstream_stats_memory, gnatsd_varz_jetstream_stats_storage, and gnatsd_consumer_num_pending—the last being your primary signal for consumer lag.

Alert on gnatsd_varz_jetstream_stats_storage approaching your PVC capacity, and track gnatsd_consumer_num_pending per consumer to distinguish slow consumers from upstream throughput spikes. Configure a ServiceMonitor if you’re running the Prometheus Operator, or add a scrape config pointing to port 7777 on the NATS pods directly.
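One concrete shape for that storage alert is a Prometheus rule comparing usage against the PVC sized earlier; the metric name follows the exporter discussion above but should be verified against what your exporter actually emits:

```yaml
groups:
  - name: nats-jetstream
    rules:
      - alert: JetStreamStorageNearCapacity
        # Fire when file-store usage crosses 80% of the 20Gi PVC
        expr: gnatsd_varz_jetstream_stats_storage > 0.8 * 20 * 1024 * 1024 * 1024
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "JetStream file store above 80% of PVC capacity"
```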

Graceful Shutdown

NATS pods need a terminationGracePeriodSeconds long enough for in-flight messages to drain and Raft leadership to transfer. Sixty seconds is a practical baseline:

nats-values.yaml (partial)
statefulSet:
  patch:
    - op: add
      path: /spec/template/spec/terminationGracePeriodSeconds
      value: 60

Without this, a rolling update during peak traffic triggers redelivery storms as consumers reconnect and reprocess unacknowledged messages. Pair the grace period with a preStop hook if your workload has strict exactly-once requirements—the hook gives the server time to complete leadership handoff before Kubernetes sends SIGTERM.

With the cluster running and metrics flowing, the natural next question is architectural: where does NATS JetStream fit relative to Kafka, and what workload characteristics make one the better choice over the other?

When to Choose NATS JetStream Over Kafka

The honest answer to “NATS JetStream or Kafka?” is not a universal recommendation—it is a function of your workload profile, team size, and operational tolerance. Here is how to make that call with concrete criteria.

Throughput and Latency

NATS JetStream delivers sub-millisecond publish latency at moderate throughput, typically excelling in the range of tens of thousands to low hundreds of thousands of messages per second per node. Kafka’s architecture is purpose-built for sustained, sequential disk writes at extreme scale—multi-million message-per-second throughput across large partitioned topics is where Kafka genuinely leads.

If your service mesh consists of dozens of microservices exchanging domain events, command results, and audit records, JetStream handles that load with headroom to spare. If you are building a data pipeline ingesting clickstream events from tens of millions of users and feeding a real-time analytics warehouse, Kafka’s partitioned log model remains the stronger choice.

The latency story favors JetStream for request-reply patterns and low-fan-out scenarios. Kafka’s batching model introduces inherent latency floors that are difficult to eliminate without sacrificing throughput.

Operational Complexity

This is where the comparison sharpens. A production Kafka deployment involves the broker cluster, historically ZooKeeper (and now KRaft for newer versions), a schema registry if you enforce Avro or Protobuf contracts, a Kafka Connect layer for integrations, and often a separate consumer lag monitoring stack. Each layer adds operational surface area, upgrade coordination, and on-call burden.

NATS JetStream ships as a single binary. A three-node clustered JetStream deployment requires no external coordination service, no schema enforcement infrastructure, and no sidecar processes. Replication, leader election, and message acknowledgment are built into the core server. For teams without a dedicated platform engineering function, this difference is not marginal—it is the difference between owning a capability and being owned by it.

💡 Pro Tip: If your team cannot dedicate at least one engineer to ongoing Kafka operations and tuning, JetStream will likely deliver better production reliability over time, not because it is technically superior at scale, but because operational simplicity compounds.

Running Core NATS and JetStream Side by Side

NATS lets you run ephemeral Core NATS subjects and persistent JetStream streams on the same server cluster simultaneously. Migrate service by service: publish to a JetStream stream from new consumers while legacy services continue reading from Core NATS subjects unchanged. This makes incremental adoption safe and reversible without a flag-day cutover.

The subject namespace is shared, so a Core NATS publisher writing to orders.placed will automatically have its messages captured by a JetStream stream with a matching subject filter—zero changes required on the producer side. You can introduce durability to individual data flows without coordinating changes across multiple teams or service owners. Add a stream, verify replay behavior, and migrate consumers one at a time.

This incremental path is JetStream’s practical advantage over any Kafka migration: there is no big-bang cutover, no dual-run period where you operate two separate clusters, and no forced compatibility layer between your existing NATS infrastructure and the new durable messaging substrate. The upgrade is additive, not disruptive.

Key Takeaways

  • Enable JetStream on your existing NATS deployment with a single config flag—then create streams to add durability only where your architecture actually requires it
  • Use pull consumers with explicit ack policies for at-least-once processing guarantees, and size your replication factor to match your availability requirements
  • Deploy a minimum 3-node NATS cluster on Kubernetes using the official Helm chart before going to production—single-node JetStream has no fault tolerance
  • Benchmark NATS JetStream against your actual workload before migrating from Kafka; for low-to-medium throughput with simpler ops requirements, JetStream wins decisively