Production-grade multi-node messaging for SEA cells: NATS JetStream clustering for high availability, optional inter-cell federation via NATS leafnodes and stream mirroring, plus cluster-aware workers and observability. The result is self-healing messaging that integrates cleanly with sea-cell and the existing outbox/inbox patterns.
Current state: infra/docker/docker-compose.dev.yml runs a single NATS service, and sea-mq-worker connects to a single NATS_URL and creates streams/consumers at runtime.

Target: a sea-mq-worker that supports multiple server endpoints, durable consumer groups, and resilient reconnect behavior.

Goal: Introduce a stable, replicated NATS cluster with deterministic configuration for both dev and prod.
Deliverables
- infra/nats/ (shared server configuration).

Plan

- infra/nats/ with:
  - jetstream { store_dir, max_file_store, max_mem_store }
  - cluster { name, port, routes }
  - server_name and advertise for stable peer discovery
  - http_port and healthz enabled for probes
- infra/docker/docker-compose.dev.yml with a 3-node NATS cluster profile:
  - nats-1, nats-2, nats-3 services
  - num_replicas=3 for SEA_EVENTS/VIBESPRO_EVENTS
  - ack_wait and max_ack_pending aligned with PRD-022

Validation

- jsz shows replication.
- Restart nodes while publishing to SEA_EVENTS; confirm stream state persists across node restarts.

Goal: First-class NATS cluster in sea-cell deployments, self-healing and storage-safe.
Deliverables
Plan
- deploy/pulumi/components/sea-cell.ts to add:
  - A nats StatefulSet with replicas=3
  - Readiness/liveness probes against http://:8222/healthz
  - A nats Service for client connections
  - A nats-headless Service for cluster routes

Validation
- sea-mq-worker connects to the cluster and processes events.

Goal: Worker is resilient across NATS cluster nodes and scales horizontally without duplicate processing.
Deliverables
Plan
- New environment variables:
  - NATS_SERVERS (comma-separated list) with fallback to NATS_URL
  - NATS_CONNECT_TIMEOUT_MS, NATS_RECONNECT_MAX, NATS_RECONNECT_DELAY_MS
  - JETSTREAM_DOMAIN for federation-aware routing
  - STREAM_REPLICAS and CONSUMER_MAX_ACK_PENDING
  - CONSUMER_DURABLE and CONSUMER_GROUP to allow multiple workers to share a pull consumer
- Configure async_nats::ConnectOptions for:
  - retry_on_initial_connect
- Bind the pull consumer to CONSUMER_DURABLE
- Align ack_wait and max_deliver with PRD-022

Validation
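One cheap, isolated check of the endpoint-resolution rules is a pure function that can be exercised without a running cluster. A minimal sketch, assuming the precedence described above (NATS_SERVERS, then NATS_URL, then a local default); the helper name and the default endpoint are illustrative, not the worker's actual code:

```rust
use std::env;

/// Resolve the endpoint list: prefer a comma-separated NATS_SERVERS value,
/// fall back to NATS_URL, then to a local default. Illustrative sketch.
fn parse_endpoints(servers: Option<&str>, url: Option<&str>) -> Vec<String> {
    let raw = servers.or(url).unwrap_or("nats://localhost:4222");
    raw.split(',')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}

fn main() {
    // Read both variables; absence is represented as None.
    let servers = env::var("NATS_SERVERS").ok();
    let url = env::var("NATS_URL").ok();
    let endpoints = parse_endpoints(servers.as_deref(), url.as_deref());
    println!("{}", endpoints.join(" "));
}
```

Keeping the parsing separate from the connect call makes the fallback order testable in CI without any NATS server present.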
Goal: Allow controlled cross-cell messaging without coupling clusters into a single failure domain.
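Federation of this shape is typically wired with NATS leafnodes: the hub cell accepts leafnode connections, and each remote cell dials out. A minimal sketch of both ends of the link; the port, hostname, and account scoping here are assumptions, not values from this repo:

```conf
# Hub cell nats.conf -- accept leafnode connections (illustrative port)
leafnodes {
  port: 7422
}

# Remote cell nats.conf -- dial the hub
leafnodes {
  remotes: [
    { url: "nats://hub.example.internal:7422" }
  ]
}
```

What actually crosses the link is usually scoped via account exports/imports on each side, which is how the subject restrictions below would be enforced.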
Deliverables
Plan
- Restrict the subjects exported across the link (*.event.> and *.dlq.> as required)

Validation
Goal: Production-grade monitoring and automated recovery verification.
Deliverables
- Monitoring for sea-mq-worker.

Plan
- Scrape varz, jsz, and connz metrics

Validation
- just ci-quick equivalent messaging checks pass.
- num_replicas=3 for primary event streams; num_replicas=2 acceptable only for non-critical streams.
- sea-mq-worker supports multi-endpoint NATS connectivity and shared durable consumers.
- infra/nats/ with JetStream + cluster routes.
- nats-1/2/3, dedicated volumes, and clustered routes.
- 4222 (client) and 8222 (monitoring) exposed per node for local validation.
- Health endpoints at http://:8222/healthz.
- NATS_SERVERS in all services and workers (comma-separated endpoints).
- Consistent JETSTREAM_DOMAIN and stream/consumer names in all modes.
- varz/jsz/connz on :8222.
- Stop nats-1 and verify publish/consume continues.

Start cluster
```shell
docker compose -f infra/docker/docker-compose.dev.yml --profile nats-ha up -d
```
Expected
curl http://localhost:8222/healthz returns {"status":"ok"}.

Verify replication
```shell
curl -s http://localhost:8222/jsz | jq '.meta.cluster'
```
Expected
leader present, replicas ≥ 2.

Failover check
```shell
docker stop nats-1
```
Expected
jsz shows a new leader.

Check pods and services
```shell
kubectl -n <namespace> get pods -l app=nats
kubectl -n <namespace> get svc nats nats-headless
```
Expected
All pods Running.

Health and replication
```shell
kubectl -n <namespace> port-forward svc/nats 8222:8222
curl -s http://localhost:8222/jsz | jq '.meta.cluster'
```
Expected

leader present, replicas ≥ 2.

Failover check
```shell
kubectl -n <namespace> delete pod nats-0
```
Expected

The pod is recreated and jsz shows a new leader.

Connectivity check
```shell
curl -s http://localhost:8222/connz | jq '.connections[] | select(.name | test("leafnode"))'
```
Expected

A leafnode connection is listed.

Partition + resync
```shell
# Simulate link loss by blocking the leafnode port or stopping the remote NATS
```
Expected

Mirrored streams resync once connectivity is restored.

Recommendation
Use a nats-ha compose profile with three services: nats-1, nats-2, nats-3.

Layout
- infra/nats/nats.conf (shared across all nodes)
- nats-1: client 4222, monitor 8222, routes 6222
- nats-2: client 4223, monitor 8223, routes 6223
- nats-3: client 4224, monitor 8224, routes 6224
- Volumes: sea_nats_data_1, sea_nats_data_2, sea_nats_data_3

Why this layout
Client endpoint strategy
```shell
NATS_SERVERS=nats://localhost:4222,nats://localhost:4223,nats://localhost:4224
```

```yaml
services:
  nats-1:
    image: nats:2.10-alpine
    container_name: sea-nats-1
    command: ["-c", "/etc/nats/nats.conf"]
    environment:
      NATS_SERVER_NAME: nats-1
    ports:
      - "4222:4222"
      - "8222:8222"
      - "6222:6222"
    volumes:
      - ./infra/nats/nats.conf:/etc/nats/nats.conf:ro
      - sea_nats_data_1:/data
    networks: [sea-net]
    profiles: ["nats-ha"]

  nats-2:
    image: nats:2.10-alpine
    container_name: sea-nats-2
    command: ["-c", "/etc/nats/nats.conf"]
    environment:
      NATS_SERVER_NAME: nats-2
    ports:
      - "4223:4222"
      - "8223:8222"
      - "6223:6222"
    volumes:
      - ./infra/nats/nats.conf:/etc/nats/nats.conf:ro
      - sea_nats_data_2:/data
    networks: [sea-net]
    profiles: ["nats-ha"]

  nats-3:
    image: nats:2.10-alpine
    container_name: sea-nats-3
    command: ["-c", "/etc/nats/nats.conf"]
    environment:
      NATS_SERVER_NAME: nats-3
    ports:
      - "4224:4222"
      - "8224:8222"
      - "6224:6222"
    volumes:
      - ./infra/nats/nats.conf:/etc/nats/nats.conf:ro
      - sea_nats_data_3:/data
    networks: [sea-net]
    profiles: ["nats-ha"]

networks:
  sea-net:
    name: sea-net

volumes:
  sea_nats_data_1:
  sea_nats_data_2:
  sea_nats_data_3:
```
```conf
server_name: ${NATS_SERVER_NAME}
port: 4222
http: 8222

jetstream {
  store_dir: /data
}

cluster {
  name: sea-nats
  port: 6222
  routes: [
    ${NATS_CLUSTER_ROUTES}
  ]
}
```
Environment Variables for Routes:
- Compose: nats://nats-1:6222,nats://nats-2:6222,nats://nats-3:6222
- Kubernetes (StatefulSet DNS): nats://nats-0.nats-headless:6222,nats://nats-1.nats-headless:6222,nats://nats-2.nats-headless:6222
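The `${NATS_SERVER_NAME}` / `${NATS_CLUSTER_ROUTES}` placeholders assume the config is rendered before nats-server reads it (NATS config can reference environment variables directly as `$VAR`, but a brace-style template needs an explicit render step). A minimal render sketch with plain `sed`; the template path is hypothetical:

```shell
# Values that would normally come from the container environment
NATS_SERVER_NAME=nats-1
NATS_CLUSTER_ROUTES='nats://nats-2:6222, nats://nats-3:6222'

# Stand-in template (in practice this is the mounted nats.conf template)
printf 'server_name: ${NATS_SERVER_NAME}\nroutes: [ ${NATS_CLUSTER_ROUTES} ]\n' > /tmp/nats.conf.tmpl

# Substitute the placeholders and write the effective config
sed -e "s|\${NATS_SERVER_NAME}|$NATS_SERVER_NAME|g" \
    -e "s|\${NATS_CLUSTER_ROUTES}|$NATS_CLUSTER_ROUTES|g" \
    /tmp/nats.conf.tmpl > /tmp/nats.conf
cat /tmp/nats.conf
# prints:
# server_name: nats-1
# routes: [ nats://nats-2:6222, nats://nats-3:6222 ]
```

Running this in a small entrypoint script keeps one shared template valid for both the compose and Kubernetes route lists.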
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nats
spec:
  selector:
    app: nats
  ports:
    - name: client
      port: 4222
      targetPort: 4222
    - name: monitor
      port: 8222
      targetPort: 8222
---
apiVersion: v1
kind: Service
metadata:
  name: nats-headless
spec:
  clusterIP: None
  selector:
    app: nats
  ports:
    - name: routes
      port: 6222
      targetPort: 6222
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats-headless
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nats
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nats
          image: nats:2.10-alpine
          command: ["nats-server", "-c", "/etc/nats/nats.conf"]
          ports:
            - containerPort: 4222
              name: client
            - containerPort: 8222
              name: monitor
            - containerPort: 6222
              name: routes
          volumeMounts:
            - name: config
              mountPath: /etc/nats
            - name: data
              mountPath: /data
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8222
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8222
      volumes:
        - name: config
          configMap:
            name: nats-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nats-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nats
```
- SEA_NATS_PROFILE=nats-ha for local HA validation.
- NATS_SERVERS for multi-endpoint clients: nats://localhost:4222,nats://localhost:4223,nats://localhost:4224.
- Secrets live in secrets.env.sops, are materialized to /etc/nats/secrets, and are included in nats.conf.

Required keys

- NATS_AUTH_TOKEN
- NATS_TLS_CERT
- NATS_TLS_KEY
- NATS_TLS_CA (optional)
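A sketch of how those keys could surface in nats.conf once the decrypted files land under /etc/nats/secrets; the file names are assumptions, not the repo's actual layout:

```conf
# Illustrative only: token auth plus TLS wired to the secrets directory
authorization {
  token: $NATS_AUTH_TOKEN
}

tls {
  cert_file: "/etc/nats/secrets/server.crt"
  key_file: "/etc/nats/secrets/server.key"
  ca_file: "/etc/nats/secrets/ca.crt"   # optional, for verifying peers
}
```

Referencing the token via `$NATS_AUTH_TOKEN` keeps the secret out of the checked-in config; the TLS material stays file-based so it can be rotated without editing nats.conf.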