NATS HA + Secrets Runbook
Purpose
Operate the NATS JetStream HA cluster and manage production secrets with SOPS while keeping dev on env vars.
Scope
- Local HA validation (docker compose profile)
- K3s/mesh deployment checks
- Secrets management via SOPS
Local HA (Docker)
Start
SEA_NATS_PROFILE=nats-ha just dev-up
Verify health
curl http://localhost:8222/healthz
curl http://localhost:8223/healthz
curl http://localhost:8224/healthz
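The three health checks can also be wrapped in a loop that reports each node and returns non-zero if any node fails. This is a minimal sketch: the function name `check_nats_health` and the `NATS_MONITOR_PORTS` variable are illustrative and are not part of the repository; the ports are the ones used in the commands above.

```shell
# Hypothetical helper: names are illustrative, not from the repo.
# Monitoring ports match the curl commands in this runbook.
NATS_MONITOR_PORTS="${NATS_MONITOR_PORTS:-8222 8223 8224}"

check_nats_health() {
  local failed=0 port
  for port in $NATS_MONITOR_PORTS; do
    # -f turns HTTP errors into a non-zero exit; --max-time bounds a hung node
    if curl -fsS --max-time 5 "http://localhost:${port}/healthz" >/dev/null 2>&1; then
      echo "ok   localhost:${port}"
    else
      echo "FAIL localhost:${port}"
      failed=1
    fi
  done
  return "$failed"
}
```

After `just dev-up`, calling `check_nats_health` should print one `ok` line per node; any `FAIL` line makes the function return 1, which is convenient in scripts.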
Verify JetStream replication
curl -s http://localhost:8222/jsz | jq '.meta_cluster'
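To make the replication output easier to scan, the JSON can be reduced to one line per peer. The sketch below assumes the meta group is reported under the `meta_cluster` key of the `/jsz` response, with `leader` and `replicas[].name/current/offline` fields; these field names may differ between NATS versions, and replica details are typically reported only by the current meta leader. The function name `summarize_meta_cluster` is hypothetical.

```shell
# Hypothetical helper: reads /jsz JSON on stdin and prints a short summary.
# Field names (meta_cluster, leader, replicas, current, offline) are assumptions
# about the /jsz payload; check them against your NATS version.
summarize_meta_cluster() {
  jq -r '
    .meta_cluster // {} |
    "leader=\(.leader // "none")",
    ((.replicas // [])[] |
      "replica=\(.name) current=\(.current // false) offline=\(.offline // false)")
  '
}
```

Usage: `curl -s http://localhost:8222/jsz | summarize_meta_cluster`. A healthy cluster should show a named leader and no replica marked `offline=true`.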
Failover check
Stop one NATS node. Expected: publish/consume continues and a new leader is elected.
- Reset volumes: SEA_NATS_PROFILE=nats-ha just dev-reset
- Verify that NATS_ROUTE_1..3 are set in compose for all nodes.
K3s/Mesh (Kubernetes)
Check pods and services
kubectl -n <namespace> get pods -l app=nats
kubectl -n <namespace> get svc nats nats-headless
Routing env
NATS_ROUTE_1=nats://nats-0.nats-headless:6222
NATS_ROUTE_2=nats://nats-1.nats-headless:6222
NATS_ROUTE_3=nats://nats-2.nats-headless:6222
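The route variables follow a regular pattern (pod ordinal `n-1` behind the headless service), so they can be generated rather than typed. This is a sketch only: `print_nats_routes` is a hypothetical helper, and it assumes the pod prefix `nats`, the service `nats-headless`, and port 6222 exactly as listed above.

```shell
# Hypothetical helper: prints NATS_ROUTE_<n> lines for pods nats-0..nats-(N-1)
# behind the nats-headless service, matching the pattern in this runbook.
print_nats_routes() {
  local replicas="${1:-3}" i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "NATS_ROUTE_$((i + 1))=nats://nats-${i}.nats-headless:6222"
    i=$((i + 1))
  done
}
```

Calling `print_nats_routes 3` reproduces the three lines above; comparing its output with the deployed environment is a quick way to spot a mistyped route.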
Health and replication
# port-forward blocks the terminal; run the curl from a second terminal
kubectl -n <namespace> port-forward svc/nats 8222:8222
curl -s http://localhost:8222/jsz | jq '.meta_cluster'
Failover check
kubectl -n <namespace> delete pod nats-0
Expected: new leader elected, streams remain available.
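Leader election takes a moment, so a single query right after deleting the pod can be misleading. The sketch below polls until a meta leader is reported or a retry limit is reached. Both function names are hypothetical; the default `fetch_meta_leader` assumes a port-forward on localhost:8222 and a `meta_cluster.leader` field in the `/jsz` response, and should be replaced with whatever fits your environment.

```shell
# Hypothetical helpers. fetch_meta_leader must print the current meta leader
# name, or nothing when there is none. The default assumes a local
# port-forward on 8222 and the meta_cluster.leader field in /jsz.
fetch_meta_leader() {
  curl -fsS --max-time 5 http://localhost:8222/jsz | jq -r '.meta_cluster.leader // empty'
}

# Poll until a leader name is reported; args: attempts (default 30), delay seconds (default 2)
wait_for_meta_leader() {
  local attempts="${1:-30}" delay="${2:-2}" leader i=1
  while [ "$i" -le "$attempts" ]; do
    leader="$(fetch_meta_leader 2>/dev/null || true)"
    if [ -n "$leader" ]; then
      echo "$leader"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no meta leader after ${attempts} attempts" >&2
  return 1
}
```

Note that a `kubectl port-forward` session is bound to a single pod; if it was forwarding to the deleted pod, it will drop and needs to be restarted before polling.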
Production Secrets (SOPS)
Required keys
NATS_AUTH_TOKEN
NATS_TLS_CERT
NATS_TLS_KEY
NATS_TLS_CA (optional)
Manage secrets
Warning: Do not pass secrets as command arguments; use interactive mode or file redirection.
Interactive (Recommended):
just sops-init
just sops-edit
File-based:
just sops-add NATS_AUTH_TOKEN < token.txt
just sops-add NATS_TLS_CERT < cert.pem
just sops-add NATS_TLS_KEY < key.pem
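If a token does not exist yet, it can be generated straight into a file so the value never appears in a command line or in shell history. The helper below is a hypothetical sketch, not part of the repository; the token length (32 random bytes, hex encoded) is an assumption, not a documented requirement.

```shell
# Hypothetical helper: writes a random hex token to a file readable only by
# the current user, so the value never appears in argv or shell history.
make_token_file() {
  local path="${1:-token.txt}"
  (
    umask 077   # a newly created file gets mode 0600
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$path"
  )
}
```

One possible flow: `make_token_file token.txt`, then `just sops-add NATS_AUTH_TOKEN < token.txt`, then delete the plaintext file once the encrypted entry is in place.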
Notes
- Pulumi decrypts .secrets.env.sops and mounts secrets into /etc/nats/secrets.
- nats.conf includes nats-auth.conf only when secrets are present.
Troubleshooting
- Verify that nats-headless exists and the StatefulSet uses it as serviceName.
- Ensure the routes port 6222 is exposed and reachable across pods.
JetStream not available
- Check the jsz endpoint for the meta leader and replicas.
- Confirm disk volume mounts are present and writable.
Auth/TLS failures
- Verify secrets exist: kubectl -n <namespace> get secret nats-secrets
- Check that cert/key values are correct and PEM formatted.
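A quick structural check can catch the most common formatting mistake (a file that lost its PEM armor lines) before the secret is added. The sketch below is a hypothetical helper and only looks for BEGIN/END lines; it does not validate the certificate or key contents.

```shell
# Hypothetical helper: cheap structural check that a file has PEM armor lines.
# It does not parse or validate the certificate or key itself.
looks_like_pem() {
  local file="$1"
  [ -s "$file" ] &&
    grep -q -- '^-----BEGIN [A-Z0-9 ]*-----' "$file" &&
    grep -q -- '^-----END [A-Z0-9 ]*-----' "$file"
}
```

Usage: `looks_like_pem cert.pem && looks_like_pem key.pem`. For a deeper check, `openssl x509 -in cert.pem -noout -subject -enddate` parses the certificate, and comparing `openssl x509 -in cert.pem -noout -pubkey` with `openssl pkey -in key.pem -pubout` shows whether the key matches the certificate.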