This packet replaces the old Task 23 notes. It is aligned to Phase 16 / Task 23 in docs/plans/2026-01-25-end-state.md and the current repo infrastructure.
Use this as the exact execution guide for an agent implementing Task 23.
Add a chaos test suite that validates platform invariants under failure:
Based on repo state:
- infra/docker/docker-compose.dev.yml (sea-openobserve)
- infra/docker/docker-compose.dev.yml (sea-otel-collector)
- infra/docker/docker-compose.skeleton.yml (sea-opa)
- apps/sea-mq-worker (no existing docker-compose service; we will add it to a chaos compose file)
- NATS mesh (infra/docker/docker-compose.mesh.yml)
Mesh nodes: sea-nats-a-1 … sea-nats-b-3. Ports: A client 4422, B client 5422, gateway port 7522.
Default ports for chaos stack (match dev unless mesh is used):
- postgres: 5432
- nats: 4222 (monitor 8222)
- opa: 8181
- policy-gateway: 8080 (entrypoint default)
- openobserve: 5080
- otel-collector: 4317/4318/8888/13133
Create: infra/docker/docker-compose.chaos.yml
Include services (with profiles):
- postgres (profile chaos-outbox)
- nats (single-node JetStream, profile chaos-outbox)
- sea-mq-worker (profile chaos-outbox)
- opa (profile chaos-opa)
- policy-gateway (profile chaos-opa)
- openobserve (profile chaos-observe)
- otel-collector (profile chaos-observe)
Rule: services must be profile-gated so scenarios run only what they need.
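A minimal sketch of how the runner could map each scenario to its compose file and profile, so only the gated services start. Compose starts services that have no profile plus those in the active profiles; because every chaos service is profile-gated, only the selected scenario's services come up. The SCENARIO_STACKS table and compose_cmd helper are hypothetical names, and the mesh compose file is assumed not to need a profile flag.

```python
# Hypothetical helper for tests/chaos: map each scenario to the compose
# file and profile it needs, then build a `docker compose` argv list.
CHAOS_COMPOSE = "infra/docker/docker-compose.chaos.yml"
MESH_COMPOSE = "infra/docker/docker-compose.mesh.yml"

# scenario name -> (compose file, profile); None means "no --profile flag".
# Assumption: the Task 15 mesh compose file is not profile-gated.
SCENARIO_STACKS = {
    "nats_partition_mesh": (MESH_COMPOSE, None),
    "opa_restart": (CHAOS_COMPOSE, "chaos-opa"),
    "openobserve_stall": (CHAOS_COMPOSE, "chaos-observe"),
    "postgres_restart_outbox": (CHAOS_COMPOSE, "chaos-outbox"),
}


def compose_cmd(scenario: str, *action: str) -> list[str]:
    """Build e.g. `docker compose -f <file> --profile <p> up -d`."""
    compose_file, profile = SCENARIO_STACKS[scenario]
    cmd = ["docker", "compose", "-f", compose_file]
    if profile is not None:
        cmd += ["--profile", profile]
    return cmd + list(action)
```

For example, subprocess.run(compose_cmd("opa_restart", "up", "-d"), check=True) would bring up only opa and policy-gateway, and the same call with "down" would tear them down.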
Port consistency: use the default ports listed in Step 0 unless a scenario explicitly requires alternate ports.
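A quick pre-flight check can confirm the stack is listening on the expected default ports before a fault is injected. This is a sketch using only the standard library; DEFAULT_PORTS, port_open and unreachable_ports are hypothetical names, and assigning 8080 to policy-gateway is an assumption based on the port list.

```python
import socket

# Default chaos-stack ports (assumption: 8080 is the policy-gateway
# entrypoint default).
DEFAULT_PORTS = {
    "postgres": [5432],
    "nats": [4222, 8222],                         # client, monitor
    "opa": [8181],
    "policy-gateway": [8080],
    "openobserve": [5080],
    "otel-collector": [4317, 4318, 8888, 13133],  # OTLP gRPC, OTLP HTTP, metrics, health
}


def port_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def unreachable_ports(services, host="localhost", ports=DEFAULT_PORTS):
    """Map each service to its expected ports that refuse connections.

    An empty dict means every expected port accepted a TCP connection.
    """
    missing = {}
    for svc in services:
        closed = [p for p in ports[svc] if not port_open(host, p)]
        if closed:
            missing[svc] = closed
    return missing
```

A scenario would call this with only the services in its profile, e.g. unreachable_ports(["opa", "policy-gateway"]), and abort before inject_fault() if the result is non-empty.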
Create:
tests/chaos/
  run_chaos.py
  scenarios/
    nats_partition_mesh.py
    opa_restart.py
    openobserve_stall.py
    postgres_restart_outbox.py
  probes/
    http_probe.py
    nats_probe.py
    jetstream_probe.py
    db_probe.py
Add a manual just chaos recipe (do not add to just ci).
nats_partition_mesh
Compose file: infra/docker/docker-compose.mesh.yml (from Task 15)
Fault injection: block gateway traffic inside one node container (e.g., sea-nats-a-1) by dropping port 7522.
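A sketch of the inject and heal commands for this fault. It assumes the node container has the iptables binary and the NET_ADMIN capability; the stock NATS images may ship neither, in which case the same iptables arguments can be run from a helper container started with --network container:sea-nats-a-1 and --cap-add NET_ADMIN. The function names are hypothetical.

```python
import subprocess

GATEWAY_PORT = 7522  # NATS gateway port in the mesh


def _gateway_rules(port: int) -> list[list[str]]:
    # Drop inbound traffic to this node's gateway listener and outbound
    # traffic to the peer cluster's gateway listeners.
    return [
        ["INPUT", "-p", "tcp", "--dport", str(port), "-j", "DROP"],
        ["OUTPUT", "-p", "tcp", "--dport", str(port), "-j", "DROP"],
    ]


def partition_cmds(container: str, port: int = GATEWAY_PORT) -> list[list[str]]:
    """iptables -A (append) commands that cut gateway traffic."""
    return [["docker", "exec", container, "iptables", "-A", *rule]
            for rule in _gateway_rules(port)]


def heal_cmds(container: str, port: int = GATEWAY_PORT) -> list[list[str]]:
    """iptables -D (delete) commands that remove exactly the appended rules."""
    return [["docker", "exec", container, "iptables", "-D", *rule]
            for rule in _gateway_rules(port)]


def run_all(cmds: list[list[str]], timeout_s: float = 10.0) -> None:
    # Fail fast if a command hangs; the whole inject_fault() budget is 10s.
    for cmd in cmds:
        subprocess.run(cmd, check=True, timeout=timeout_s)
```

inject_fault() would call run_all(partition_cmds("sea-nats-a-1")) and heal() would call run_all(heal_cmds("sea-nats-a-1")). Deleting by rule spec means a second heal() fails with CalledProcessError because the rules are already gone, so heal() should track whether the partition was actually applied.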
Invariant:
Probe:
SEA_EVENTS subject sea.event.chaos_probe.v1.
SEA_EVENTS_MIRROR).
Defaults:
CHAOS_NATS_A_URL=nats://localhost:4422
CHAOS_NATS_B_URL=nats://localhost:5422
CHAOS_DOMAIN_A=sea-a
CHAOS_DOMAIN_B=sea-b
opa_restart
Compose file: infra/docker/docker-compose.chaos.yml with profile chaos-opa
Fault injection: docker restart sea-opa.
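One way to probe recovery after the restart, sketched with the standard library in the spirit of tests/chaos/probes/http_probe.py: poll OPA's /health endpoint on the default port 8181 until it answers 200 or a deadline passes. The helper names, the 60s default deadline and the polling interval are assumptions.

```python
import http.client
import time
import urllib.request

OPA_HEALTH_URL = "http://localhost:8181/health"


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """True if GET url returns HTTP 200; errors and error statuses count as False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError subclass OSError; HTTPException covers
        # malformed responses from a half-started server.
        return False


def wait_for_http_ok(url: str = OPA_HEALTH_URL, deadline_s: float = 60.0,
                     interval_s: float = 0.5) -> float:
    """Poll until url is healthy; return seconds waited or raise AssertionError."""
    start = time.monotonic()
    while True:
        if http_ok(url):
            return time.monotonic() - start
        if time.monotonic() - start >= deadline_s:
            raise AssertionError(f"{url} not healthy within {deadline_s}s")
        time.sleep(interval_s)
```

Returning the elapsed time lets the scenario log how long OPA took to recover, not just whether it did.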
Invariant:
openobserve_stall
Compose file: infra/docker/docker-compose.chaos.yml with profile chaos-observe
Fault injection: stop sea-openobserve container.
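Assuming the collector-based invariant includes "the collector keeps accepting OTLP while OpenObserve is down", a probe could post an empty OTLP/HTTP JSON trace export to the collector's default HTTP receiver port 4318 throughout the stall window. This assumes the otlp receiver has its http protocol enabled; the function names, window length and empty-payload choice are also assumptions.

```python
import http.client
import json
import time
import urllib.request

OTLP_TRACES_URL = "http://localhost:4318/v1/traces"  # collector OTLP/HTTP receiver


def otlp_accepts(url: str = OTLP_TRACES_URL, timeout_s: float = 2.0) -> bool:
    """POST an empty OTLP/HTTP JSON trace export; True on any 2xx response."""
    body = json.dumps({"resourceSpans": []}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        return False


def assert_accepts_for(duration_s: float, interval_s: float = 1.0,
                       url: str = OTLP_TRACES_URL) -> int:
    """Require every probe in the window to succeed; return the probe count."""
    deadline = time.monotonic() + duration_s
    probes = 0
    while time.monotonic() < deadline:
        if not otlp_accepts(url):
            raise AssertionError(
                f"collector rejected OTLP after {probes} successful probes")
        probes += 1
        time.sleep(interval_s)
    return probes
```

Running assert_accepts_for(60) right after stopping sea-openobserve would fail the scenario on the first rejected export.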
Invariant (collector-based, resolved):
postgres_restart_outbox
Compose file: infra/docker/docker-compose.chaos.yml with profile chaos-outbox
Fault injection: docker restart <postgres>.
Invariant:
Probe path (resolved):
infra/db/migrations/001_outbox_events.sql).
Prefer CLI-based probes for determinism:
- psql or docker exec for DB inserts
- curl for HTTP health and policy evaluation
If Python clients are used, keep deps minimal and pinned in tests/chaos/requirements.txt.
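A minimal CLI-based DB probe following that preference: run psql inside the Postgres container via docker exec and parse a single scalar result. The container name, user, database, and the outbox_events table name (inferred from the migration filename) are assumptions, as are the helper names.

```python
import subprocess


def psql_cmd(container: str, sql: str, user: str = "postgres",
             db: str = "postgres") -> list[str]:
    # -q: quiet, -t: rows only (no headers/footer), -A: unaligned output,
    # ON_ERROR_STOP=1: non-zero exit status on SQL errors.
    return ["docker", "exec", container, "psql", "-U", user, "-d", db,
            "-q", "-t", "-A", "-v", "ON_ERROR_STOP=1", "-c", sql]


def parse_scalar(stdout: str) -> int:
    """Parse exactly one non-empty output line as an integer."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected one value, got {lines!r}")
    return int(lines[0])


def query_scalar(container: str, sql: str, **kwargs: str) -> int:
    result = subprocess.run(psql_cmd(container, sql, **kwargs), check=True,
                            capture_output=True, text=True, timeout=10)
    return parse_scalar(result.stdout)
```

For example, query_scalar("<postgres>", "SELECT count(*) FROM outbox_events") before and after docker restart <postgres>, where <postgres> is the container name left open above, gives a deterministic before/after comparison.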
Each scenario must:
- inject_fault() completes within 10s
- assert_invariants() completes within 60–180s
- heal() runs even on failure
Add a manual recipe (not in just ci):
chaos scenario="nats_partition_mesh":
    CHAOS_SCENARIO={{scenario}} python tests/chaos/run_chaos.py
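A sketch of how run_chaos.py could enforce the per-scenario contract: resolve CHAOS_SCENARIO, import tests/chaos/scenarios/<name>.py, run inject_fault() and assert_invariants() under their time budgets, and call heal() in a finally block. It assumes each scenario module exposes those three functions at module level. Timeouts are enforced by waiting on a worker thread, which cannot forcibly stop a hung call; the thread keeps running until it returns.

```python
import importlib
import os
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

KNOWN_SCENARIOS = (
    "nats_partition_mesh",
    "opa_restart",
    "openobserve_stall",
    "postgres_restart_outbox",
)
INJECT_TIMEOUT_S = 10.0
INVARIANT_TIMEOUT_S = 180.0  # upper end of the 60–180s window


def resolve_scenario(env=None) -> str:
    """Return the module path for CHAOS_SCENARIO, rejecting unknown names."""
    env = os.environ if env is None else env
    name = env.get("CHAOS_SCENARIO", "").strip()
    if name not in KNOWN_SCENARIOS:
        raise SystemExit(f"CHAOS_SCENARIO={name!r}; expected one of {KNOWN_SCENARIOS}")
    return f"scenarios.{name}"


def _call_with_timeout(fn, timeout_s: float, label: str):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        if future.done():
            raise  # fn itself raised a timeout error; surface it unchanged
        raise TimeoutError(f"{label} exceeded {timeout_s}s") from None
    finally:
        # Don't block on a hung worker; heal() still runs in the caller.
        pool.shutdown(wait=False)


def run_scenario(scenario, inject_timeout_s=INJECT_TIMEOUT_S,
                 invariant_timeout_s=INVARIANT_TIMEOUT_S) -> None:
    """Run one scenario module/object; heal() always runs, errors propagate."""
    try:
        _call_with_timeout(scenario.inject_fault, inject_timeout_s, "inject_fault")
        _call_with_timeout(scenario.assert_invariants, invariant_timeout_s,
                           "assert_invariants")
    finally:
        scenario.heal()


def main() -> None:
    # Launched as `python tests/chaos/run_chaos.py`, tests/chaos is on
    # sys.path, so scenarios/<name>.py imports as scenarios.<name>.
    run_scenario(importlib.import_module(resolve_scenario()))
```

The script would end with the usual if __name__ == "__main__": main() guard, so just chaos scenario=opa_restart runs exactly one scenario and exits non-zero on any invariant failure or timeout.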