End-State Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan Task-by-Task.

Goal: Close all known gaps from the last-mile, runtime-correlation, and evidence-pipeline audits and from project state, reaching 100% of the intended end-state.

Architecture: Spec-first pipeline with deterministic compilers and append-only ledgers. All missing runtime behaviors must be implemented by updating specs/generators and regenerating outputs where required, avoiding handwritten changes in generated zones.

Tech Stack: Python (FastAPI, SQLAlchemy), TypeScript (Nx), Rust (CLI/workers), Postgres, Oxigraph, OPA, NATS, OpenTelemetry/OpenObserve, Garage (S3 API).


Cycle Structure (per .agent/workflows/cycle.md)

Idempotent Execution Protocol (Follow Exactly)

This plan is designed to be idempotent: an agent can be pointed at it repeatedly and will always execute the next incomplete Task.

How to choose the next Task

  1. Scan for the first checklist item or Task that is not marked complete (no [x]).
  2. If the Task appears partially done (files exist but tests fail, or outputs are missing), finish it; do not skip.
  3. If the Task is complete but not marked, verify quickly (run the cited test/command) and then mark it [x].

Completion rules

Proposed Cycle Index

Phase 10 (Evidence Pipeline Core)

Phase 11 (Connectors + Parsing)

Phase 12 (Declared Intent + SEA Emission)

Phase 13 (Runtime Behavior Correlation to Production)

Phase 14 (Platform Gaps)

Phase 15 (Spec-First Dogfooding)

Phase 16 (Product & Release Readiness)


Pre-Flight Checks (Before Starting Any Cycle)


Phase 10 — Evidence Pipeline Core

[x] Task 1: Evidence ledger schema + append-only DB model

Files:

Step 1: Write the failing test

```python
async def test_append_only_ledger_rejects_updates(db_session):
    ledger = EvidenceLedger(db_session)
    event = EvidenceEvent(...)
    await ledger.append(event)
    with pytest.raises(ValueError):
        await ledger.update(event.evidence_id, {...})
```

Step 2: Run test to verify it fails

Run: pytest services/workbench-bff/tests/test_evidence_ledger.py::test_append_only_ledger_rejects_updates -v Expected: FAIL (EvidenceLedger not implemented)

Step 3: Write minimal implementation
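A minimal in-memory sketch of the append-only contract (the real ledger is async and persists via SQLAlchemy; `AppendOnlyError` and the dict-based event shape here are illustrative assumptions):

```python
import uuid


class AppendOnlyError(ValueError):
    """Raised on any attempt to mutate an existing ledger entry."""


class EvidenceLedger:
    def __init__(self):
        self._events = {}  # evidence_id -> immutable event payload

    def append(self, event: dict) -> str:
        evidence_id = event.get("evidence_id") or str(uuid.uuid4())
        if evidence_id in self._events:
            raise AppendOnlyError(f"event {evidence_id} already recorded")
        self._events[evidence_id] = dict(event)  # defensive copy
        return evidence_id

    def update(self, evidence_id: str, changes: dict):
        # Append-only by contract: corrections are modeled as new events
        # that reference the original, never as in-place edits.
        raise AppendOnlyError("ledger is append-only; emit a superseding event")
```

The key design point: `update` always raises, so the only way to "correct" evidence is to append a superseding event, preserving the audit trail.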

Step 4: Run tests to verify they pass

Run: pytest services/workbench-bff/tests/test_evidence_ledger.py -v Expected: PASS

Step 5: Commit

```sh
git add services/workbench-bff/src/adapters/evidence_ledger.py services/workbench-bff/migrations/versions/00x_add_evidence_ledger.py services/workbench-bff/tests/test_evidence_ledger.py
git commit -m "feat(evidence): add append-only evidence ledger"
```

[x] Task 2: Generate EvidenceArtifact repository adapter (spec-first)

Files:

Step 1: Write failing test (already present)

Step 2: Run test to verify it fails

Run: pytest libs/governance-runtime/adapters/tests/integration/test_evidence_artifact_repository_integration.py -v Expected: FAIL (missing module file)

Step 3: Update specs/templates and regenerate

Step 4: Run tests to verify they pass

Run: pytest libs/governance-runtime/adapters/tests/integration/test_evidence_artifact_repository_integration.py -v Expected: PASS

Step 5: Commit

```sh
git add docs/specs/governance-runtime/governance-runtime.sds.yaml tools/codegen/templates libs/governance-runtime/adapters/src/gen/evidence_artifact_repository.py
git commit -m "feat(governance-runtime): generate evidence artifact repository"
```

[x] Task 3: Wire CollectEvidence handler to ledger + repository

Files:

Step 1: Write the failing test

```python
async def test_collect_evidence_persists_metadata(fake_repo, fake_ledger):
    handler = CollectEvidenceHandlerImpl(fake_repo, fake_ledger)
    result = await handler.execute(CollectEvidenceCommand(...))
    assert result.success is True
    assert fake_ledger.appended_count == 1
```

Step 2: Run test to verify it fails

Run: pytest libs/governance-runtime/application/tests/test_collect_evidence_handler.py -v Expected: FAIL (handler not wired)

Step 3: Write minimal implementation

Step 4: Run tests to verify they pass

Run: pytest libs/governance-runtime/application/tests/test_collect_evidence_handler.py -v Expected: PASS

Step 5: Commit

```sh
git add libs/governance-runtime/application/src/collect_evidence_handler_impl.py libs/governance-runtime/application/tests/test_collect_evidence_handler.py libs/governance-runtime/adapters/src/impl.py
git commit -m "feat(governance-runtime): wire collect evidence handler"
```

Phase 11 — Connectors + Deterministic Parsing

[x] Task 4: GitHub connector + parser → EvidenceEvent

Files:

Step 1: Write the failing test

```python
def test_github_webhook_to_evidence_event():
    payload = load_fixture("github/push.json")
    event = GitHubParser().parse(payload)
    assert event.source == "github"
```

Step 2: Run test to verify it fails

Run: pytest services/evidence-ingest/tests/test_github_ingest.py -v Expected: FAIL (parser missing)

Step 3: Write minimal implementation
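A sketch of the deterministic payload-to-event mapping (the `EvidenceEvent` fields shown — `kind`, `payload_digest` — are assumptions; the essential property is that parsing is pure, so the same payload always yields the same event regardless of key order):

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceEvent:
    source: str
    kind: str
    payload_digest: str


class GitHubParser:
    def parse(self, payload: dict) -> EvidenceEvent:
        # Canonical JSON (sorted keys, fixed separators) makes the digest
        # independent of dict insertion order.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return EvidenceEvent(
            source="github",
            kind=payload.get("event", "push"),
            payload_digest=hashlib.sha256(canonical.encode()).hexdigest(),
        )
```

The same shape would apply to the Okta and CI parsers in Tasks 5 and 6: each is a pure function from raw payload to a frozen `EvidenceEvent`.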

Step 4: Run tests to verify they pass

Run: pytest services/evidence-ingest/tests/test_github_ingest.py -v Expected: PASS

Step 5: Commit

```sh
git add services/evidence-ingest/src/adapters/github_connector.py services/evidence-ingest/src/parsers/github_parser.py services/evidence-ingest/tests/test_github_ingest.py
git commit -m "feat(evidence): add GitHub connector + parser"
```

[x] Task 5: Okta connector + parser → EvidenceEvent

Files:

Step 1: Write the failing test

```python
def test_okta_event_to_evidence_event():
    payload = load_fixture("okta/system_log.json")
    event = OktaParser().parse(payload)
    assert event.source == "okta"
```

Step 2: Run test to verify it fails

Run: pytest services/evidence-ingest/tests/test_okta_ingest.py -v Expected: FAIL

Step 3: Write minimal implementation

Step 4: Run tests to verify they pass

Run: pytest services/evidence-ingest/tests/test_okta_ingest.py -v Expected: PASS

Step 5: Commit

```sh
git add services/evidence-ingest/src/adapters/okta_connector.py services/evidence-ingest/src/parsers/okta_parser.py services/evidence-ingest/tests/test_okta_ingest.py
git commit -m "feat(evidence): add Okta connector + parser"
```

[x] Task 6: CI/CD connector + parser → EvidenceEvent

Files:

Step 1: Write the failing test

```python
def test_ci_event_to_evidence_event():
    payload = load_fixture("ci/build.json")
    event = CIParser().parse(payload)
    assert event.source == "ci"
```

Step 2: Run test to verify it fails

Run: pytest services/evidence-ingest/tests/test_ci_ingest.py -v Expected: FAIL

Step 3: Write minimal implementation

Step 4: Run tests to verify they pass

Run: pytest services/evidence-ingest/tests/test_ci_ingest.py -v Expected: PASS

Step 5: Commit

```sh
git add services/evidence-ingest/src/adapters/ci_connector.py services/evidence-ingest/src/parsers/ci_parser.py services/evidence-ingest/tests/test_ci_ingest.py
git commit -m "feat(evidence): add CI/CD connector + parser"
```

Phase 12 — Declared Intent + SEA Emission

Base branch: dev

[x] Task 7: Declared intent ingestion (deterministic + optional LLM)

Files:

Step 1: Write the failing test

```python
def test_declared_intent_from_markdown():
    text = load_fixture("intent/operating_model.md")
    items = IntentParser().parse(text)
    assert len(items) > 0
```

Step 2: Run test to verify it fails

Run: pytest services/intent-ingest/tests/test_declared_intent_parser.py -v Expected: FAIL

Step 3: Write minimal implementation
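A sketch of the deterministic first pass: extract normative statements (MUST/SHOULD/SHALL) from markdown bullets with a plain regex. The item shape (`modality`, `statement`) is an assumption; the optional LLM pass would only enrich these items, never replace the deterministic extraction.

```python
import re

# MUST NOT must be tried before MUST so the full modality is captured.
NORMATIVE = re.compile(r"\b(MUST NOT|MUST|SHALL|SHOULD)\b")


class IntentParser:
    def parse(self, text: str) -> list[dict]:
        items = []
        for line in text.splitlines():
            # Strip bullet markers so the statement reads cleanly.
            stripped = line.strip().lstrip("-*").strip()
            match = NORMATIVE.search(stripped)
            if match:
                items.append({"modality": match.group(1), "statement": stripped})
        return items
```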

Step 4: Run tests to verify they pass

Run: pytest services/intent-ingest/tests/test_declared_intent_parser.py -v Expected: PASS

Step 5: Commit

```sh
git add services/intent-ingest/src/parsers/intent_parser.py services/intent-ingest/src/llm/intent_extractor.py services/intent-ingest/tests/test_declared_intent_parser.py
git commit -m "feat(intent): add deterministic declared intent ingestion"
```

[x] Task 8: SEA-DSL emitter from assertions

Files:

Step 1: Write the failing test

```python
def test_emitter_is_deterministic(tmp_path):
    emit_sea(input_data, tmp_path / "out.sea")
    emit_sea(input_data, tmp_path / "out2.sea")
    assert (tmp_path / "out.sea").read_text() == (tmp_path / "out2.sea").read_text()
```

Step 2: Run test to verify it fails

Run: pytest tools/tests/test_sea_emitter_determinism.py -v Expected: FAIL

Step 3: Write minimal implementation
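A determinism sketch: sort assertions by a stable key and serialize with fixed separators so identical input yields byte-identical output. The JSON-lines format here stands in for the real `.sea` syntax; only the ordering/serialization discipline is the point.

```python
import json
from pathlib import Path


def emit_sea(assertions: list[dict], out_path: Path) -> None:
    # Stable sort key + canonical JSON => byte-identical output for the
    # same logical input, regardless of input ordering.
    ordered = sorted(assertions, key=lambda a: (a["subject"], a["predicate"]))
    lines = [json.dumps(a, sort_keys=True, separators=(",", ":")) for a in ordered]
    out_path.write_text("\n".join(lines) + "\n")
```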

Step 4: Run tests to verify they pass

Run: pytest tools/tests/test_sea_emitter_determinism.py -v Expected: PASS

Step 5: Commit

```sh
git add tools/emit_sea_from_assertions.py tools/tests/test_sea_emitter_determinism.py
git commit -m "feat(sea): add deterministic assertion emitter"
```

[x] Task 9: Protobuf export via SEA CLI (Domainforge)

Decision: ir_to_proto is not needed because SEA CLI already exports protobuf directly from .sea models.

Files:

Step 1: Add verification test (shell-level)

```sh
sea project --format protobuf docs/specs/<ctx>/<ctx>.sea ./tmp/<ctx>.proto
```

Step 2: Validate deterministic output

```sh
sea project --format protobuf docs/specs/<ctx>/<ctx>.sea ./tmp/<ctx>.proto
sea project --format protobuf docs/specs/<ctx>/<ctx>.sea ./tmp/<ctx>.proto.2
diff -u ./tmp/<ctx>.proto ./tmp/<ctx>.proto.2
```

Step 3: Validate gRPC service emission

```sh
sea project --format protobuf --include-services docs/specs/<ctx>/<ctx>.sea ./tmp/<ctx>.services.proto
rg -n "^service " ./tmp/<ctx>.services.proto
```

Step 4: Validate multi-file output

```sh
rm -rf ./tmp/<ctx>-proto
sea project --format protobuf --multi-file --output-dir ./tmp/<ctx>-proto docs/specs/<ctx>/<ctx>.sea
rg --files ./tmp/<ctx>-proto
```

Step 5: Validate package and language options

```sh
sea project --format protobuf --package "com.example.api" \
  --option java_package="com.example.api" \
  --option go_package="github.com/example/api" \
  docs/specs/<ctx>/<ctx>.sea ./tmp/<ctx>.options.proto
rg -n "package com.example.api;" ./tmp/<ctx>.options.proto
rg -n "java_package = \"com.example.api\"" ./tmp/<ctx>.options.proto
```

Step 6: (Optional) buf lint/breaking checks if buf installed

```sh
sea project --format protobuf --buf-lint docs/specs/<ctx>/<ctx>.sea ./tmp/<ctx>.proto
```

Step 7: Commit

```sh
git add docs/plans/2026-01-25-end-state.md
git commit -m "docs(plan): use SEA CLI for protobuf export"
```

[x] Task 9b: Domainforge LSP + VS Code extension wiring

Files:

Step 1: Add install-domainforge-lsp recipe (shell)

```sh
# If the repo exists, build and install the binary to PATH
cargo build --release --manifest-path $PROJECTS_DIR/domainforge-lsp/Cargo.toml
cp $PROJECTS_DIR/domainforge-lsp/target/release/domainforge-lsp $HOME/.local/bin/
```

Step 2: Wire setup/install to call install-domainforge-lsp

```sh
# Add to just/20-setup.just (or just/30-cli.just) so `just setup` or `just install-cli`
# invokes `install-domainforge-lsp` when the repo exists
```

Step 3: Verify LSP binary on PATH

```sh
domainforge-lsp --help
```

Step 4: Build VS Code extension

```sh
cd $PROJECTS_DIR/domainforge-vsc-extension
pnpm install
pnpm run package
```

Step 5: Verify extension can locate the LSP (manual check)

Set the VS Code setting `domainforge.server.path` to `$HOME/.local/bin/domainforge-lsp`, restart VS Code, open a `.sea` file, and confirm the LSP starts and diagnostics appear.

Step 6: Commit

```sh
git add just/30-cli.just just/20-setup.just
git commit -m "build(cli): install domainforge-lsp via just setup"
```

Phase 13 — Runtime Behavior Correlation (Production)

[x] Task 10: Replace mock behavior routes with real storage

Files:

Step 1: Write a failing test

```python
async def test_behavior_summary_uses_db(db_session, client):
    # seed DB, call /behavior/summary, expect DB results
    assert "mock" not in response
```

Step 2: Run test to verify it fails

Run: pytest services/workbench-bff/tests/test_behavior_api.py::test_behavior_summary_uses_db -v Expected: FAIL

Step 3: Implement minimal wiring
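A shape sketch for the wiring (all names here — `BehaviorIndexer`, the session API, the summary row shape — are assumptions for illustration): the route delegates to an indexer that reads persisted summaries, so nothing is synthesized in the request handler.

```python
class BehaviorIndexer:
    def __init__(self, session):
        self._session = session

    def summary(self) -> dict:
        # Stand-in for a SQLAlchemy query against the summaries table.
        rows = self._session.query_all("behavior_summaries")
        return {"operations": rows, "source": "db"}


def behavior_summary_route(indexer: BehaviorIndexer) -> dict:
    # The real route is a FastAPI handler; it should contain no data
    # shaping beyond delegating to the indexer.
    return indexer.summary()


class FakeSession:
    def query_all(self, table):
        return [{"operation": "CollectEvidence", "count": 3}]
```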

Step 4: Run tests to verify they pass

Run: pytest services/workbench-bff/tests/test_behavior_api.py -v Expected: PASS

Step 5: Commit

```sh
git add services/workbench-bff/src/api/behavior_routes.py services/workbench-bff/src/adapters/behavior_indexer.py services/workbench-bff/tests/test_behavior_api.py
git commit -m "feat(behavior): wire API to DB summaries"
```

[x] Task 11: OTLP/OpenObserve ingest wiring

Files:

Step 1: Write failing test

```python
def test_openobserve_query_builds_url():
    client = OpenObserveClient(...)
    assert "trace_id" in client.build_query_url(...)
```

Step 2: Run test to verify it fails

Run: pytest services/workbench-bff/tests/test_openobserve_client.py -v Expected: FAIL

Step 3: Implement client
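A URL-builder sketch; the concrete search path and parameter names are assumptions (check them against the deployed OpenObserve version). What the test actually asserts is narrower: the client embeds the trace correlation key in the query it builds.

```python
from urllib.parse import urlencode


class OpenObserveClient:
    def __init__(self, base_url: str, org: str, stream: str):
        self._base = base_url.rstrip("/")
        self._org = org
        self._stream = stream

    def build_query_url(self, trace_id: str, start: int, end: int) -> str:
        # Path and parameter names below are assumed, not verified against
        # a specific OpenObserve release.
        params = urlencode({
            "sql": f"SELECT * FROM \"{self._stream}\" WHERE trace_id = '{trace_id}'",
            "start_time": start,
            "end_time": end,
        })
        return f"{self._base}/api/{self._org}/_search?{params}"
```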

Step 4: Run tests to verify they pass

Run: pytest services/workbench-bff/tests/test_openobserve_client.py -v Expected: PASS

Step 5: Commit

```sh
git add services/workbench-bff/src/adapters/openobserve_client.py services/workbench-bff/src/api/behavior_routes.py services/workbench-bff/tests/test_openobserve_client.py
git commit -m "feat(behavior): add OpenObserve client"
```

[x] Task 12: CI drift gate for runtime correlation

Files:

Step 1: Write failing test

1
2
def test_gate_fails_on_high_drift():
    assert run_gate("fixtures/drift_high.json") == 1

Step 2: Run test to verify it fails

Run: pytest tests/ci/test_behavior_drift_gate.py -v Expected: FAIL

Step 3: Implement gate script
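A gate sketch: read a drift report and return a nonzero exit code above a threshold. The report shape (`{"drift_score": float}`) and the 0.2 threshold are assumptions; the real gate would take both from CI configuration and be invoked from the `just` recipe.

```python
import json
from pathlib import Path

DRIFT_THRESHOLD = 0.2  # assumed default; should come from CI config


def run_gate(report_path: str) -> int:
    """Return 1 (fail the build) when drift exceeds the threshold, else 0."""
    report = json.loads(Path(report_path).read_text())
    return 1 if report.get("drift_score", 0.0) > DRIFT_THRESHOLD else 0
```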

Step 4: Run tests to verify they pass

Run: pytest tests/ci/test_behavior_drift_gate.py -v Expected: PASS

Step 5: Commit

```sh
git add scripts/ci/behavior_drift_gate.py just/50-ci.just tests/ci/test_behavior_drift_gate.py
git commit -m "feat(ci): add behavior drift gate"
```

Phase 14 — Platform Gaps (Truth, Reasoning, Federation)

[x] Task 13: Provenance historical comparison (deterministic)

Intent: Provide audit-grade, deterministic comparison between two declared states.

Primary comparison source: Manifest snapshot registry (content-addressed DAG), not git working tree.

Files:

Behavioral contract:

Step 1: Write failing test

1
2
3
4
def test_compare_versions_returns_changes(tmp_repo):
    registry = ProvenanceRegistry()
    added, removed, changed = registry.compare_versions(...)
    assert len(added) >= 0

Step 2: Run test to verify it fails

Run: pytest services/workbench-bff/tests/test_provenance_compare.py -v Expected: FAIL (NotImplementedError)

Step 3: Implement minimal comparison
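The core of the comparison can be sketched over content-addressed snapshots rather than the git working tree: if each snapshot maps a logical path to a content digest, added/removed/changed falls out of set arithmetic and is trivially deterministic. The dict snapshot shape is an assumption standing in for the manifest registry's DAG nodes.

```python
def compare_versions(old: dict[str, str], new: dict[str, str]):
    """Compare two snapshots (path -> content digest); sorted output keeps
    the result deterministic regardless of dict iteration order."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```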

Step 4: Run tests to verify they pass

Run: pytest services/workbench-bff/tests/test_provenance_compare.py -v Expected: PASS

Step 5: Commit

```sh
git add services/workbench-bff/src/adapters/provenance_registry.py services/workbench-bff/src/api/provenance_routes.py services/workbench-bff/tests/test_provenance_compare.py
git commit -m "feat(provenance): implement historical compare"
```

Acceptance criteria:


[x] Task 14: Knowledge graph reasoning integration (bounded, deterministic)

Intent: Enable explainable inference without unbounded reasoning.

Reasoning model:

Files:

Implementation rules:

Step 1: Write failing test

```python
async def test_inferred_triples_available():
    result = await adapter.query_sparql(...)
    assert "inferred" in result
```

Step 2: Run test to verify it fails

Run: pytest services/knowledge-graph/tests/test_reasoning.py -v Expected: FAIL

Step 3: Implement reasoning
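A bounded forward-chaining sketch, assuming a simple subclass rule over plain triples: derive to a fixpoint under a hard iteration cap and return only the derived triples, so callers can tag them as inferred. The rule set and triple shape are illustrative; the real reasoner operates over the Oxigraph store.

```python
def infer(triples: set, max_rounds: int = 10) -> set:
    """Apply rdf:type / subClassOf closure up to max_rounds; return only
    the newly derived (inferred) triples, never the asserted ones."""
    derived = set(triples)
    for _ in range(max_rounds):
        new = set()
        subclass = {(s, o) for s, p, o in derived if p == "subClassOf"}
        types = {(s, o) for s, p, o in derived if p == "type"}
        for inst, cls in types:
            for sub, sup in subclass:
                if cls == sub and (inst, "type", sup) not in derived:
                    new.add((inst, "type", sup))
        if not new:
            break  # fixpoint reached within the bound
        derived |= new
    return derived - triples
```

The cap (`max_rounds`) is what keeps reasoning bounded and deterministic: the same asserted triples always produce the same inferred set, and pathological ontologies cannot run away.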

Step 4: Run tests to verify they pass

Run: pytest services/knowledge-graph/tests/test_reasoning.py -v Expected: PASS

Step 5: Commit

```sh
git add services/knowledge-graph/src/reasoner.py services/knowledge-graph/src/adapters/oxigraph_adapter.py services/knowledge-graph/tests/test_reasoning.py
git commit -m "feat(kg): add reasoning integration"
```

Acceptance criteria:


[x] Task 15: Messaging federation and clustering (SEA-Cell scale invariant)

Intent: Messaging must scale seamlessly from single instance → cluster → mesh.

Scope (mandatory):

Files:

Required behaviors:

Step 1: Write failing test

```rust
#[test]
fn cluster_failover_recovers() { /* ... */ }
```

Step 2: Run test to verify it fails

Run: cargo test -p sea-mq-worker Expected: FAIL

Step 3: Implement minimal cluster config

Step 4: Run tests to verify they pass

Run: cargo test -p sea-mq-worker Expected: PASS

Step 5: Commit

```sh
git add apps/sea-mq-worker/src/main.rs infra/docker/docker-compose.*.yml apps/sea-mq-worker/tests/e2e_messaging_test.rs
git commit -m "feat(messaging): add NATS clustering support"
```

Test cases (minimum):

Acceptance criteria:

Phase 15 — Spec-First Dogfooding

[x] Task 16: Replace handwritten gen.py with SDS-021-generated code

Files:

Rules:

Step 1: Write failing test

```python
def test_codegen_is_spec_generated():
    assert "HANDWRITTEN" not in Path("tools/codegen/gen.py").read_text()
```

Step 2: Run test to verify it fails

Run: pytest tools/tests/test_codegen_determinism.py -v Expected: FAIL (HANDWRITTEN marker present)

Step 3: Implement spec + generator updates and regenerate

Step 4: Run tests to verify they pass

Run: pytest tools/tests/test_codegen_determinism.py -v Expected: PASS

Step 5: Commit

```sh
git add docs/specs/shared/sds tools/codegen/gen.py tools/tests/test_codegen_determinism.py
git commit -m "feat(codegen): dogfood gen.py from SDS-021"
```

Acceptance criteria:

[x] Task 17: Replace handwritten gap_report.py

Files:

Behavioral contract:

Step 1: Write failing test

Completed: Created tools/tests/test_gap_report_generation.py with 10 comprehensive tests covering:

Step 2: Run test to verify it fails

Completed: Initial run showed HANDWRITTEN marker present (expected failure)

Step 3: Implement spec + generator updates and regenerate

Completed:

Step 4: Run tests to verify they pass

Completed: All 10 tests in test_gap_report_generation.py passed

```sh
pytest tools/tests/test_gap_report_generation.py -v
# Result: 10 passed in 0.10s
```

Step 5: Commit

```sh
git add tools/codegen/gap_report.py tools/tests/test_gap_report_generation.py tools/codegen/bootstrap_gap_report.py
git commit -m "feat(codegen): dogfood gap report generator from SDS-021"
```

Acceptance criteria:

Phase 16 — Product & Release Readiness

[x] Task 18: Workbench authentication + governance export

Intent: Secure, role-aware control plane suitable for commercial deployment.

Auth model:

Files:

RBAC requirements:

Governance export:

Step 1: Write failing test

```ts
it('requires login for governance pages', async () => { /* ... */ })
```

Step 2: Run test to verify it fails

Run: pnpm exec nx test workbench -- --testNamePattern="requires login" Expected: FAIL

Step 3: Implement minimal auth & export wiring

Step 4: Run tests to verify they pass

Run: pnpm exec nx test workbench -- --testNamePattern="requires login" Expected: PASS

Step 5: Commit

```sh
git add services/workbench-bff/src/api/auth.py apps/workbench/src/contexts/auth.tsx services/policy-gateway/src/api/routes.py apps/workbench/e2e/auth.spec.ts
git commit -m "feat(workbench): harden auth + policy export"
```

Acceptance criteria:

[ ] Task 19: Zed extension marketplace publishing

Intent: Ship editor tooling with clean licensing and zero implicit telemetry.

Requirements:

Files:

Step 1: Add release metadata and tags

Step 2: Validate build

Step 3: Submit PR to the zed extensions repo

Step 4: Verify install from marketplace

Step 5: Commit


[ ] Task 20: WASM publishing (npm + crates)

Intent: Supply-chain-safe distribution.

Requirements:

Files:

Steps:


[ ] Task 21: Incident runbooks (derived from invariants)

Minimum runbooks:

Each runbook must include:

Files:


[x] Task 22: Performance testing suite

Focus areas:

Requirements:

Files:


[x] Task 23: Chaos testing suite

Mandatory scenarios:

Acceptance criteria:

Files:


[x] Task 24: Domain E2E tests (federal/finance/healthcare)

Intent: Validate platform invariants across regulated domains.

Structure:

Requirement: E2E suites must compile and deploy minimal cells from specs, not handwritten scenarios.

Files:

Completion Summary:

Step 1: Create E2E contract test directory structure

Created tests/e2e/contracts/ with:

Step 2: Create validation probes

Created 5 probe modules in tests/e2e/contracts/probes/:

Step 3: Create domain test suites

Created 3 domain test suites with 96 total tests:

Step 4: Add just e2e recipes

Added 8 opt-in recipes to just/40-test.just:

Step 5: Verify tests

Run: pytest --collect-only tests/e2e/contracts/ Result: 96 tests collected successfully

Acceptance criteria:


Final Plan Validation


Global Definition of Done (Applies to All Tasks Phase 14+)

A task is complete only when: