SEA-Forge Last-Mile Execution Plan

Created: 2026-01-17
Target: 100% Implementation
Current State: ~89% Complete (v0.6.0)

Executive Summary

This plan prioritizes the remaining ~11% of implementation work based on:

Dependency order — Items that unblock other items come first
Impact magnitude — High-visibility or high-risk gaps prioritized
Effort-to-value ratio — Quick wins before heavy lifts

SEA-DSL is the canonical semantic code for SEA-Forge; projections and compiler outputs are derived artifacts.

Priority Tiers

Tier	Theme	Why First
P0	Foundation & Security	Blocks production adoption; security gaps are showstoppers
P1	Core Platform Completion	Fills gaps in primary value proposition
P2	Developer Experience	Improves adoption but not blocking
P3	Advanced Features	Nice-to-have, can ship without

Execution Checklist

Tier P0: Foundation & Security (Do First)

These items block production use or create security/reliability risks.

[x] P0.1: Workbench Authentication

Area: Workbench UI (85% → 95%)
Impact: 🔴 CRITICAL - UI is unsecured without this
Effort: Medium (3-5 days)
Dependencies: None

Scope:

Implement auth provider integration (OAuth2/OIDC)
Add protected route wrapper to apps/workbench/
Session management with token refresh
Role-based access control for governance console

Files:

apps/workbench/src/contexts/auth.tsx (new)
apps/workbench/src/components/ProtectedRoute.tsx (new)
apps/workbench/src/App.tsx (wrap routes)

Verification: Manual login flow test + E2E auth test

[x] P0.2: Governance Audit Trail Persistence

Area: Governance Runtime (100% → 100%+)
Impact: 🔴 CRITICAL - Required for compliance/audit
Effort: Medium (3-4 days)
Dependencies: P0.1 (auth context needed for actor tracking)

Scope:

Persist OPA policy decisions to PostgreSQL
Add audit log query API to policy-gateway
Wire audit log viewer into Workbench Governance Console

Files:

services/policy-gateway/src/audit.py (new)
services/policy-gateway/src/models.py (add AuditLog model)
libs/governance-runtime/adapters/src/audit_repository.py (new)

Verification: just test-adapters governance-runtime + manual console check
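
As an illustration of the persistence scope above, here is a minimal sketch of an audit repository. It is not the planned implementation: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL so it runs anywhere, and the AuditLog fields, the audit_log table layout, and the AuditRepository method names are all assumptions made for the example.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditLog:
    """One persisted policy decision (field names are hypothetical)."""
    actor: str        # who triggered the decision, taken from the auth context
    policy: str       # policy path that was evaluated
    allowed: bool     # decision outcome
    input_doc: dict   # input document sent to the policy engine
    decided_at: str   # ISO 8601 timestamp in UTC


class AuditRepository:
    """Stores and queries decisions; sqlite3 stands in for PostgreSQL here."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            """CREATE TABLE IF NOT EXISTS audit_log (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   actor TEXT NOT NULL,
                   policy TEXT NOT NULL,
                   allowed INTEGER NOT NULL,
                   input_doc TEXT NOT NULL,
                   decided_at TEXT NOT NULL)"""
        )

    def record(self, entry: AuditLog) -> None:
        # Serialize the input with sorted keys so equal inputs store identically.
        self.conn.execute(
            "INSERT INTO audit_log (actor, policy, allowed, input_doc, decided_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (entry.actor, entry.policy, int(entry.allowed),
             json.dumps(entry.input_doc, sort_keys=True), entry.decided_at),
        )
        self.conn.commit()

    def query_by_actor(self, actor: str) -> list:
        rows = self.conn.execute(
            "SELECT actor, policy, allowed, input_doc, decided_at "
            "FROM audit_log WHERE actor = ? ORDER BY id",
            (actor,),
        ).fetchall()
        return [AuditLog(a, p, bool(ok), json.loads(doc), ts)
                for a, p, ok, doc, ts in rows]
```

Keeping the actor on every row is what ties this item to P0.1: without an authenticated identity there is nothing meaningful to store in that column.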

Tier P1: Core Platform Completion (Do Second)

These complete the primary value proposition.

[x] P1.1: DSL Parser Error Recovery (100%)

Area: DSL Parsing & Compilation
Impact: 🟡 HIGH - Better DX, fewer failed parses
Effort: Low-Medium (2-3 days)
Dependencies: None

Scope:

Improve error recovery in apps/sea-forge-cli/ parser
Add structured error codes with suggestions
Ensure LSP surfaces these errors correctly

Files:

apps/sea-forge-cli/src/parser/ (error handling)
extensions/domainforge-lsp/ (error display)

Verification: Unit tests for error cases + LSP error display test
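
To make "structured error codes with suggestions" concrete, the sketch below shows one possible shape for a diagnostic and a checker that keeps going after the first error instead of aborting. It is written in Python purely for illustration and does not reflect the actual parser. The KEYWORDS list, the E0001 code, and the Diagnostic fields are hypothetical placeholders, not the real SEA-DSL grammar or error catalogue.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Optional

# Hypothetical keyword set, used only to demonstrate suggestions.
KEYWORDS = ["entity", "flow", "policy", "resource"]


@dataclass(frozen=True)
class Diagnostic:
    code: str                         # stable, machine-readable error code
    message: str                      # human-readable description
    line: int                         # 1-based source line
    suggestion: Optional[str] = None  # closest valid keyword, if any


def unknown_keyword(token: str, line: int) -> Diagnostic:
    """Build a diagnostic for an unrecognized keyword, with a fuzzy suggestion."""
    matches = get_close_matches(token, KEYWORDS, n=1, cutoff=0.6)
    return Diagnostic(
        code="E0001",
        message=f"unknown keyword '{token}'",
        line=line,
        suggestion=matches[0] if matches else None,
    )


def check_source(source: str) -> list:
    """Report every bad line rather than stopping at the first one."""
    diagnostics = []
    for number, text in enumerate(source.splitlines(), start=1):
        tokens = text.split()
        if tokens and tokens[0] not in KEYWORDS:
            diagnostics.append(unknown_keyword(tokens[0], number))
    return diagnostics
```

A record like this maps naturally onto an LSP diagnostic, with the code and message shown inline and the suggestion offered as a quick fix.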

[x] P1.2: Knowledge Graph Reasoning Engine (85% → 100%)

Area: Knowledge Graph
Impact: 🟡 HIGH - Enables inference-based queries
Effort: High (5-7 days)
Dependencies: None

Scope:

Integrate OWL reasoning (e.g., HermiT/ELK via Python wrapper or SPARQL 1.1 entailment)
Add reasoning mode flag to services/knowledge-graph/
Update SHACL validation to use inferred triples

Files:

services/knowledge-graph/src/reasoner.py (new)
services/knowledge-graph/main.py (add reasoning endpoint)
tools/kg_validate.py (optional reasoning flag)

Verification: SPARQL query returning inferred triples + SHACL test with inference
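
The difference a reasoning mode flag makes can be shown with a toy example. The sketch below is not HermiT, ELK, or a SPARQL entailment regime. It is a small forward-chaining loop over plain tuples that applies only two RDFS rules (rdfs9, type inheritance through subclasses, and rdfs11, subclass transitivity). The sea: and ex: names are made up for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus those entailed by rdfs9 and rdfs11."""
    closure = set(triples)
    changed = True
    while changed:  # repeat until no rule produces a new triple
        changed = False
        new = set()
        for sub, pred, sup in list(closure):
            if pred != SUBCLASS_OF:
                continue
            for s, p, o in closure:
                if p == SUBCLASS_OF and s == sup:
                    # rdfs11: sub <= sup and sup <= o implies sub <= o
                    new.add((sub, SUBCLASS_OF, o))
                if p == RDF_TYPE and o == sub:
                    # rdfs9: s is a sub, and sub <= sup, so s is a sup
                    new.add((s, RDF_TYPE, sup))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


def types_of(triples, instance, reasoning=False):
    """Return the classes of an instance, optionally including inferred ones."""
    graph = infer(triples) if reasoning else set(triples)
    return {o for s, p, o in graph if s == instance and p == RDF_TYPE}


# Hypothetical data: an invoice is a document, a document is an artifact.
TRIPLES = [
    ("sea:Invoice", SUBCLASS_OF, "sea:Document"),
    ("sea:Document", SUBCLASS_OF, "sea:Artifact"),
    ("ex:inv1", RDF_TYPE, "sea:Invoice"),
]
```

With reasoning off, ex:inv1 has only its asserted type; with it on, the two superclasses appear as well. The same gap is why SHACL shapes targeting a superclass only see such instances once validation runs over inferred triples.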

[x] P1.3: Messaging Federation/Clustering (100%)

Area: Messaging
Impact: 🟡 HIGH - Required for multi-node deployments
Effort: High (5-7 days)
Dependencies: None

Scope:

Configure NATS JetStream clustering
Add cluster-aware consumer groups to apps/sea-mq-worker/
Update Pulumi/Docker Compose for multi-node NATS

Files:

apps/sea-mq-worker/src/config.rs (cluster settings)
deploy/pulumi/components/sea-cell.ts (NATS cluster)
infra/docker-compose.*.yaml (multi-node config)

Verification: Message delivery test across 2+ nodes

[x] P1.4: Workbench Manifest Inspector (100%)

Area: Workbench UI
Impact: 🟡 MEDIUM - Key debugging/inspection tool
Effort: Medium (3-4 days)
Dependencies: None

Scope:

Add manifest tree view component
JSON/YAML toggle view
Diff view against previous manifest version

Files:

apps/workbench/src/pages/ManifestInspector.tsx (new)
apps/workbench/src/components/ManifestTree.tsx (new)
apps/workbench/src/App.tsx (add route)

Verification: Visual inspection + E2E navigation test

[x] P1.5: Workbench Ops Actions (100%)

Area: Workbench UI
Impact: 🟡 MEDIUM - Enables operational workflows from UI
Effort: Medium (3-4 days)
Dependencies: P0.1 (auth required for privileged actions)

Scope:

Add “Regenerate Code” action button
Add “Validate Specs” action button
Add “Run Gap Report” action button
Connect to justfile recipes via API

Files:

apps/workbench/src/pages/Operations.tsx (new)
apps/workbench/src/lib/operations-api.ts (new)
apps/workbench/src/types/operations.ts (new)
services/workbench-bff/src/api/ops_routes.py (new)
services/workbench-bff/src/adapters/ops_runner.py (new)

Verification: Manual click-through + action execution verification
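
Because these buttons trigger commands on a server, one reasonable design is an allow-list that maps each UI action to exactly one recipe, checked together with the caller's role. The sketch below assumes that design. The action IDs, recipe names, and the "operator" role are hypothetical, and the runner is injectable so the logic can be tested without invoking just.

```python
import subprocess

# Hypothetical mapping from UI action IDs to justfile recipe names.
RECIPES = {
    "regenerate-code": "regenerate",
    "validate-specs": "validate-specs",
    "gap-report": "gap-report",
}

REQUIRED_ROLE = "operator"  # assumed role name for privileged actions


def build_command(action: str) -> list:
    """Translate an allow-listed action into an argv list (no shell involved)."""
    try:
        recipe = RECIPES[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None
    return ["just", recipe]


def run_action(action: str, roles, runner=subprocess.run):
    """Run the recipe for an action if the caller holds the required role."""
    if REQUIRED_ROLE not in roles:
        raise PermissionError(f"role {REQUIRED_ROLE!r} required for {action!r}")
    # An argv list with no shell means request data is never parsed as shell syntax.
    return runner(build_command(action), capture_output=True, text=True, check=False)
```

Rejecting anything outside the mapping means a crafted request cannot reach arbitrary recipes, which matters because this endpoint is the reason P1.5 depends on P0.1.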

Tier P2: Developer Experience (Do Third)

These improve adoption but aren’t blocking.

[ ] P2.1: Zed Extension Marketplace (90% → 100%)

Area: IDE Integration
Impact: 🟢 MEDIUM - Expands IDE coverage
Effort: Low (1-2 days)
Dependencies: tree-sitter-sea must be published to GitHub

Scope:

Publish tree-sitter-sea to GitHub with releases
Create Zed extension manifest
Submit PR to zed-industries/extensions

Files:

extensions/tree-sitter-sea/ (publish)
extensions/zed-sea/ (new or update)

Verification: Install from Zed extensions marketplace

[ ] P2.2: WASM npm/cargo Publishing (Partial → 100%)

Area: Distribution
Impact: 🟢 MEDIUM - Enables browser/edge use cases
Effort: Low (1-2 days)
Dependencies: None

Scope:

Verify WASM build passes all tests
Publish to npm as @sea-forge/wasm
Publish to crates.io (verify sea-forge-wasm crate)

Files:

.github/workflows/release.yml (add WASM publish step)
apps/sea-forge-cli/Cargo.toml (verify WASM target)

Verification: npm install @sea-forge/wasm + import test

[ ] P2.3: Incident Runbooks (95% → 100%)

Area: Observability
Impact: 🟢 LOW - Operational documentation
Effort: Low (1-2 days)
Dependencies: None

Scope:

Create runbook templates for common incidents
Link runbooks to OpenObserve alerts
Add runbook references to observability docs

Files:

docs/runbooks/ (new directory)
docs/runbooks/high-latency.md, message-dlq.md, opa-failure.md

Verification: Documentation review

[ ] P2.4: Performance Testing Suite (90% → 95%)

Area: Testing & E2E
Impact: 🟢 MEDIUM - Validates scalability claims
Effort: Medium (3-4 days)
Dependencies: None

Scope:

Add k6/Locust load tests for policy-gateway
Add compilation pipeline throughput benchmarks
Add memory/CPU profiling for codegen

Files:

tests/performance/ (new directory)
tests/performance/test_policy_gateway_load.py
tests/performance/test_codegen_throughput.py

Verification: just test-performance recipe
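
For the throughput benchmark, a small harness along these lines may be enough to start with. It is a sketch using only the standard library. fake_compile is a placeholder for whatever entry point the real compilation pipeline exposes, and the median-of-runs approach is one choice among several.

```python
import statistics
import time


def measure_throughput(fn, inputs, repeats=5):
    """Return the median number of inputs processed per second across runs."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for item in inputs:
            fn(item)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse timers.
        rates.append(len(inputs) / elapsed if elapsed > 0 else float("inf"))
    return statistics.median(rates)


def fake_compile(spec: str) -> str:
    """Placeholder for the real compile step; only normalizes whitespace."""
    return " ".join(spec.split())


if __name__ == "__main__":
    rate = measure_throughput(fake_compile, ["entity   A"] * 10_000)
    print(f"{rate:,.0f} specs/second")
```

Reporting the median rather than the mean keeps a single slow run (for example, one hit by garbage collection) from skewing the result.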

[ ] P2.5: Chaos Testing Suite (90% → 95%)

Area: Testing & E2E
Impact: 🟢 MEDIUM - Validates resilience claims
Effort: Medium (3-4 days)
Dependencies: P1.3 (federation needed for meaningful chaos tests)

Scope:

Add Chaos Mesh or Litmus experiments
Test NATS node failure recovery
Test OPA sidecar restart behavior
Test PostgreSQL failover

Files:

tests/chaos/ (new directory)
tests/chaos/nats_partition.yaml
tests/chaos/opa_restart.yaml

Verification: Chaos experiment execution with service recovery validation

Tier P3: Advanced Features (Do Last)

Nice-to-have features; the platform can ship without them.

[x] P3.1: Provenance Tracking System

Area: Drift / Misalignment Handling
Impact: 🔵 LOW - Vision feature, not core
Effort: High (7-10 days)
Dependencies: P0.2 (audit trail foundation)

Scope:

Track spec→artifact lineage in database
Add provenance query API
Visualize lineage in Workbench

What this enables: Answer “which spec produced this code?” and “what changed between v1.0 and v1.1?”
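
Both questions can be answered from content hashes alone. The sketch below assumes a lineage record keyed by SHA-256 digests and keeps everything in memory; the real system would persist records in the database. LineageRecord, LineageIndex, and the example paths are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def digest(content: bytes) -> str:
    """Content address for a spec or artifact."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    spec_path: str
    spec_digest: str
    artifact_path: str
    artifact_digest: str
    generator_version: str


def make_record(spec_path, spec, artifact_path, artifact, generator_version):
    """Capture one spec -> artifact edge at generation time."""
    return LineageRecord(spec_path, digest(spec), artifact_path,
                         digest(artifact), generator_version)


class LineageIndex:
    """In-memory lookup from artifact content back to its source spec."""

    def __init__(self):
        self._by_artifact = {}

    def record(self, rec: LineageRecord) -> None:
        self._by_artifact[rec.artifact_digest] = rec

    def spec_for(self, artifact: bytes):
        """Answer 'which spec produced this code?' (None if unknown or edited)."""
        return self._by_artifact.get(digest(artifact))


def changed_specs(old, new):
    """Answer 'what changed?': spec paths whose digest differs between releases."""
    before = {r.spec_path: r.spec_digest for r in old}
    after = {r.spec_path: r.spec_digest for r in new}
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))
```

A lookup that returns None for a file that should be generated is itself a useful signal, since it means the artifact was modified by hand after generation.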

[x] P3.2: Automatic Drift Remediation

Area: Drift / Misalignment Handling
Impact: 🔵 LOW - Vision feature, not core
Effort: Very High (10+ days)
Dependencies: P3.1 (provenance needed first)

Scope:

Suggest spec updates when drift detected
Auto-generate PRs for simple remediations
Integrate with CI for drift-aware merges

[x] P3.3: Runtime Behavior Correlation

Area: Drift / Misalignment Handling
Impact: 🔵 LOW - Vision feature, not core
Effort: Very High (10+ days)
Dependencies: P0.2, P3.1

Scope:

Correlate OTLP traces with declared specs
Detect behavioral drift (not just config drift)
Alert on spec-behavior divergence

[ ] P3.4: Federal/Finance/Healthcare E2E Tests

Area: Testing & E2E
Impact: 🔵 LOW - Domain-specific, optional
Effort: Medium (3-5 days each)
Dependencies: Domain experts for requirements

Scope:

Implement deferred E2E tests for regulated domains
Add compliance-specific fixtures
Validate against domain regulations

[ ] P3.5: Self-Hosting gen.py via SDS-021

Area: Dogfooding
Impact: 🔵 LOW - Internal consistency
Effort: Medium (4-5 days)
Dependencies: SDS-021 spec must be complete

Scope:

Replace handwritten tools/codegen/gen.py with SDS-021-generated version
Remove “HANDWRITTEN” TODO
Validate determinism still holds

Red flag addressed: Main code generator should dogfood the spec-first philosophy.
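
One way to validate that determinism still holds is to run the generator several times on the same spec and compare a digest of the full output set. The sketch below assumes the generator can be called as a function that returns a mapping of relative paths to file contents. That interface is an assumption for the example, not the actual gen.py API.

```python
import hashlib


def output_digest(files: dict) -> str:
    """Hash a generated file set in sorted path order so dict order cannot matter."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")  # separator between path and content
        h.update(files[path])
        h.update(b"\0")  # separator between entries
    return h.hexdigest()


def is_deterministic(generate, spec, runs=3) -> bool:
    """True if every run over the same spec yields byte-identical output."""
    digests = {output_digest(generate(spec)) for _ in range(runs)}
    return len(digests) == 1
```

The same check can compare the handwritten generator against its SDS-021-generated replacement: if both produce the same digest for every spec in the corpus, the swap is behavior-preserving.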

Execution Order Summary

Week 1-2: P0 (Security & Foundation)
├── P0.1 Workbench Authentication
└── P0.2 Audit Trail Persistence

Week 3-4: P1 (Core Completion)
├── P1.1 DSL Error Recovery
├── P1.2 KG Reasoning Engine
├── P1.3 Messaging Federation
├── P1.4 Manifest Inspector
└── P1.5 Ops Actions

Week 5-6: P2 (Developer Experience)
├── P2.1 Zed Extension
├── P2.2 WASM Publishing
├── P2.3 Incident Runbooks
├── P2.4 Performance Tests
└── P2.5 Chaos Tests

Week 7+: P3 (Advanced Features)
├── P3.1 Provenance Tracking
├── P3.2 Drift Remediation
├── P3.3 Runtime Correlation
├── P3.4 Domain E2E Tests
└── P3.5 gen.py Dogfooding
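
The schedule above can be checked mechanically against the dependencies declared in the checklist. The sketch below encodes only the item-to-item dependencies (external ones such as SDS-021 or domain experts are left out), verifies that no item depends on something in a later tier, and derives one valid execution order with the standard-library graphlib module. The helper names are illustrative.

```python
from graphlib import TopologicalSorter

# Item -> items it depends on, as declared in the checklist.
DEPENDS_ON = {
    "P0.2": {"P0.1"},
    "P1.5": {"P0.1"},
    "P2.5": {"P1.3"},
    "P3.1": {"P0.2"},
    "P3.2": {"P3.1"},
    "P3.3": {"P0.2", "P3.1"},
}


def tier(item: str) -> int:
    """Return the numeric tier of an item ID, e.g. 'P1.5' -> 1."""
    return int(item[1:].split(".")[0])


def dependency_violations(depends_on):
    """List (item, dependency) pairs where the dependency sits in a later tier."""
    return [
        (item, dep)
        for item, deps in depends_on.items()
        for dep in sorted(deps)
        if tier(dep) > tier(item)
    ]


def execution_order(depends_on):
    """Return one valid order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
```

An empty result from dependency_violations confirms that working tier by tier never schedules an item before something it needs.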

Completion Tracking

Area	Current	After P0	After P1	After P2	After P3
DSL Parsing & Compilation	95%	95%	100%	100%	100%
Code Generation	100%	100%	100%	100%	100%
Governance Runtime	100%	100%+	100%+	100%+	100%+
Knowledge Graph	85%	85%	100%	100%	100%
Messaging	75%	75%	100%	100%	100%
Observability	95%	95%	95%	100%	100%
Workbench UI	85%	85%	95%	95%	100%
IDE Integration	90%	90%	90%	100%	100%
Testing & E2E	90%	90%	90%	98%	100%
Production Infra	100%	100%	100%	100%	100%
Overall	89%	91%	96%	99%	100%

Notes

P0 items are non-negotiable for any production deployment
P1 completes the core platform — can ship a “v1.0” after this tier
P2 and P3 can be parallelized if team capacity allows
P3.1-P3.3 realize the “vision” stated in README — without these, SEA-Forge is a code generator, not an organizational semantics platform