SEA-Forge Capability Report (Code-Backed)

Scope: This report is evidence-based and grounded in the repository code and config as found in this workspace. It does not rely on README claims.

SEA-DSL is treated as the canonical semantic source for SEA-Forge; projections and compiler outputs are derived artifacts.

1. Core Capabilities (Implemented)

Semantic modeling / specifications

SEA DSL compilation artifacts (SEA → AST → IR → Manifest)
- What it does: Parses SEA DSL into AST JSON, compiles AST to IR, then IR to a manifest JSON used downstream.
- Evidence: just/62-compiler.just (pipeline/parse/compile steps), tools/ast_to_ir.py, tools/ir_to_manifest.py.
- Status: End-to-end usable for generating .ast.json, .ir.json, .manifest.json when a sea parser is installed.
SDS YAML validation and SDS→manifest compilation
- What it does: Validates SDS YAML against schema and can compile SDS directly to manifest.
- Evidence: tools/validate_sds.py, tools/codegen/sds_to_manifest.py, just/62-compiler.just (sds-validate, sds-compile).
- Status: End-to-end usable.
Flow annotation linting (CQRS/runtime metadata)
- What it does: Lints SEA AST to enforce CQRS annotations and runtime metadata (tx, idempotency, outbox, read_model).
- Evidence: tools/flow_lint.py.
- Status: End-to-end usable.
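
As a rough illustration of the flow annotation linting described above, the sketch below walks a list of flow records and reports missing annotations. The AST shape (a `flows` list with `name` and `annotations`), the `cqrs` key, the split of metadata between commands and queries, and the `lint_flows` helper are all assumptions made for illustration and are not taken from tools/flow_lint.py; only the metadata names tx, idempotency, outbox, and read_model come from this report.

```python
from typing import Any

# Metadata names come from the report; which flow kind needs which is assumed.
COMMAND_KEYS = ("tx", "idempotency", "outbox")
QUERY_KEYS = ("read_model",)


def lint_flows(ast: dict[str, Any]) -> list[str]:
    """Return one message per missing annotation (hypothetical AST shape)."""
    problems: list[str] = []
    for flow in ast.get("flows", []):
        name = flow.get("name", "<unnamed>")
        annotations = flow.get("annotations", {})
        kind = annotations.get("cqrs")
        if kind not in ("command", "query"):
            problems.append(f"{name}: missing or invalid cqrs annotation")
            continue
        required = COMMAND_KEYS if kind == "command" else QUERY_KEYS
        for key in required:
            if key not in annotations:
                problems.append(f"{name}: {kind} flow lacks '{key}'")
    return problems


sample_ast = {
    "flows": [
        {
            "name": "PlaceOrder",
            "annotations": {"cqrs": "command", "tx": "required", "idempotency": "key", "outbox": True},
        },
        {"name": "ListOrders", "annotations": {"cqrs": "query"}},
        {"name": "Untagged", "annotations": {}},
    ]
}
lint_result = lint_flows(sample_ast)
```

Returning a flat list of messages keeps the check easy to adapt to either error or warning output.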

Compilation / transformation pipelines

Full spec pipeline (SDS/SEA → AST → IR → Manifest)
- What it does: Orchestrates validation + compilation steps per bounded context.
- Evidence: just/62-compiler.just (pipeline, pipeline-all).
- Status: End-to-end usable.
IR → Knowledge Graph snapshot
- What it does: Generates deterministic RDF triples and snapshot metadata from SEA IR, with optional SHACL validation.
- Evidence: tools/ir_to_kgs.py, tools/kg_validate.py.
- Status: End-to-end usable (requires Python deps for RDF/SHACL).

Validation / governance / invariant checking

Schema validation for AST/IR/manifest
- What it does: Enforces JSON schema constraints during compilation steps.
- Evidence: tools/ast_to_ir.py, tools/ir_to_manifest.py, tools/schemas/*.json.
- Status: End-to-end usable.
SHACL validation & shape linting
- What it does: Validates RDF graphs against SHACL, lints shapes, and performs shape impact analysis.
- Evidence: tools/kg_validate.py, tools/shape_lint.py, tools/shape_impact_analysis.py, just/63-kg.just.
- Status: End-to-end usable.
Governance gateway (OPA-backed)
- What it does: Evaluates LLM requests against policies, supports enforcement modes, audit logging, and policy reload.
- Evidence: services/policy-gateway/main.py, services/policy-gateway/src/api/routes.py, infra/opa/.
- Status: End-to-end usable (depends on OPA + config).

Code / artifact generation

Manifest → generated code
- What it does: Renders code into src/gen/** from manifest using Jinja templates.
- Evidence: tools/codegen/gen.py, tools/codegen/templates/*.
- Status: Partially implemented (file is handwritten with --hand-write-ok bypass notice).
Nx generators (bounded context, adapter, API surface)
- What it does: Scaffolds hexagonal architecture projects, adapters, and API surfaces.
- Evidence: generators/ (e.g., generators/bounded-context, generators/adapter, generators/api-surface).
- Status: End-to-end usable.
Post-codegen gap report
- What it does: Reports missing adapters, packages, env vars; can populate .env and check versions online.
- Evidence: tools/codegen/gap_report.py.
- Status: Partially implemented (handwritten with --hand-write-ok bypass notice; uses online lookups).

CLI tooling

SEA-Forge CLI (Rust)
- What it does: Setup, validate specs, manage services (up/down/status/logs/watch), apply declared state.
- Evidence: apps/sea-forge-cli/src/main.rs, apps/sea-forge-cli/src/commands/*.rs.
- Status: End-to-end usable (requires Rust build and just/Docker tooling).
Ops runner (Workbench)
- What it does: Allowlisted, auditable execution of just commands (spec validation, pipeline, gap report), with logs and status.
- Evidence: services/workbench-bff/src/adapters/ops_runner.py, services/workbench-bff/src/api/ops_routes.py.
- Status: End-to-end usable.
CLI utilities for validation and drift
- What it does: Drift scan/remediate, KG validation, shape lint, schema compatibility, etc.
- Evidence: tools/drift_heal.py, tools/kg_validate.py, tools/schema-compat-check.py.
- Status: End-to-end usable.

Persistence / provenance / history

Provenance graph (spec → artifact lineage)
- What it does: Builds a DAG of ADR/PRD/SDS/SEA → manifest → artifact with hashes and timestamps.
- Evidence: services/workbench-bff/src/adapters/provenance_registry.py, services/workbench-bff/src/api/provenance_routes.py.
- Status: Partially implemented (graph build works; historical compare is not implemented).
Manifest registry and history
- What it does: Indexes manifests, validates payloads, serves versions and diffs.
- Evidence: services/workbench-bff/src/adapters/manifest_registry.py, services/workbench-bff/src/api/routes.py.
- Status: End-to-end usable.
Behavior correlation summaries (DB-backed)
- What it does: Stores behavior drift summaries in PostgreSQL for the Workbench UI.
- Evidence: services/workbench-bff/src/adapters/behavior_indexer.py, services/workbench-bff/src/api/database.py.
- Status: End-to-end usable, but depends on DB setup.

AI / agent integration hooks

A2A protocol gateway
- What it does: Agent card discovery, task lifecycle endpoints, OpenAI-compatible routes, MCP routes, and WebSocket updates.
- Evidence: services/a2a/src/main.py, services/a2a/src/api/routes.py, services/a2a/src/api/openai_routes.py, services/a2a/src/api/mcp_routes.py.
- Status: End-to-end usable.
LLM provider service (LiteLLM-based)
- What it does: OpenAI-compatible chat completions and embeddings, optional Policy Gateway routing.
- Evidence: services/llm-provider/src/main.py, services/llm-provider/src/api/routes.py, services/llm-provider/src/adapters/litellm_adapter.py.
- Status: End-to-end usable (depends on model/provider credentials).
Policy Gateway (OPA enforcement + audit)
- What it does: Evaluates prompt policies, proxies LLM requests, supports hot reload and audit queries.
- Evidence: services/policy-gateway/src/api/routes.py, services/policy-gateway/main.py.
- Status: End-to-end usable (depends on OPA + config).
Embedding service (local llama.cpp + fallback)
- What it does: OpenAI-compatible embeddings, uses local model or deterministic fallback.
- Evidence: services/embedding/src/main.py.
- Status: End-to-end usable.
Knowledge Graph service (Oxigraph-backed)
- What it does: SPARQL endpoint, event-to-RDF projection, snapshot retrieval, SHACL validation.
- Evidence: services/knowledge-graph/main.py, services/knowledge-graph/src/api/routes.py.
- Status: End-to-end usable.

Infrastructure / orchestration

Local orchestration and service recipes
- What it does: Docker compose definitions, OPA policies, NATS, OTEL, and just recipes.
- Evidence: infra/docker/docker-compose.*.yml, infra/opa/, infra/nats/, justfile and just/*.
- Status: End-to-end usable.
Messaging worker (outbox/inbox/DLQ)
- What it does: NATS JetStream worker for outbox publishing, inbox consumption, and DLQ replay.
- Evidence: apps/sea-mq-worker/src/main.rs and related modules.
- Status: End-to-end usable (depends on Postgres + NATS).
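
The outbox and DLQ behavior described for the messaging worker can be pictured with a small in-memory model. The actual worker is written in Rust and talks to Postgres and NATS JetStream; the sketch below is Python with no broker at all, and the `OutboxRow` fields, the `Worker` class, and the `MAX_ATTEMPTS` value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 3  # assumed retry budget before a row is dead-lettered


@dataclass
class OutboxRow:
    event_id: str
    subject: str
    payload: str
    attempts: int = 0


@dataclass
class Worker:
    publish: Callable[[str, str], None]  # stand-in for a broker client
    outbox: list[OutboxRow] = field(default_factory=list)
    dead_letters: list[OutboxRow] = field(default_factory=list)

    def drain_once(self) -> None:
        """Try to publish every pending row once."""
        for row in list(self.outbox):
            try:
                self.publish(row.subject, row.payload)
            except Exception:
                row.attempts += 1
                if row.attempts >= MAX_ATTEMPTS:
                    # Give up on this row and park it for manual replay.
                    self.outbox.remove(row)
                    self.dead_letters.append(row)
                continue
            self.outbox.remove(row)

    def replay_dead_letters(self) -> None:
        """Move dead-lettered rows back to the outbox with a fresh budget."""
        for row in self.dead_letters:
            row.attempts = 0
        self.outbox.extend(self.dead_letters)
        self.dead_letters.clear()


sent: list[tuple[str, str]] = []


def fake_publish(subject: str, payload: str) -> None:
    if subject == "orders.bad":
        raise ConnectionError("broker rejected message")
    sent.append((subject, payload))


worker = Worker(publish=fake_publish)
worker.outbox += [OutboxRow("e1", "orders.created", "{}"), OutboxRow("e2", "orders.bad", "{}")]
for _ in range(MAX_ATTEMPTS):
    worker.drain_once()
```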

2. Supported Workflows

A. Spec pipeline: SDS/SEA → AST → IR → Manifest

Inputs: docs/specs/<ctx>/<ctx>.sea, optional docs/specs/<ctx>/<ctx>.sds.yaml.
Outputs: <ctx>.ast.json, <ctx>.ir.json, <ctx>.manifest.json in docs/specs/<ctx>/.
How to run: just pipeline <ctx>.
Guarantees: Schema validation on AST/IR/manifest, deterministic compilation in ast_to_ir.py/ir_to_manifest.py.
Evidence: just/62-compiler.just, tools/ast_to_ir.py, tools/ir_to_manifest.py.
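
To make the two guarantees concrete (a gate after each stage, and deterministic output), here is a minimal sketch. It is not the code in tools/ast_to_ir.py or tools/ir_to_manifest.py: the artifact fields, the `require_keys` gate (a stand-in for real JSON Schema validation), and the transformation bodies are assumptions for illustration.

```python
import json
from typing import Any

Artifact = dict[str, Any]


class GateError(ValueError):
    """Raised when an artifact fails its gate; later stages do not run."""


def require_keys(stage: str, artifact: Artifact, keys: tuple[str, ...]) -> Artifact:
    # Stand-in for JSON Schema validation: only checks required top-level keys.
    missing = [k for k in keys if k not in artifact]
    if missing:
        raise GateError(f"{stage}: missing {missing}")
    return artifact


def ast_to_ir(ast: Artifact) -> Artifact:
    # Hypothetical transformation: sorting makes the output order stable.
    return {"version": 1, "entities": sorted(ast["declarations"])}


def ir_to_manifest(ir: Artifact) -> Artifact:
    # Hypothetical transformation from IR entities to manifest components.
    return {"version": ir["version"], "components": [{"name": e} for e in ir["entities"]]}


def run_pipeline(ast: Artifact) -> str:
    require_keys("ast", ast, ("declarations",))
    ir = require_keys("ir", ast_to_ir(ast), ("version", "entities"))
    manifest = require_keys("manifest", ir_to_manifest(ir), ("version", "components"))
    # sort_keys yields byte-identical JSON for identical input.
    return json.dumps(manifest, sort_keys=True, indent=2)


manifest_a = run_pipeline({"declarations": ["Order", "Customer"]})
manifest_b = run_pipeline({"declarations": ["Customer", "Order"]})
```

In this sketch, determinism comes from sorting at the transformation step and from `sort_keys` at serialization; whether the real compiler uses the same approach is not established by this report.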

B. SDS-only compile → manifest

Inputs: *.sds.yaml.
Outputs: *.manifest.json.
How to run: just sds-compile <file>.
Guarantees: SDS schema validation (if run); deterministic output from compiler.
Evidence: tools/validate_sds.py, tools/codegen/sds_to_manifest.py, just/62-compiler.just.

C. Manifest inspection and diff (Workbench BFF)

Inputs: manifest files and history/index (artifacts/manifests/...).
Outputs: manifest summaries, full payload, YAML render, version history, JSON patch diffs.
How to run: Workbench BFF routes /manifests, /manifests/{id}, /diff.
Guarantees: Manifest payloads validated against schema.
Evidence: services/workbench-bff/src/api/routes.py, services/workbench-bff/src/adapters/manifest_registry.py.
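
For readers unfamiliar with JSON patch diffs, the sketch below shows what such a diff between two manifest versions might look like. It emits RFC 6902-style add/remove/replace operations for nested objects. The `diff` function and the sample manifest fields are illustrative assumptions; the BFF may compute its diffs differently.

```python
from typing import Any


def _escape(token: str) -> str:
    # JSON Pointer escaping (RFC 6901): "~" becomes "~0", "/" becomes "~1".
    return token.replace("~", "~0").replace("/", "~1")


def diff(old: Any, new: Any, path: str = "") -> list[dict[str, Any]]:
    """Produce RFC 6902-style add/remove/replace operations.

    Only dicts are compared recursively; lists and scalars are replaced whole.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        ops: list[dict[str, Any]] = []
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{_escape(key)}"
            if key not in new:
                ops.append({"op": "remove", "path": child})
            elif key not in old:
                ops.append({"op": "add", "path": child, "value": new[key]})
            else:
                ops.extend(diff(old[key], new[key], child))
        return ops
    if old != new:
        return [{"op": "replace", "path": path, "value": new}]
    return []


# Hypothetical manifest payloads for two versions of one bounded context.
v1 = {"name": "orders", "version": "1.0.0", "ports": {"http": 8080}}
v2 = {"name": "orders", "version": "1.1.0", "ports": {"http": 8080, "grpc": 9090}}
patch = diff(v1, v2)
```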

D. Provenance graph queries

Inputs: Specs and manifest files in docs/specs/ and artifacts/manifests/.
Outputs: lineage graph and impact lists.
How to run: Workbench BFF /provenance/* endpoints.
Guarantees: Hashing for nodes; lineage based on file discovery rules.
Evidence: services/workbench-bff/src/adapters/provenance_registry.py, services/workbench-bff/src/api/provenance_routes.py.
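
A lineage graph with hashed nodes and an impact query can be sketched in a few lines. The `ProvenanceGraph` class, the node identifiers, and the use of SHA-256 are assumptions made for illustration; the report does not state which hash or data structure provenance_registry.py uses.

```python
import hashlib
from collections import deque


class ProvenanceGraph:
    """Toy DAG: edges point from a source node to what is derived from it."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}
        self.downstream: dict[str, set[str]] = {}

    def add_node(self, node_id: str, content: bytes) -> None:
        self.hashes[node_id] = hashlib.sha256(content).hexdigest()
        self.downstream.setdefault(node_id, set())

    def add_edge(self, source: str, derived: str) -> None:
        self.downstream.setdefault(source, set()).add(derived)

    def impact(self, node_id: str) -> list[str]:
        """Everything reachable downstream of node_id (breadth-first)."""
        seen: set[str] = set()
        queue = deque([node_id])
        while queue:
            current = queue.popleft()
            for nxt in self.downstream.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)


graph = ProvenanceGraph()
for node, body in [
    ("sds:orders", b"sds body"),
    ("sea:orders", b"sea body"),
    ("manifest:orders", b"manifest body"),
    ("artifact:orders", b"generated code"),
]:
    graph.add_node(node, body)
graph.add_edge("sds:orders", "sea:orders")
graph.add_edge("sea:orders", "manifest:orders")
graph.add_edge("manifest:orders", "artifact:orders")

impacted = graph.impact("sea:orders")
```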

E. Drift scan → analysis → remediation

Inputs: Provenance graph + file hashes, manifests, local repo state.
Outputs: drift reports, classifications, remediation suggestions, audit log entries, optional PR.
How to run: python tools/drift_heal.py scan/check/remediate, or Workbench BFF /drift/* endpoints.
Guarantees: Hash-based drift detection with optional deep diff, safety policy for auto-fix.
Evidence: tools/drift_heal.py, services/workbench-bff/src/adapters/drift_detector.py, services/workbench-bff/src/adapters/remediation_engine.py, services/workbench-bff/src/api/drift_routes.py.
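
The core of hash-based drift detection is a comparison between recorded and current file hashes. The sketch below shows that comparison only; it does not attempt the deep diff or the drift classification. The `scan` function, the three status labels, and the file names are hypothetical and are not drawn from tools/drift_heal.py.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan(recorded: dict[str, str], root: Path) -> dict[str, str]:
    """Map each tracked relative path to 'ok', 'changed', or 'missing'."""
    report: dict[str, str] = {}
    for rel, expected in sorted(recorded.items()):
        target = root / rel
        if not target.is_file():
            report[rel] = "missing"
        elif file_sha256(target) != expected:
            report[rel] = "changed"
        else:
            report[rel] = "ok"
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "orders.manifest.json").write_text('{"version": 1}')
    (root / "orders.ir.json").write_text('{"entities": []}')
    # Record hashes at "generation time".
    recorded = {
        name: file_sha256(root / name)
        for name in ("orders.manifest.json", "orders.ir.json")
    }
    recorded["orders.ast.json"] = "0" * 64  # tracked but never written
    # Simulate a manual edit after the hashes were recorded.
    (root / "orders.manifest.json").write_text('{"version": 2}')
    drift_report = scan(recorded, root)
```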

F. Knowledge Graph ingestion and querying

Inputs: SPARQL queries or event projections.
Outputs: SPARQL results, RDF triples, snapshot metadata.
How to run: Knowledge Graph service endpoints /kg/sparql, /kg/projection, /kg/snapshot/{id}.
Guarantees: SHACL validation available for graphs; snapshot IDs are content-addressable.
Evidence: services/knowledge-graph/src/api/routes.py, tools/ir_to_kgs.py.
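
A content-addressable snapshot ID means the ID is derived from the triples themselves, so the same graph always gets the same ID. A minimal sketch of the idea follows. The `snapshot_id` function, the `sha256:` prefix, and the line-based serialization are assumptions; the sketch also assumes there are no blank nodes, which would need proper RDF canonicalization.

```python
import hashlib
from typing import Iterable

Triple = tuple[str, str, str]


def snapshot_id(triples: Iterable[Triple]) -> str:
    # Deduplicate and sort so the ID does not depend on input order.
    canonical = sorted(set(triples))
    digest = hashlib.sha256()
    for subject, predicate, obj in canonical:
        digest.update(f"{subject} {predicate} {obj} .\n".encode("utf-8"))
    return "sha256:" + digest.hexdigest()


# Hypothetical triples; the IRIs are placeholders.
triples = [
    ("<urn:ctx:orders>", "<urn:sea:hasEntity>", "<urn:entity:Order>"),
    ("<urn:ctx:orders>", "<urn:sea:hasEntity>", "<urn:entity:Customer>"),
]
id_forward = snapshot_id(triples)
id_reversed = snapshot_id(reversed(triples))
id_changed = snapshot_id(triples[:1])
```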

G. Governance evaluation and proxying of LLM calls

Inputs: Prompt + context; or OpenAI-compatible chat/embeddings requests.
Outputs: policy decision + audit entry; or proxied LLM responses.
How to run: Policy Gateway /policy/evaluate, /policy/chat/completions, /policy/embeddings, /governance/audit.
Guarantees: OPA evaluation and enforcement mode, audit logging.
Evidence: services/policy-gateway/src/api/routes.py, services/policy-gateway/main.py.
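
How an enforcement mode interacts with a policy decision can be sketched as a small decision table. The mode names ("block", "log", "pass") are borrowed from this report's wording, but their exact semantics, the `Gateway` class, and the audit record fields are assumptions. The `policy_allows` flag stands in for the result of an OPA evaluation, which is not performed here.

```python
from dataclasses import dataclass, field

MODES = ("block", "log", "pass")  # names assumed from the report's wording


@dataclass
class Gateway:
    mode: str
    audit_log: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"unknown enforcement mode: {self.mode}")

    def handle(self, request_id: str, policy_allows: bool) -> bool:
        """Return True if the request should be forwarded to the LLM."""
        if self.mode == "pass":
            forwarded, note = True, "policy result ignored"
        elif policy_allows:
            forwarded, note = True, "allowed"
        elif self.mode == "log":
            forwarded, note = True, "violation logged"
        else:
            forwarded, note = False, "blocked"
        # Every decision is audited, regardless of mode.
        self.audit_log.append(
            {"request": request_id, "mode": self.mode, "forwarded": forwarded, "note": note}
        )
        return forwarded


strict = Gateway(mode="block")
blocked = strict.handle("req-1", policy_allows=False)
lenient = Gateway(mode="log")
logged = lenient.handle("req-2", policy_allows=False)
```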

H. Workbench Ops workflows

Inputs: Ops action + context (validate specs, regenerate code, gap report).
Outputs: run status, streamed logs, persisted run metadata.
How to run: Workbench BFF /ops/* endpoints and Workbench UI.
Guarantees: Allowlisted commands only; timeouts and output caps.
Evidence: services/workbench-bff/src/adapters/ops_runner.py, services/workbench-bff/src/api/ops_routes.py, apps/workbench/src/pages/Operations.tsx.
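
The three guardrails named here (allowlist, timeout, output cap) can be illustrated with the standard library alone. This is a sketch under assumptions, not the code in ops_runner.py: the action names, limits, and result shape are hypothetical, and the allowlisted commands invoke the Python interpreter instead of just recipes so the example runs anywhere.

```python
import subprocess
import sys

MAX_OUTPUT_BYTES = 64  # illustrative cap
TIMEOUT_SECONDS = 5    # illustrative timeout

# Action name -> fixed argv. Callers choose a name, never a raw command line.
ALLOWLIST = {
    "hello": [sys.executable, "-c", "print('spec validation ok')"],
    "noisy": [sys.executable, "-c", "print('x' * 1000)"],
}


def run_action(action: str) -> dict[str, str]:
    if action not in ALLOWLIST:
        return {"status": "rejected", "output": ""}
    try:
        completed = subprocess.run(
            ALLOWLIST[action],
            capture_output=True,
            timeout=TIMEOUT_SECONDS,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "output": ""}
    # Truncate after capture; a streaming runner would cap while reading.
    output = completed.stdout[:MAX_OUTPUT_BYTES].decode("utf-8", errors="replace")
    status = "succeeded" if completed.returncode == 0 else "failed"
    return {"status": status, "output": output}


rejected = run_action("rm -rf /")
ok = run_action("hello")
capped = run_action("noisy")
```

Mapping action names to fixed argument lists means no user-supplied string ever reaches a shell.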

3. Guardrails, Constraints, and Guarantees

Schema gates: AST, IR, and manifest are validated against JSON schemas before proceeding. (tools/ast_to_ir.py, tools/ir_to_manifest.py, tools/schemas/*)
Flow annotation invariants: CQRS annotations and runtime metadata enforced by tools/flow_lint.py (errors/warnings and GHA output supported).
SDS schema validation: SDS YAML validated with JSON Schema (tools/validate_sds.py).
SHACL compliance: KG validation and shape linting for RDF graphs (tools/kg_validate.py, tools/shape_lint.py).
Ops safety: Workbench Ops only runs allowlisted commands with timeouts and log limits (services/workbench-bff/src/adapters/ops_runner.py).
Drift remediation safety policy: Disallows editing generated paths, limits file types and diff size before auto-fix (services/workbench-bff/src/adapters/remediation_engine.py).
Governance enforcement: Policy Gateway can block/log/pass based on OPA results; audit logging is enforced (services/policy-gateway/main.py, services/policy-gateway/src/api/routes.py).
Audit logging: Workbench BFF creates tamper-evident audit events for operations (services/workbench-bff/src/api/audit.py).
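
The report does not say how tamper evidence is achieved in audit.py. One common technique is a hash chain, where each event stores the hash of the previous one, so editing any earlier event invalidates everything after it. The sketch below shows that technique as an assumption; the function names and entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(log: list[dict], payload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_event(audit_log, {"actor": "alice", "action": "pipeline", "context": "orders"})
append_event(audit_log, {"actor": "bob", "action": "gap-report", "context": "orders"})
intact = verify(audit_log)

# Tampering with an earlier event is detected on the next verification.
audit_log[0]["payload"]["actor"] = "mallory"
tampered = verify(audit_log)
```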

4. Drift / Misalignment Handling

Spec↔artifact drift detection: Hash-based comparisons with optional deep diffs and drift classification (benign/spec_stale/code_stale/semantic). (services/workbench-bff/src/adapters/drift_detector.py)
Provenance graph: DAG linking ADR→PRD→SDS→SEA→manifest→artifact, used by drift detector. (services/workbench-bff/src/adapters/provenance_registry.py)
Remediation engine: Suggests actions and can run just pipeline <context>, optionally creating PRs. (services/workbench-bff/src/adapters/remediation_engine.py, services/workbench-bff/src/adapters/pr_creator.py)
KG drift check: Warn-only diff of inferred triples vs baseline. (tools/kg_drift_check.py)
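
The safety policy applied before an auto-fix (no generated paths, limited file types, bounded diff size) amounts to a short predicate. The sketch below is illustrative only: `src/gen/` appears in this report as the generated-code location, but the allowed suffixes, the line limit, and the `can_auto_fix` helper are assumptions and do not reflect the actual values in remediation_engine.py.

```python
from pathlib import PurePosixPath

GENERATED_PREFIXES = ("src/gen/",)             # generated output, per this report
ALLOWED_SUFFIXES = {".json", ".yaml", ".sea"}  # assumed allowlist
MAX_DIFF_LINES = 200                           # assumed limit


def can_auto_fix(path: str, diff_lines: int) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed automatic edit."""
    # PurePosixPath drops "." components, so "./src/gen/x" is still caught.
    normalized = PurePosixPath(path).as_posix()
    if normalized.startswith(GENERATED_PREFIXES):
        return False, "generated path"
    if PurePosixPath(normalized).suffix not in ALLOWED_SUFFIXES:
        return False, "file type not allowed"
    if diff_lines > MAX_DIFF_LINES:
        return False, "diff too large"
    return True, "ok"


decisions = {
    "generated": can_auto_fix("./src/gen/orders/model.py", 3),
    "spec": can_auto_fix("docs/specs/orders/orders.sds.yaml", 10),
    "notes": can_auto_fix("docs/specs/orders/notes.md", 1),
    "huge": can_auto_fix("docs/specs/orders/orders.sea", 500),
}
```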

Limitations / gaps:

Historical provenance comparison is explicitly not implemented (raises NotImplementedError). (services/workbench-bff/src/adapters/provenance_registry.py, services/workbench-bff/src/api/provenance_routes.py)
Behavior correlation scan progress is simulated, and summary data is mocked (no live OpenObserve ingestion in the current code path). (services/workbench-bff/src/api/behavior_routes.py)

5. What Is Clearly NOT Implemented (Yet)

Historical provenance comparison: compare_versions is not implemented and endpoints note placeholder behavior. (services/workbench-bff/src/adapters/provenance_registry.py, services/workbench-bff/src/api/provenance_routes.py)
Workbench auth: BFF auth is a stub using headers; production OIDC is not implemented here. (services/workbench-bff/src/api/auth.py)
Runtime behavior correlation ingestion: Scan uses mocked summaries and simulated steps; no real OpenObserve queries in current implementation. (services/workbench-bff/src/api/behavior_routes.py)
Code generation pipeline completeness: tools/codegen/gen.py and tools/codegen/gap_report.py are handwritten with --hand-write-ok bypass comments, indicating incomplete generator/spec alignment.
LSP implementation: just/30-cli.just references apps/sea-lsp/src, but no apps/sea-lsp/ directory exists in this repo.
Policy Gateway evidence export: /governance/export returns a zip built from in-memory mock data. (services/policy-gateway/src/api/routes.py)

6. One-Paragraph “What This System Really Is”

SEA-Forge today is a spec-first toolchain and runtime suite. It compiles SEA DSL and SDS YAML into JSON artifacts (AST/IR/manifest) and validates them via schema and linting rules. It exposes services for governance (OPA-backed Policy Gateway), LLM access (LiteLLM provider), a Knowledge Graph API, and a Workbench UI/BFF for manifest inspection, provenance, drift detection, and ops automation. It also includes Rust CLI tooling and a NATS/DB worker for outbox/inbox processing. Some areas (behavior correlation ingestion, historical provenance diffs, and generator completeness) are explicitly stubbed or marked as handwritten.

7. Evidence-Backed Capability List

SEA-Forge currently supports SEA DSL → AST → IR → Manifest compilation (tools/ast_to_ir.py, tools/ir_to_manifest.py, just/62-compiler.just).
SEA-Forge currently supports SDS YAML validation and SDS → manifest compilation (tools/validate_sds.py, tools/codegen/sds_to_manifest.py).
SEA-Forge currently supports Flow annotation linting for CQRS/runtime metadata (tools/flow_lint.py).
SEA-Forge currently supports manifest inspection and diffing via Workbench BFF (services/workbench-bff/src/api/routes.py).
SEA-Forge currently supports provenance graph construction and lineage queries (services/workbench-bff/src/adapters/provenance_registry.py, services/workbench-bff/src/api/provenance_routes.py).
SEA-Forge currently supports drift detection, analysis, and remediation workflows (services/workbench-bff/src/adapters/drift_detector.py, services/workbench-bff/src/adapters/remediation_engine.py, tools/drift_heal.py).
SEA-Forge currently supports Knowledge Graph queries and event projections (services/knowledge-graph/src/api/routes.py).
SEA-Forge currently supports OPA-backed governance enforcement and auditing (services/policy-gateway/src/api/routes.py).
SEA-Forge currently supports OpenAI-compatible LLM access via LiteLLM (services/llm-provider/src/api/routes.py).
SEA-Forge currently supports A2A agent gateway endpoints (services/a2a/src/api/routes.py).
SEA-Forge currently supports Ops execution with allowlisted commands and streamed logs (services/workbench-bff/src/adapters/ops_runner.py).
SEA-Forge currently supports NATS-backed outbox/inbox processing (apps/sea-mq-worker/src/main.rs).

8. Notes for Documentation Accuracy

Observed mismatches or overstatements vs code:

README: “Provenance and history for every change.”
- Code builds a provenance graph and manifest history, but historical lineage comparison is not implemented, and manifest history relies on an index/history file rather than full git history. (services/workbench-bff/src/adapters/provenance_registry.py)
README: “Policy Gateway intercepts all LLM calls.”
- In code, Policy Gateway use is conditional (enabled by config). The LLM provider can bypass the gateway when disabled or when PolicyGatewayBypassError occurs. (services/llm-provider/src/adapters/litellm_adapter.py)
README: “Runtime behavior correlation / observed reality.”
- Current Workbench behavior correlation endpoints use mock summaries and simulated scan progress; no OpenObserve ingestion is wired in the scan path. (services/workbench-bff/src/api/behavior_routes.py)
README: “Generative Engine 100%”
- Core generator scripts tools/codegen/gen.py and tools/codegen/gap_report.py are explicitly handwritten with --hand-write-ok bypass markers, indicating generator/spec alignment is incomplete.
