ADR-006: Walking Skeleton Ingest Pipeline

Status: Accepted
Version: 1.0
Date: 2026-01-01
Supersedes: N/A
Related ADRs: ADR-004 (Semantic Core Formalization)
Related PRDs: PRD-INGEST-001


Context

The Walking Skeleton requires a minimal end-to-end flow to validate system architecture and component connectivity. The first critical capability is ingesting SEA-DSL policy files, parsing them into structured representations, and indexing them for later retrieval and governance checks.

Per the Walking Skeleton plan (P001-SKELETON), the pipeline must demonstrate the ability to:

  1. Parse .sea files using tree-sitter
  2. Store RDF triples in Oxigraph
  3. Store embeddings in pgvector
  4. Enable downstream query and policy enforcement

Decision

Implement a minimal ingest pipeline for Cycle S1A with the following components:

  1. Parser: Use a tree-sitter-based SEA-DSL parser to produce an AST
  2. Triple Generator: Convert the AST to RDF triples (Oxigraph format)
  3. Embedding Generator: Generate embeddings using EmbeddingGemma (llama.cpp)
  4. Dual Storage: Store triples in Oxigraph, embeddings in pgvector
  5. Idempotency: Support re-ingestion without duplication
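
The component list above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: every name in it (`Stages`, `InMemoryStores`, `astToTriples`, `ingest`, the `sea:` predicates) is hypothetical, the two stores are in-memory stand-ins for Oxigraph and pgvector, and the stub parser reads a toy `<kind> <name>` line format rather than real SEA-DSL. It assumes idempotency is achieved by keying all stored data on the source path, so that re-ingesting a file replaces its earlier records instead of appending to them.

```typescript
// Hypothetical sketch: none of these names come from the ADR or a real library.
type Triple = { subject: string; predicate: string; object: string };

interface AstNode {
  kind: string;
  name: string;
  children: AstNode[];
}

// Parser and embedder are injected so that real implementations
// (tree-sitter, EmbeddingGemma via llama.cpp) could replace the stubs.
interface Stages {
  parse(source: string): AstNode;
  embed(text: string): number[];
}

// In-memory stand-ins for Oxigraph and pgvector, keyed by source path.
class InMemoryStores {
  triplesByFile = new Map<string, Triple[]>();
  embeddingsByFile = new Map<string, number[]>();
}

// Walk the AST and emit two triples per node (type and provenance).
function astToTriples(fileIri: string, node: AstNode): Triple[] {
  const nodeIri = `${fileIri}#${node.name}`;
  const triples: Triple[] = [
    { subject: nodeIri, predicate: "rdf:type", object: `sea:${node.kind}` },
    { subject: nodeIri, predicate: "sea:definedIn", object: fileIri },
  ];
  for (const child of node.children) {
    triples.push(...astToTriples(fileIri, child));
  }
  return triples;
}

// Run parse -> triples -> embedding -> dual storage for one file.
// Map.set overwrites the previous entry for the same path, which is
// what makes a repeated ingest a replacement rather than a duplicate.
function ingest(path: string, source: string, stages: Stages, stores: InMemoryStores): void {
  const ast = stages.parse(source);
  stores.triplesByFile.set(path, astToTriples(`file://${path}`, ast));
  stores.embeddingsByFile.set(path, stages.embed(source));
}

// Stub stages: a toy line format and a trivial deterministic "embedding".
const stubStages: Stages = {
  parse(source: string): AstNode {
    const children: AstNode[] = [];
    for (const line of source.split("\n")) {
      const parts = line.trim().split(/\s+/);
      if (line.trim().length === 0) continue;
      children.push({ kind: parts[0] ?? "unknown", name: parts[1] ?? "anonymous", children: [] });
    }
    return { kind: "file", name: "root", children };
  },
  embed(text: string): number[] {
    return [text.length, text.split("\n").length];
  },
};

// Total number of triples held across all ingested files.
function countTriples(stores: InMemoryStores): number {
  let total = 0;
  stores.triplesByFile.forEach((triples) => {
    total += triples.length;
  });
  return total;
}
```

Calling `ingest` twice with the same path and content leaves `countTriples` unchanged in this sketch. Against the real stores, one possible equivalent would be to clear a per-file named graph in Oxigraph before inserting, and to upsert on a path key in PostgreSQL; both are assumptions rather than decisions recorded in this ADR.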

Rationale

Local-First Stack

Alternatives Considered

Alternative                    Rejected Because
-----------------------------  ------------------------------------------------------
Python parser (lark/PLY)       Slower, harder to integrate with Rust/TS ecosystem
Remote embedding API (OpenAI)  Non-deterministic, requires internet, cost
In-memory triple store         Data loss on restart, no persistence
MongoDB for vectors            Requires additional service, less mature vector search

Consequences

Positive

Negative

Implementation Notes

Success Criteria


Next Steps: