# ADR-007: Walking Skeleton Vector Memory
- Status: Accepted
- Version: 1.0
- Date: 2026-01-01
- Supersedes: N/A
- Related ADRs: ADR-006 (Ingest Pipeline)
- Related PRDs: PRD-MEMORY-001
## Context
The Walking Skeleton requires semantic search capability to retrieve relevant policies based on query similarity. After ingesting policies (S1A), we need a vector memory layer that:
- Stores embeddings (384-dimensional vectors)
- Performs similarity search (cosine distance)
- Supports the downstream RAG query flow (S1D)
Per P001-SKELETON, this is the second component in the golden thread: “Index it, and query it.”
## Decision
Implement a minimal vector memory service for Cycle S1B with:
- Embedding Model: EmbeddingGemma via llama.cpp (local, deterministic)
- Vector Store: pgvector extension in PostgreSQL
- Search Algorithm: Cosine similarity with HNSW indexing
- API: Simple query interface (text → top-k similar policies)
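Under this decision, the storage layer can be sketched in SQL. The `policies` table, its columns, and the index name below are illustrative assumptions for this sketch, not names mandated by the ADR; the dimension (384) and HNSW parameters come from the decision and implementation notes.

```sql
-- Enable the pgvector extension (installed separately from PostgreSQL core).
CREATE EXTENSION IF NOT EXISTS vector;

-- Illustrative policies table; VECTOR(384) matches the EmbeddingGemma dimension.
CREATE TABLE policies (
    id        BIGSERIAL PRIMARY KEY,
    body      TEXT NOT NULL,
    embedding VECTOR(384) NOT NULL
);

-- Approximate nearest-neighbor index using cosine distance,
-- with the m / ef_construction values from the implementation notes.
CREATE INDEX policies_embedding_hnsw
    ON policies USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Top-k query: <=> is pgvector's cosine-distance operator,
-- so similarity = 1 - (embedding <=> query_vector).
SELECT id, 1 - (embedding <=> $1) AS similarity
FROM policies
ORDER BY embedding <=> $1
LIMIT 5;
```

The `ORDER BY embedding <=> $1` form lets the planner use the HNSW index; ordering by the computed similarity expression instead would force a sequential scan.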
## Rationale
### Local-First Stack
- llama.cpp: C++ inference, no Python dependencies, fast CPU execution
- EmbeddingGemma: Small model (384-dim), good quality/performance tradeoff
- pgvector: PostgreSQL extension, zero additional services, SQL-based queries
- HNSW: Approximate nearest neighbor, fast for moderate dataset sizes
### Alternatives Considered
| Alternative | Rejected Because |
| --- | --- |
| OpenAI Embeddings API | Non-deterministic, requires internet, cost, API keys |
| Sentence Transformers | Heavier Python runtime, slower than llama.cpp |
| Milvus / Qdrant | Over-engineered for skeleton, extra service overhead |
| FAISS | Requires separate index management; pgvector is simpler |
## Consequences
### Positive
- Zero external API dependencies
- Deterministic embeddings (same input → same vector)
- Fast local inference (llama.cpp optimized)
- SQL-based queries (familiar to developers)
- Foundation for policy retrieval in S1D
### Negative
- Embedding quality limited by EmbeddingGemma model size
- pgvector performance degrades at very large scale (>1M vectors)
- HNSW index rebuild required for bulk updates
- llama.cpp requires native compilation
## Implementation Notes
- Embedding dimension: 384 (EmbeddingGemma default)
- Distance metric: Cosine similarity (1 - cosine distance; pgvector's `<=>` operator returns cosine distance)
- Index type: HNSW (m=16, ef_construction=64)
- Top-k retrieval: Default k=5, configurable
- Batch embedding: Process multiple texts in single inference call
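The distance metric above can be illustrated in plain Python. This is a toy in-memory version of the ranking that pgvector performs (the HNSW index produces the same ordering approximately, without scanning every row); the 3-dimensional vectors are stand-ins for the real 384-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity = 1 - cosine distance; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=5):
    """Return the k (index, similarity) pairs most similar to `query`,
    mirroring ORDER BY embedding <=> $1 LIMIT k in SQL."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Tiny corpus: one exact match, one orthogonal vector, one partial match.
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
```

Note that sorting descending by similarity is equivalent to sorting ascending by cosine distance, which is why the SQL layer can order by the raw `<=>` result.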
## Success Criteria
Next Steps:
- Define PRD-MEMORY-001 (requirements)
- Design SDS-MEMORY-010 (service architecture)
- Implement Cycle S1B (embedding + search)