PRD-MEMORY-001: Vector Semantic Memory
Type
Functional
Priority
Critical
MVP Status
✅ MVP (Walking Skeleton Cycle S1B)
When a user queries for a policy, the system shall retrieve the most semantically similar policies using vector embeddings and cosine similarity.
User Story
As a policy consumer, I want to search for policies by semantic meaning (not just keywords), so that I can find relevant governance rules even when using different terminology.
Acceptance Criteria
AC-001.1: Generate Embeddings
- Given policy text content
- When the embedding service processes it
- Then it shall produce a 384-dimensional vector using EmbeddingGemma via llama.cpp
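A minimal sketch of the dimension contract in AC-001.1, assuming the real EmbeddingGemma call is injected as a plain function. The names `embed_policy`, `Embedder`, and `placeholder_embedder` are hypothetical, and the placeholder hashes text instead of calling llama.cpp, so it is not semantic; it exists only so the check can run without model weights.

```python
import hashlib
import math
from typing import Callable, List

EMBEDDING_DIM = 384  # dimension required by AC-001.1

# Hypothetical type: any function that maps text to a vector.
Embedder = Callable[[str], List[float]]


def placeholder_embedder(text: str) -> List[float]:
    """Stand-in for the real model (assumption): hashes text into a unit vector."""
    values: List[float] = []
    counter = 0
    while len(values) < EMBEDDING_DIM:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte to the range [-1, 1).
        values.extend((b - 128) / 128.0 for b in digest)
        counter += 1
    values = values[:EMBEDDING_DIM]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def embed_policy(text: str, embedder: Embedder = placeholder_embedder) -> List[float]:
    """Embed policy text and enforce the 384-dimension contract."""
    if not text.strip():
        raise ValueError("policy text must not be empty")
    vector = list(embedder(text))
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return vector
```

In an integration test, the placeholder would be swapped for a function that wraps the actual llama.cpp embedding call; the length check stays the same.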
AC-001.2: Store Embeddings in pgvector
- Given a policy ID and embedding vector
- When the storage service writes it
- Then it shall be stored in PostgreSQL with the pgvector extension
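One possible shape for the storage step in AC-001.2, sketched as SQL strings plus a helper that formats a vector in pgvector's text form (`[v1,v2,...]`). The table name `policy_embeddings`, the index name, the column names, and the `%s` placeholders (DB-API style, as used by drivers such as psycopg) are assumptions, not part of this PRD.

```python
import math
from typing import Sequence, Tuple

EMBEDDING_DIM = 384  # matches the vector size in AC-001.1

# Hypothetical schema: table, column, and index names are illustrative only.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS policy_embeddings (
    policy_id TEXT PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector({EMBEDDING_DIM}) NOT NULL
);
CREATE INDEX IF NOT EXISTS policy_embeddings_hnsw
    ON policy_embeddings USING hnsw (embedding vector_cosine_ops);
"""

# Upsert so that re-ingesting a policy replaces its embedding.
UPSERT_SQL = """
INSERT INTO policy_embeddings (policy_id, content, embedding)
VALUES (%s, %s, %s::vector)
ON CONFLICT (policy_id) DO UPDATE
    SET content = EXCLUDED.content, embedding = EXCLUDED.embedding;
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding as pgvector text input, e.g. '[0.1,0.2,...]'."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    values = [float(x) for x in embedding]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(repr(x) for x in values) + "]"


def upsert_params(policy_id: str, content: str, embedding: Sequence[float]) -> Tuple[str, str, str]:
    """Build the parameter tuple for UPSERT_SQL."""
    return (policy_id, content, to_vector_literal(embedding))
```

With a DB-API cursor, the write would then look like `cursor.execute(UPSERT_SQL, upsert_params(policy_id, content, vector))`.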
AC-001.3: Similarity Search
- Given a query text
- When the search service processes it
- Then it shall return the top-k most similar policies, ranked by cosine similarity
AC-001.4: Result Format
- Given search results
- When the service returns them
- Then each result shall include policy ID, content, and similarity score
AC-001.5: Search Latency
- Given a dataset of up to 1000 policies
- When a similarity search is executed
- Then results shall be returned in <100ms (p95)
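The search criteria above can be illustrated with an in-memory reference implementation: exact cosine similarity, top-k ranking, and a result record carrying policy ID, content, and score. This is a sketch for checking expected rankings in tests, not the pgvector-backed service; `SearchResult`, `top_k`, and the table name in `SEARCH_SQL` are hypothetical. The SQL constant relies on pgvector's `<=>` operator, which returns cosine distance, so similarity is `1 - distance`.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical query against an assumed policy_embeddings table.
SEARCH_SQL = """
SELECT policy_id, content, 1 - (embedding <=> %s::vector) AS similarity
FROM policy_embeddings
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


@dataclass(frozen=True)
class SearchResult:
    policy_id: str
    content: str
    similarity: float


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    policies: Dict[str, Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[SearchResult]:
    """Rank every policy by cosine similarity and return the best k."""
    if k <= 0:
        raise ValueError("k must be positive")
    scored = [
        SearchResult(pid, content, cosine_similarity(query_vec, vec))
        for pid, (content, vec) in policies.items()
    ]
    # Highest similarity first; ties broken by policy ID for stable output.
    scored.sort(key=lambda r: (-r.similarity, r.policy_id))
    return scored[:k]
```

Because this version scans every policy, it gives exact results and can serve as ground truth when measuring how close the approximate HNSW search comes.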
AC-001.6: Deterministic Embeddings
- Given the same policy text
- When embeddings are generated multiple times
- Then the vector output shall be identical (deterministic)
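A determinism check along the lines of AC-001.6 might fingerprint each output and compare fingerprints across runs. The sketch below assumes the embedder is passed in as a function; `fingerprint` and `is_deterministic` are hypothetical names. The comparison is bitwise on 64-bit floats, which is stricter than numeric equality (for example, `0.0` and `-0.0` produce different fingerprints).

```python
import hashlib
import struct
from typing import Callable, Sequence


def fingerprint(vector: Sequence[float]) -> str:
    """SHA-256 over the little-endian float64 bytes of the vector."""
    packed = struct.pack(f"<{len(vector)}d", *vector)
    return hashlib.sha256(packed).hexdigest()


def is_deterministic(
    embedder: Callable[[str], Sequence[float]],
    text: str,
    runs: int = 3,
) -> bool:
    """Embed the same text several times and check that all outputs match."""
    if runs < 2:
        raise ValueError("at least two runs are needed to compare outputs")
    prints = {fingerprint(embedder(text)) for _ in range(runs)}
    return len(prints) == 1
```

Storing the fingerprint next to the embedding would also make it possible to compare outputs across processes or machines, not only within a single test run.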
Dependencies
- llama.cpp (C++ inference engine)
- EmbeddingGemma model weights
- PostgreSQL 15+ with pgvector extension
- HNSW index support in pgvector (available in pgvector 0.5.0 and later)
- ADRs: ADR-007 (Vector Memory), ADR-006 (Ingest Pipeline)
- SDS: SDS-MEMORY-010 (Memory Service)
- Plan: P001-SKELETON (Walking Skeleton)
Success Metrics
- Embedding Generation: <50ms per policy (p95)
- Search Latency: <100ms for top-5 results (p95)
- Recall@5: >90% for known-relevant policies
- Index Build Time: <10s for 1000 policies
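The p95 and Recall@5 targets above need a concrete definition before they can be asserted in tests. The sketch below assumes the nearest-rank method for percentiles and the common definition of recall@k (relevant items found in the top k, divided by all relevant items); neither choice is specified in this PRD, and the function names are hypothetical.

```python
import math
from typing import Sequence, Set


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the known-relevant policy IDs that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

For example, a latency test could collect per-query timings in milliseconds and assert `percentile(timings_ms, 95) < 100`, while a relevance test could average `recall_at_k` over a labeled query set and assert the mean exceeds 0.9.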
Non-Functional Requirements
- NFR-001.1: Deterministic (same input → same embedding)
- NFR-001.2: Local-first (no external API calls)
- NFR-001.3: Observability (emit search latency metrics)
- NFR-001.4: Scalable indexing (HNSW for fast ANN search)
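For NFR-001.3, one lightweight option is a context manager that times the search call and hands the elapsed milliseconds to whatever metrics sink the service uses. The metric name `memory.search.latency_ms` and the `emit` callback signature are assumptions made for illustration.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def record_latency(
    emit: Callable[[str, float], None],
    name: str = "memory.search.latency_ms",  # hypothetical metric name
) -> Iterator[None]:
    """Time the wrapped block and emit the elapsed milliseconds, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        emit(name, (time.perf_counter() - start) * 1000.0)
```

A caller would wrap the search, for example `with record_latency(sink): results = search(query)`, where `sink` forwards the value to the metrics backend.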
Out of Scope (for MVP)
- Multi-modal embeddings (images, tables)
- Fine-tuned domain-specific embedding models
- Hybrid search (keyword + semantic)
- Embedding model versioning and migration
- Distributed vector index (sharding)
Next Steps
- Design SDS-MEMORY-010
- Implement Cycle S1B vertical slice
- Write integration tests per acceptance criteria