PRD-QUERY-001: RAG Query Orchestration
Type: Functional
Priority: Critical
MVP Status: ✅ MVP (Walking Skeleton Cycle S1D - FINAL)
When a user submits a natural language query, the system shall retrieve relevant policies, enforce governance, and synthesize an answer using RAG.
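The three-stage flow above (retrieve, enforce governance, synthesize) can be sketched as a small orchestrator with injected stage functions. This is a minimal illustration, not the SDS-QUERY-010 design. The names `answer_query`, `QueryResult` and the stage signatures are hypothetical stand-ins for the Memory (S1B), Governance (S1C) and LLM services. The default `k=5` is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stage signatures; the real S1B/S1C services and the
# llama.cpp-backed LLM are replaced by plain callables here.
Retrieve = Callable[[str, int], List[Tuple[str, float]]]  # (query, k) -> [(policy_id, score)]
Authorize = Callable[[str, str], bool]                     # (user_id, policy_id) -> allowed?
Synthesize = Callable[[str, List[str]], str]               # (query, policy_ids) -> answer


@dataclass
class QueryResult:
    answer: str
    source_policy_ids: List[str]


def answer_query(user_id: str, query: str, retrieve: Retrieve,
                 authorize: Authorize, synthesize: Synthesize,
                 k: int = 5) -> QueryResult:
    """Run retrieve -> governance filter -> synthesis, in that order."""
    candidates = retrieve(query, k)
    # Governance runs before synthesis so unauthorized policies never reach the LLM.
    allowed = [pid for pid, _score in candidates if authorize(user_id, pid)]
    answer = synthesize(query, allowed)
    return QueryResult(answer=answer, source_policy_ids=allowed)
```

The design passes each stage in as a function, so the walking skeleton can be tested end to end with stubs before the real services are wired in.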
User Story
As a policy consumer, I want to ask questions about policies in natural language, so that I can get accurate answers with governance enforcement and source citations.
Acceptance Criteria
AC-001.1: Accept Natural Language Query
- Given a user submits a text query
- When the query service receives it
- Then it shall parse and validate the input
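AC-001.1 does not spell out the validation rules. The sketch below assumes a few plausible ones: the input must be a string, whitespace is normalized, empty queries are rejected, and length is capped. `MAX_QUERY_CHARS`, `InvalidQueryError` and `validate_query` are hypothetical names, and the 2000-character limit is an illustrative value only.

```python
MAX_QUERY_CHARS = 2000  # hypothetical limit; the PRD does not specify one


class InvalidQueryError(ValueError):
    """Raised when a submitted query fails validation."""


def validate_query(raw: object) -> str:
    """Normalize and validate a natural language query (illustrative rules only)."""
    if not isinstance(raw, str):
        raise InvalidQueryError("query must be a string")
    # Collapse runs of whitespace, so "  what   is X? " becomes "what is X?".
    query = " ".join(raw.split())
    if not query:
        raise InvalidQueryError("query must not be empty")
    if len(query) > MAX_QUERY_CHARS:
        raise InvalidQueryError(f"query exceeds {MAX_QUERY_CHARS} characters")
    return query
```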
AC-001.2: Semantic Policy Retrieval
- Given a validated query
- When semantic search is performed
- Then the top-k most relevant policies shall be retrieved, each with its similarity score
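Top-k retrieval with similarity scores can be illustrated as brute-force cosine similarity over an in-memory index. This is only a sketch. In the MVP, retrieval belongs to the S1B Memory service, whose index and scoring may differ. The sketch assumes the query and policy embeddings already exist and share one dimension. `cosine` and `top_k` are hypothetical helpers.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (policy_id, score) pairs, highest score first."""
    scored = ((pid, cosine(query_vec, vec)) for pid, vec in index.items())
    # Order by score descending, then by policy ID, so ties come out in a
    # stable order (this helps with NFR-001.4 determinism).
    return heapq.nsmallest(k, scored, key=lambda item: (-item[1], item[0]))
```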
AC-001.3: Governance Enforcement
- Given retrieved policies
- When the governance check is performed
- Then unauthorized policies shall be filtered out before synthesis
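The filtering step can be sketched as a per-policy decision that fails closed: if the check raises an error, the policy is denied. In the MVP the decision comes from the S1C Governance service (OPA). Here it is a hypothetical callable that receives an OPA-style input document. The `user`/`policy_id`/`action` field names are assumptions, not the service's actual schema.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an OPA decision: receives an input document,
# returns True when access is allowed.
Decide = Callable[[Dict[str, str]], bool]


def filter_authorized(user_id: str, candidates: List[Tuple[str, float]],
                      decide: Decide) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Split retrieved candidates into (allowed, denied_ids), failing closed."""
    allowed: List[Tuple[str, float]] = []
    denied: List[str] = []
    for policy_id, score in candidates:
        try:
            ok = bool(decide({"user": user_id, "policy_id": policy_id, "action": "read"}))
        except Exception:
            ok = False  # fail closed: an erroring or unreachable check denies access
        if ok:
            allowed.append((policy_id, score))
        else:
            denied.append(policy_id)
    return allowed, denied
```

Returning the denied IDs along with the allowed ones lets the audit log (NFR-001.2) record every filtered policy. That supports the 100% governance coverage metric.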
AC-001.4: RAG Answer Synthesis
- Given authorized policies and query
- When LLM synthesis is performed
- Then a natural language answer shall be generated using the retrieved policies as context
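One way to ground synthesis in the retrieved context is to build a prompt that contains only the authorized policy texts and asks the model to cite IDs. The template below is a hypothetical example. It is not a Semantic Kernel prompt template, and the actual llama.cpp call is omitted. `max_chars` is a crude character-based stand-in for a token budget.

```python
from typing import List, Tuple


def build_rag_prompt(query: str, policies: List[Tuple[str, str]],
                     max_chars: int = 6000) -> str:
    """Assemble a grounded prompt from (policy_id, text) pairs (hypothetical format)."""
    blocks: List[str] = []
    used = 0
    for policy_id, text in policies:
        block = f"[{policy_id}]\n{text.strip()}\n"
        if used + len(block) > max_chars:
            break  # stop before exceeding the rough context budget
        blocks.append(block)
        used += len(block)
    context = "\n".join(blocks) if blocks else "(no authorized policies found)"
    return (
        "Answer the question using only the policies below. "
        "Cite policy IDs in square brackets. If the policies do not contain "
        "the answer, say so.\n\n"
        f"Policies:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )
```

The policies are assumed to arrive already ranked by relevance. Truncating at the budget therefore drops the least relevant ones first.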
AC-001.5: Structured Response
- Given a synthesized answer
- When the service returns it
- Then the response shall include the answer text, source policy IDs, and a confidence score
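The structured response can be modeled as a small immutable record. The PRD does not define how the confidence score is computed. `placeholder_confidence`, which takes the mean retrieval similarity clamped to [0, 1], is a hypothetical heuristic used only to make the sketch concrete. The field names are also assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class QueryResponse:
    answer: str
    source_policy_ids: List[str] = field(default_factory=list)
    confidence: float = 0.0  # expected range 0.0-1.0; derivation is a placeholder

    def to_json(self) -> str:
        """Serialize with sorted keys so identical responses yield identical JSON."""
        return json.dumps(asdict(self), sort_keys=True)


def placeholder_confidence(scores: List[float]) -> float:
    """Hypothetical heuristic: mean retrieval similarity, clamped to [0, 1]."""
    if not scores:
        return 0.0
    return max(0.0, min(1.0, sum(scores) / len(scores)))
```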
- Given a typical query
- When the full pipeline executes
- Then the response shall be returned in <5s (p95)
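The p95 criterion can be checked against a batch of measured end-to-end latencies. This sketch uses the nearest-rank percentile definition, which is one of several common conventions; the PRD does not specify which to use. `percentile_nearest_rank` and `meets_latency_budget` are hypothetical helpers.

```python
import math
from typing import Sequence

P95_BUDGET_S = 5.0  # from the PRD: end-to-end latency < 5 s at p95


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank exact for integer inputs.
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(latencies_s: Sequence[float]) -> bool:
    return percentile_nearest_rank(latencies_s, 95) < P95_BUDGET_S
```

For example, with 20 samples, one slow outlier still passes the budget, but two slow outliers fail it.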
Dependencies
- Semantic Kernel framework
- llama.cpp serving a Gemma-2B or Phi-3 model
- Memory service (S1B embeddings + search)
- Governance service (S1C OPA checks)
- Ingest service (S1A policy storage)
- ADRs: ADR-009 (RAG Orchestration), ADR-006, ADR-007, ADR-008
- SDS: SDS-QUERY-010 (Query Service)
- Plan: P001-SKELETON (Walking Skeleton)
Success Metrics
- Query Latency: <5s end-to-end (p95)
- Retrieval Recall: >80% of known-relevant policies retrieved in the top-k results
- Governance Coverage: 100% of queries checked
- Answer Quality: Human evaluation score >3/5 (subjective)
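The retrieval recall metric assumes an evaluation set in which each test query is labeled with the policies known to be relevant. The PRD does not describe such a set, so its existence and format are assumptions. The sketch below computes recall@k per query and then averages across queries. `recall_at_k` and `mean_recall_at_k` are hypothetical names.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of known-relevant policy IDs that appear in the top-k results."""
    if not relevant:
        raise ValueError("recall is undefined without known-relevant policies")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (retrieved_ids, relevant_ids) pairs."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    if not scores:
        raise ValueError("no evaluation queries")
    return sum(scores) / len(scores)
```

Under this reading, the success criterion is `mean_recall_at_k(eval_runs, k) > 0.8`.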
Non-Functional Requirements
- NFR-001.1: Local-first (no external LLM API calls)
- NFR-001.2: Auditable (log all queries + answers)
- NFR-001.3: Observable (emit latency metrics per pipeline stage)
- NFR-001.4: Deterministic (same query → same answer, given the same context)
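NFR-001.2 and NFR-001.3 can be illustrated together. A per-stage timer records latency for each pipeline step, and a single JSON line per query serves as the audit record. `StageTimer`, `audit_record` and the record's field names are hypothetical. Emitting these values to an actual metrics backend or log sink is out of scope for the sketch.

```python
import json
import time
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator


class StageTimer:
    """Records wall-clock latency in milliseconds per named pipeline stage."""

    def __init__(self) -> None:
        self.durations_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the stage raised, so failures remain observable.
            self.durations_ms[name] = (time.perf_counter() - start) * 1000.0


def audit_record(query: str, answer: str, source_ids: Iterable[str],
                 timer: StageTimer) -> str:
    """One JSON line per query for the audit log; field names are hypothetical."""
    return json.dumps(
        {"query": query, "answer": answer, "sources": list(source_ids),
         "stage_ms": timer.durations_ms},
        sort_keys=True,
    )
```

Timing and logging do not make answers deterministic by themselves. NFR-001.4 also requires a fixed model and deterministic decoding settings (for example, greedy sampling with a fixed seed), plus stable ordering of the retrieved context.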
Out of Scope (for MVP)
- Multi-turn conversations (stateless queries only)
- Complex query decomposition (single-step RAG)
- Fine-tuned domain LLMs
- Answer caching and deduplication
- Streaming responses
Next Steps
- Design SDS-QUERY-010
- Implement Cycle S1D vertical slice
- Create end-to-end integration test
- Validate complete Walking Skeleton