PRD-QUERY-001: RAG Query Orchestration

Type

Functional

Priority

Critical

MVP Status

✅ MVP (Walking Skeleton Cycle S1D - FINAL)

Requirement Statement (EARS Format)

When a user submits a natural language query, the system shall retrieve relevant policies, enforce governance, and synthesize an answer using retrieval-augmented generation (RAG).

User Story

As a policy consumer, I want to ask questions about policies in natural language, so that I can get accurate answers with governance enforcement and source citations.

Acceptance Criteria

AC-001.1: Accept Natural Language Query

Given a user submits a text query
When the query service receives it
Then it shall parse and validate the input
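
A minimal sketch of what "parse and validate" could mean for AC-001.1, assuming the query arrives as a raw string. The function name `validate_query` and the `MAX_QUERY_CHARS` limit are illustrative assumptions, not values defined in this PRD.

```python
# Hypothetical upper bound on query length; the PRD does not specify one.
MAX_QUERY_CHARS = 2000


def validate_query(raw: object) -> str:
    """Normalize whitespace and reject empty or oversized queries."""
    if not isinstance(raw, str):
        raise ValueError("query must be a string")
    # Collapse runs of whitespace and trim both ends.
    query = " ".join(raw.split())
    if not query:
        raise ValueError("query must not be empty")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("query exceeds maximum length")
    return query
```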

AC-001.2: Semantic Policy Retrieval

Given a validated query
When semantic search is performed
Then top-k relevant policies shall be retrieved with similarity scores
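
To illustrate AC-001.2, the following sketch ranks policies by cosine similarity between a query embedding and stored policy embeddings. It assumes embeddings are plain lists of floats held in memory; the actual Memory service (S1B) interface is not described here, so `top_k_policies` and its parameters are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_policies(
    query_vec: List[float],
    policy_vecs: Dict[str, List[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k (policy_id, similarity) pairs, highest similarity first."""
    scored = [
        (policy_id, cosine_similarity(query_vec, vec))
        for policy_id, vec in policy_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

A production system would likely delegate this ranking to a vector index rather than scan every policy, but the returned shape (policy ID plus score) is what the later stages consume.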

AC-001.3: Governance Enforcement

Given retrieved policies
When the governance check is performed
Then unauthorized policies shall be filtered out before synthesis
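
The filtering step in AC-001.3 can be sketched as below. The authorization decision is injected as a callable so the example does not presume any particular OPA client API; `is_authorized` and `filter_authorized` are hypothetical names.

```python
from typing import Callable, List, Tuple


def filter_authorized(
    user_id: str,
    retrieved: List[Tuple[str, float]],
    is_authorized: Callable[[str, str], bool],
) -> List[Tuple[str, float]]:
    """Keep only the (policy_id, score) pairs the user may read.

    `is_authorized(user_id, policy_id)` stands in for the governance
    service call; its real signature is an assumption.
    """
    return [
        (policy_id, score)
        for policy_id, score in retrieved
        if is_authorized(user_id, policy_id)
    ]
```

Filtering before synthesis, rather than redacting the answer afterwards, keeps unauthorized policy text out of the LLM prompt entirely.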

AC-001.4: RAG Answer Synthesis

Given authorized policies and the query
When LLM synthesis is performed
Then a natural language answer shall be generated using retrieved context

AC-001.5: Structured Response

Given a synthesized answer
When the service returns it
Then the response shall include the answer text, source policy IDs, and a confidence score
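
One possible shape for the structured response in AC-001.5 is shown below. The field names and the assumption that confidence lies in the range 0.0 to 1.0 are illustrative; the PRD only requires that the three pieces of information be present.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class QueryResponse:
    answer: str
    source_policy_ids: List[str]
    confidence: float  # assumed to be normalized to [0.0, 1.0]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    def to_json(self) -> str:
        """Serialize with sorted keys so identical responses yield identical JSON."""
        return json.dumps(asdict(self), sort_keys=True)
```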

AC-001.6: End-to-End Performance

Given a typical query
When the full pipeline executes
Then the response shall be returned in <5s (p95)
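
The acceptance criteria above describe a linear pipeline: validate, retrieve, enforce governance, synthesize, respond. The sketch below wires those stages together with injected callables so it stays independent of Semantic Kernel, llama.cpp, and OPA. All function and parameter names are hypothetical, and using the best similarity score as the confidence value is an assumption, not a requirement of this PRD.

```python
from typing import Callable, Dict, List, Tuple

ScoredPolicies = List[Tuple[str, float]]


def answer_query(
    query: str,
    user_id: str,
    retrieve: Callable[[str], ScoredPolicies],
    authorize: Callable[[str, str], bool],
    synthesize: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Run one stateless, single-step RAG query end to end."""
    # AC-001.1: reject blank input before doing any work.
    if not query.strip():
        raise ValueError("query must not be empty")

    # AC-001.2: semantic retrieval returns (policy_id, similarity) pairs.
    retrieved = retrieve(query)

    # AC-001.3: drop policies the user is not allowed to see.
    allowed = [(pid, score) for pid, score in retrieved if authorize(user_id, pid)]
    if not allowed:
        return {
            "answer": "No authorized policies matched the query.",
            "source_policy_ids": [],
            "confidence": 0.0,
        }

    # AC-001.4: synthesize using only the authorized context.
    policy_ids = [pid for pid, _ in allowed]
    answer = synthesize(query, policy_ids)

    # AC-001.5: structured response; confidence heuristic is an assumption.
    confidence = max(score for _, score in allowed)
    return {"answer": answer, "source_policy_ids": policy_ids, "confidence": confidence}
```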

Dependencies

Semantic Kernel framework
llama.cpp with Gemma-2B or Phi-3 model
Memory service (S1B embeddings + search)
Governance service (S1C OPA checks)
Ingest service (S1A policy storage)

ADRs: ADR-009 (RAG Orchestration), ADR-006, ADR-007, ADR-008
SDS: SDS-QUERY-010 (Query Service)
Plan: P001-SKELETON (Walking Skeleton)

Success Metrics

Query Latency: <5s end-to-end (p95)
Retrieval Recall: >80% of known-relevant policies appear in the top-k results
Governance Coverage: 100% of queries checked
Answer Quality: Human evaluation score >3 out of 5 (subjective rating)
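
The latency and recall targets above can be checked with two small helpers. This sketch assumes the nearest-rank definition of a percentile and a labeled set of known-relevant policy IDs per test query; both helper names are hypothetical.

```python
from typing import Iterable, List


def p95(latencies_s: List[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not latencies_s:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_s)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def recall(retrieved_ids: Iterable[str], relevant_ids: Iterable[str]) -> float:
    """Fraction of known-relevant policies that were actually retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("no relevant policies labeled")
    return len(relevant & set(retrieved_ids)) / len(relevant)
```

With these, the targets reduce to checks such as `p95(samples) < 5.0` and `recall(retrieved, relevant) > 0.8` over an evaluation set.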

Non-Functional Requirements

NFR-001.1: Local-first (no external LLM API calls)
NFR-001.2: Auditable (log all queries + answers)
NFR-001.3: Observable (emit latency metrics per pipeline stage)
NFR-001.4: Deterministic (same query → same answer, given same context)
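
For NFR-001.3, per-stage latency can be captured with a small context manager. This is a sketch that records durations into a plain dictionary; how the values are actually emitted (logs, metrics backend) is left open, and `timed_stage` is a hypothetical helper.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_stage(name: str, metrics: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of a pipeline stage in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failures are still measured.
        metrics[name] = time.perf_counter() - start
```

Each stage would be wrapped as `with timed_stage("retrieval", metrics): ...`, leaving one entry per stage name in `metrics` once the query completes.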

Out of Scope (for MVP)

Multi-turn conversations (stateless queries only)
Complex query decomposition (single-step RAG)
Fine-tuned domain LLMs
Answer caching and deduplication
Streaming responses

Next Steps

Design SDS-QUERY-010
Implement Cycle S1D vertical slice
Create end-to-end integration test
Validate complete Walking Skeleton