ADR-009: Walking Skeleton RAG Query Orchestration
Status: Accepted
Version: 1.0
Date: 2026-01-01
Supersedes: N/A
Related ADRs: ADR-006 (Ingest), ADR-007 (Memory), ADR-008 (Governance)
Related PRDs: PRD-QUERY-001
Context
The Walking Skeleton requires end-to-end query orchestration to complete the golden thread. After ingesting policies (S1A), enabling retrieval (S1B), and enforcing governance (S1C), we need an orchestration layer that:
- Accepts natural language queries
- Retrieves relevant policies via semantic search
- Enforces governance checks
- Synthesizes answers using RAG
Per P001-SKELETON, this is the final component: “Ask ‘What is this policy?’ → Retrieve → Synthesize answer”
Decision
Implement a minimal RAG query service for Cycle S1D with:
- Orchestration: Semantic Kernel (SK) framework
- Query Flow: NL query → embeddings → similarity search → governance → synthesis
- LLM: Local model via llama.cpp (Gemma-2B or Phi-3)
- Response: Structured answer with sources and confidence
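The query flow above can be sketched as a minimal pipeline. This is an illustrative sketch, not SK's actual API: `embed`, `governance_allows`, and the synthesis step are hypothetical stubs standing in for the local embedding model, the OPA check, and the llama.cpp-backed LLM.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    id: str
    text: str
    embedding: list[float]

def embed(text: str) -> list[float]:
    # Toy stand-in: a real implementation would call the local embedding model.
    return [sum(ord(c) for c in text) % 97 / 97.0,
            len(text) / 100.0,
            text.count(" ") / 10.0]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], policies: list[Policy], k: int = 5) -> list[Policy]:
    # Similarity search: rank stored policies by cosine similarity, keep top-k.
    return sorted(policies, key=lambda p: cosine(query_vec, p.embedding), reverse=True)[:k]

def governance_allows(policy: Policy) -> bool:
    # Stub: a real implementation would query OPA for an allow/deny decision.
    return True

def answer_query(query: str, policies: list[Policy]) -> dict:
    # NL query -> embedding -> similarity search -> governance -> synthesis.
    qvec = embed(query)
    hits = [p for p in top_k(qvec, policies) if governance_allows(p)]
    # Stub synthesis: a real implementation would prompt the local LLM
    # with the retrieved policies as RAG context.
    answer = f"Synthesized from {len(hits)} retrieved policies."
    return {"answer": answer, "sources": [p.id for p in hits], "confidence": 0.5}
```

Each stage is a seam where the real component (embedding model, vector store, OPA, LLM) plugs in.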
Rationale
Semantic Kernel as Orchestrator
- Lightweight: Minimal dependencies, C#/Python/TS support
- Pluggable: Swappable LLMs, memory stores, planners
- Local-first: Supports llama.cpp backend
- Structured: Built-in prompt templates, function calling
Alternatives Considered
| Alternative | Rejected Because |
| --- | --- |
| LangChain | Heavier dependency footprint, more complex abstractions |
| LlamaIndex | More opinionated, steeper learning curve |
| Custom orchestration | Reinventing the wheel, harder to maintain |
| OpenAI API | Non-local, non-deterministic, requires API keys |
Consequences
Positive
- Zero external API dependencies
- Structured orchestration (prompts, planners)
- Local LLM inference (privacy, no cost)
- Foundation for complex multi-step queries
- Pluggable architecture (easy to swap components)
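The pluggability benefit can be illustrated with a minimal backend interface. The names here (`CompletionBackend`, `synthesize`) are hypothetical, not Semantic Kernel's actual API; the point is that the LLM behind synthesis is a swappable dependency.

```python
from typing import Protocol

class CompletionBackend(Protocol):
    """Anything that turns a prompt into a completion."""
    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    # Toy backend for testing; a real one would wrap llama.cpp
    # (e.g. Gemma-2B-Instruct or Phi-3).
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"

def synthesize(backend: CompletionBackend, question: str, context: list[str]) -> str:
    # RAG synthesis: ground the answer in retrieved policy text only.
    prompt = ("Answer using only this context:\n"
              + "\n".join(context)
              + f"\nQ: {question}")
    return backend.complete(prompt)
```

Swapping Gemma-2B for Phi-3, or a local model for a hosted one, then touches only the backend, not the pipeline.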
Negative
- Semantic Kernel learning curve
- Limited LLM quality (small local models)
- Inference latency on CPU
- Memory constraints for large models
Implementation Notes
- SK Framework: C# or Python runtime
- LLM Backend: llama.cpp with Gemma-2B-Instruct
- Query pipeline:
- Parse natural language query
- Generate query embedding
- Semantic search (top-5 policies)
- Governance check (OPA)
- LLM synthesis with RAG context
- Return structured answer
- Response format: JSON with answer, sources, confidence
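An illustrative response payload for the format above; the field values and the shape of the `sources` entries are assumptions, not a fixed schema:

```json
{
  "answer": "This policy defines ...",
  "sources": [
    { "policy_id": "POL-001", "score": 0.87 }
  ],
  "confidence": 0.82
}
```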
Success Criteria
Next Steps:
- Define PRD-QUERY-001 (requirements)
- Design SDS-QUERY-010 (service architecture)
- Implement Cycle S1D (RAG orchestration)
- Create end-to-end integration test