Implementation Plan: LLM Provider Abstraction

Phase 9 Artifact — This plan enables vertical slice delivery per ENGINEERING.SOP.md.

Purpose

Provide a unified LLM provider interface via LiteLLM, enabling SEA™ services to interact with OpenAI, Anthropic, Ollama, and OpenRouter through a single abstraction layer with Policy Gateway integration.


Upstream/Downstream Dependencies

Critical ordering — This plan must be implemented after its dependencies and before services that consume it.

| Dependency | Plan | Reason |
|---|---|---|
| ⬆️ Depends on | P007 GovernedSpeed™ Runtime | LLM calls route through Policy Gateway (SDS-047) |
| ⬆️ Depends on | P009 Observability | LLM calls emit OpenTelemetry spans |
| ⬇️ Needed by | P020 Cognitive Extension Layer | Context Analyzer and Artifact Engine use LLM |
| ⬇️ Needed by | P019 Semantic Core Services | Embedding generation for Knowledge Graph |
| ⬇️ Needed by | P013 Generative Synthesis | LLM for code generation |

Pre-Flight Validation

STOP. Before implementing, validate all input specifications.

ADR Validation

| Check | Requirement | Pass |
|---|---|---|
| ADR-035 exists | File: docs/specs/shared/adr/035-llm-provider-abstraction.md | [x] |
| Has Context section | Explains multi-provider LLM integration needs | [x] |
| Has Decision section | LiteLLM as unified abstraction | [x] |
| Has Constraints section | MUST/MUST NOT statements | [x] |
| Has Consequences section | Trade-offs documented | [x] |
| References prior ADRs | ADR-028 (GovernedSpeed™ LLMOps) | [x] |

PRD Validation

| Check | Requirement | Pass |
|---|---|---|
| PRD-010 exists | docs/specs/shared/prd/010-ai-governance-runtime.md | [x] |
| Has Satisfies: ADR-028 | Traces to governance decision | [x] |
| Uses EARS notation | When/The system shall… | [x] |
| Each REQ has ID | REQ-GS-001..005 | [x] |

SDS Validation

| Check | Requirement | Pass |
|---|---|---|
| SDS-049 exists | docs/specs/shared/sds/049-llm-provider-service.md | [x] |
| Has domain glossary | LlmProvider, ChatCompletion, Embedding | [x] |
| Entities defined | ProviderConfig, ModelSpec | [x] |
| Flows defined | CMD-LLM-001..002, QRY-LLM-001..002 | [x] |
| Ports defined | LlmProviderPort, PolicyGatewayPort, ProviderConfigPort | [x] |
| Invariants defined | POL-LLM-001..005, INV-LLM-001..004 | [x] |

Provenance Chain

Complete traceability from ADR → PRD → SDS → Implementation.

```mermaid
graph TD
    ADR35[ADR-035: LLM Provider Abstraction] --> PRD10[PRD-010: AI Governance Runtime]
    ADR28[ADR-028: GovernedSpeed™ LLMOps] --> PRD10
    PRD10 --> SDS49[SDS-049: LLM Provider Service]
    SDS49 --> SDS47[SDS-047: GovernedSpeed™ Runtime]

    SDS49 --> C1A[C1A: Domain + Ports]
    SDS49 --> C1B[C1B: LiteLLM Adapter]
    SDS49 --> C2A[C2A: Fake Adapter + Tests]

    style ADR35 fill:#e1f5ff
    style ADR28 fill:#e1f5ff
    style PRD10 fill:#fff4e1
    style SDS49 fill:#e8f5e9
    style SDS47 fill:#e8f5e9
```

| ADR ID | PRD ID | SDS Element | Cycle |
|---|---|---|---|
| ADR-035 | PRD-010 | Entity: ProviderConfig | C1A |
| ADR-035 | PRD-010 | Port: LlmProviderPort | C1A |
| ADR-035 | PRD-010 | Flow: CMD-LLM-001 | C1B |
| ADR-035 | PRD-010 | Adapter: LiteLLMAdapter | C1B |
| ADR-035 | PRD-010 | Adapter: FakeLlmAdapter | C2A |

Implementation Order

Per ENGINEERING.SOP.md Phase 9.

  1. Domain model + invariants — ProviderConfig, ModelSpec, ChatMessage, Embedding
  2. Ports — LlmProviderPort interface + FakeLlmAdapter
  3. Adapters — LiteLLMAdapter wrapping LiteLLM Python
  4. Wiring — HTTP client for TypeScript consumers
  5. Observability — OpenTelemetry spans on all LLM calls
  6. Tests — Unit tests with FakeLlmAdapter
  7. Feature flag — llm.provider.enabled for rollout
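
Steps 1–2 above can be sketched as follows. The port and adapter names come from SDS-049; the method names (completeChat, generateEmbedding) are assumptions derived from the CMD-LLM-001/002 handlers and the SDS remains authoritative:

```typescript
// Minimal sketch of LlmProviderPort plus its deterministic test double.
// Method names and shapes are assumptions; see SDS-049 for the real contract.

export interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

export interface LlmProviderPort {
  completeChat(model: string, messages: ChatMessage[]): Promise<string>;
  generateEmbedding(model: string, input: string): Promise<number[]>;
}

/** Deterministic test double: no network, stable outputs for unit tests. */
export class FakeLlmAdapter implements LlmProviderPort {
  async completeChat(model: string, messages: ChatMessage[]): Promise<string> {
    // Echo the model and last user message so assertions are predictable.
    const last = messages[messages.length - 1];
    return `fake:${model}:${last?.content ?? ""}`;
  }

  async generateEmbedding(model: string, input: string): Promise<number[]> {
    // Fixed-size vector derived from the input length: deterministic by design.
    return Array.from({ length: 4 }, (_, i) => (input.length + i) % 7);
  }
}
```

Because the fake is deterministic, unit tests for the application handlers never need Ollama or a network connection.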

Proposed Cycles (Worktree-First)

| Cycle | Worktree | Branch | Wave | Implements |
|---|---|---|---|---|
| C1A | ../SEA-p31-c1A | cycle/p31-c1A-llm-domain | 1 | Domain + Ports |
| C1B | ../SEA-p31-c1B | cycle/p31-c1B-litellm-adapter | 1 | LiteLLM Adapter |
| C2A | ../SEA-p31-c2A | cycle/p31-c2A-fake-tests | 2 | Fake Adapter + Tests |
| C3A | ../SEA-p31-c3A | cycle/p31-c3A-ts-client | 3 | TypeScript HTTP Client |

Wave 1 (Parallel): C1A, C1B

Wave 2 (Depends on Wave 1): C2A

Wave 3 (Depends on Wave 2): C3A


Nx Generators to Use

| Generator | Command | When to Use |
|---|---|---|
| Bounded Context | just generator-bc llm-provider | Create llm-provider context |
| Adapter | just generator-adapter litellm llm-provider | Create LiteLLM adapter |
| Adapter | just generator-adapter fake llm-provider | Create Fake adapter |

Expected Filetree After Implementation

```
libs/llm-provider/
├── domain/
│   └── src/
│       ├── index.ts
│       └── lib/
│           ├── provider-config.ts       # [SDS-049: Entity ProviderConfig]
│           ├── model-spec.ts            # [SDS-049: Entity ModelSpec]
│           ├── chat-message.ts          # [SDS-049: VO ChatMessage]
│           └── embedding.ts             # [SDS-049: VO Embedding]
├── ports/
│   └── src/
│       └── lib/
│           ├── llm-provider.port.ts     # [SDS-049: PORT-LLM-001]
│           └── policy-gateway.port.ts   # [SDS-049: PORT-LLM-002]
├── adapters/
│   └── src/
│       └── lib/
│           ├── litellm.adapter.ts       # HTTP client to Python service
│           ├── litellm.adapter.spec.ts
│           ├── fake.adapter.ts          # Deterministic test double
│           └── fake.adapter.spec.ts
└── application/
    └── src/
        └── lib/
            ├── complete-chat.handler.ts  # [SDS-049: CMD-LLM-001]
            └── generate-embedding.handler.ts  # [SDS-049: CMD-LLM-002]

services/llm-provider/                    # Python FastAPI service
├── src/
│   ├── adapters/
│   │   └── litellm_adapter.py           # LiteLLM wrapper
│   ├── api/
│   │   └── routes.py                    # OpenAI-compatible API
│   └── main.py
├── pyproject.toml
└── Dockerfile
```
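
The TypeScript side of litellm.adapter.ts reduces to a thin HTTP client against the Python service's OpenAI-compatible endpoint. This is a hedged sketch: the base URL and the response shape (choices[0].message.content) are assumptions based on the OpenAI-compatible API noted in the filetree, not the final implementation:

```typescript
// Hypothetical sketch of litellm.adapter.ts: POSTs to the Python service's
// OpenAI-compatible /v1/chat/completions route and unwraps the first choice.
// Base URL, error handling, and response shape are assumptions.

export class LiteLLMAdapter {
  constructor(private readonly baseUrl: string = "http://localhost:8001") {}

  async completeChat(
    model: string,
    messages: { role: string; content: string }[],
  ): Promise<string> {
    const res = await fetch(`${this.baseUrl}/v1/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    });
    if (!res.ok) throw new Error(`LLM service error: ${res.status}`);
    const body = await res.json();
    // OpenAI-compatible responses carry the text at choices[0].message.content.
    return body.choices[0].message.content;
  }
}
```

Keeping the adapter this thin means provider differences stay inside the Python service, where LiteLLM handles them.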

Dependencies Introduced

| Dependency | Type | Version | Package | Justification |
|---|---|---|---|---|
| litellm | Python | 1.56.x | litellm | Unified LLM provider abstraction |
| ollama | Docker | 0.5.x | ollama/ollama | Local development provider |
| fastapi | Python | 0.115.x | fastapi | LLM service API framework |
| opentelemetry-instrumentation-litellm | Python | | opentelemetry-instrumentation-litellm | OTel integration |

Validation & Verification

Pre-Merge Checks

Post-Implementation Verification

Verification Commands

```bash
# Start Ollama locally
just dev-up

# Test LLM service
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'

# Run adapter tests
just test libs/llm-provider
```

Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LiteLLM version incompatibility | Medium | Medium | Pin version, test upgrades in CI |
| Ollama unavailable in CI | Low | High | Use FakeLlmAdapter for unit tests; Ollama only for integration |
| Policy Gateway latency | Low | Medium | Cache policy decisions; monitor latency |
| Provider API changes | Low | Medium | LiteLLM handles provider differences |

Open Questions

  1. Should LiteLLM run as a sidecar or dedicated service? Dedicated service
  2. How to handle streaming responses through Policy Gateway? Chunk-level filtering
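
The chunk-level filtering answer to question 2 could take the shape below. This is a sketch only; the names (ChunkPolicy, PolicyDecision, filterStream) are hypothetical and the real contract lives in PolicyGatewayPort per SDS-049:

```typescript
// Hedged sketch of chunk-level policy filtering for streamed completions.
// All names here are illustrative, not part of the SDS.

export interface PolicyDecision {
  allowed: boolean;
  redacted?: string; // optional replacement text for a partially blocked chunk
}

export type ChunkPolicy = (chunk: string) => Promise<PolicyDecision>;

/** Yields each streamed chunk only after the policy check approves it. */
export async function* filterStream(
  chunks: AsyncIterable<string>,
  check: ChunkPolicy,
): AsyncIterable<string> {
  for await (const chunk of chunks) {
    const decision = await check(chunk);
    // Disallowed chunks are dropped; a real implementation might instead
    // abort the whole stream or substitute a refusal message.
    if (decision.allowed) yield decision.redacted ?? chunk;
  }
}
```

One trade-off to note: per-chunk checks add a Policy Gateway round trip per token batch, which feeds directly into the latency risk tracked above.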

References

| Document | Purpose |
|---|---|
| ADR-035 | Architecture decision |
| SDS-049 | Service design |
| SDS-047 | GovernedSpeed™ Runtime (Policy Gateway) |
| ADR-028 | Governance requirements |
| dependency_gap_analysis.md | LiteLLM selection rationale |

Note: SDS-042 (Policy Gateway Service) has been superseded by SDS-047 per ADR-031.