πŸ”— LLM Integration

Provider abstraction for language models.


Overview

The LLM Provider Service (SDS-049) abstracts over specific model providers, enabling:

- A single call interface across OpenAI, Anthropic, Ollama, and Azure backends
- Priority-based routing with automatic fallback and retry
- Centralized policy enforcement: rate limiting, content filtering, and cost tracking
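
Concretely, each backend sits behind one interface. The sketch below is illustrative only: this page documents the LLMProvider entry point, while the backend protocol name and method shape here are assumptions.

from typing import Protocol

class CompletionBackend(Protocol):
    """Hypothetical interface each provider adapter implements."""

    async def complete(
        self,
        model: str,
        messages: list[dict],
        max_tokens: int | None = None,
    ) -> dict:
        """Return a completion from the underlying provider."""
        ...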

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Agent Requesting LLM Call                                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Policy Gateway                                              β”‚
β”‚  β€’ Rate limiting                                             β”‚
β”‚  β€’ Content filtering                                         β”‚
β”‚  β€’ Cost tracking                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LiteLLM Abstraction                                         β”‚
β”‚  β€’ Unified interface                                         β”‚
β”‚  β€’ Provider routing                                          β”‚
β”‚  β€’ Automatic retry                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β–Ό         β–Ό         β–Ό         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ OpenAI  β”‚β”‚Anthropicβ”‚β”‚ Ollama  β”‚β”‚ Azure   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Configuration

Environment Variables

# Primary provider
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Fallback providers
export OPENAI_API_KEY=sk-...
export OLLAMA_BASE_URL=http://localhost:11434

# Policy settings
export LLM_MAX_TOKENS_PER_MINUTE=100000
export LLM_COST_LIMIT_DAILY=50.00
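
If these variables drive startup behavior, a small validation step catches misconfiguration early. The helper below is a sketch; only the variable names come from this page.

import os

def load_llm_settings() -> dict:
    """Read and validate the environment variables documented above."""
    provider = os.environ.get("LLM_PROVIDER", "anthropic")
    if provider == "anthropic" and "ANTHROPIC_API_KEY" not in os.environ:
        raise RuntimeError("LLM_PROVIDER=anthropic requires ANTHROPIC_API_KEY")
    return {
        "provider": provider,
        "max_tokens_per_minute": int(os.environ.get("LLM_MAX_TOKENS_PER_MINUTE", "100000")),
        "daily_cost_limit": float(os.environ.get("LLM_COST_LIMIT_DAILY", "50.00")),
    }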

Provider Routing

# llm-config.yaml
providers:
  - name: "anthropic"
    priority: 1
    models:
      - "claude-sonnet"
      - "claude-haiku"
    
  - name: "openai"
    priority: 2
    models:
      - "gpt-4o"
    fallback: true
    
  - name: "ollama"
    priority: 3
    models:
      - "llama3"
    localOnly: true

routing:
  strategy: "cost-optimized"  # or "latency-optimized", "quality-optimized"
  fallbackEnabled: true
  retries: 3
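
The routing block implies a simple selection loop: try providers in ascending priority order, retry each up to the configured count, and fall through to the next when fallbackEnabled is set. A minimal sketch of that logic, assuming the YAML has been parsed into a list of dicts (the loop itself is not the service's actual code):

async def route_completion(provider_configs, call, retries=3):
    """Try providers in priority order, falling back on failure.

    `call` is an async function that performs the completion against
    one provider config; it is an assumption of this sketch.
    """
    ordered = sorted(provider_configs, key=lambda p: p["priority"])
    last_error = None
    for provider in ordered:
        for _ in range(retries):
            try:
                return await call(provider)
            except Exception as exc:  # real code would catch provider errors only
                last_error = exc
    raise RuntimeError("all providers exhausted") from last_error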

Usage

Direct Call

import asyncio

from llm_provider import LLMProvider

# complete() is async, so it must run inside an event loop
async def main() -> None:
    provider = LLMProvider()
    response = await provider.complete(
        model="claude-sonnet",
        messages=[
            {"role": "user", "content": "Analyze this document..."}
        ],
        max_tokens=1000,
    )
    print(response)

asyncio.run(main())

With Agent Configuration

# Agent uses model from config
agentId: "semantic-mapper"
model: "claude-sonnet"  # Resolved via LLM Provider
maxOutputTokens: 500
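
In this mode the agent never names a provider; its configured model string is resolved by the LLM Provider at call time. A sketch of that hand-off, assuming the agent config above has been loaded into a dict (the surrounding agent runtime is not described on this page):

async def complete_for_agent(provider, agent_config, messages):
    """Pass the agent's configured model through the LLM Provider."""
    return await provider.complete(
        model=agent_config["model"],  # e.g. "claude-sonnet"
        messages=messages,
        max_tokens=agent_config["maxOutputTokens"],
    )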

Policy Gateway

Rate Limiting

policies:
  rateLimit:
    tokensPerMinute: 100000
    requestsPerMinute: 100
    onLimit: "queue"  # or "reject"
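
A common way to implement tokensPerMinute is a token bucket, with onLimit deciding whether a caller waits ("queue") or fails fast ("reject"). The class below illustrates that behavior; it is a sketch, not the gateway's implementation:

import asyncio
import time

class TokenBucket:
    """Token-bucket sketch for the tokensPerMinute policy above."""

    def __init__(self, tokens_per_minute: int, on_limit: str = "queue"):
        self.capacity = tokens_per_minute
        self.available = float(tokens_per_minute)
        self.refill_per_sec = tokens_per_minute / 60.0
        self.on_limit = on_limit
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.available = min(self.capacity, self.available + elapsed * self.refill_per_sec)
        self.last_refill = now

    async def acquire(self, tokens: int) -> None:
        self._refill()
        while self.available < tokens:
            if self.on_limit == "reject":
                raise RuntimeError("rate limit exceeded")
            await asyncio.sleep(0.1)  # "queue": wait for the bucket to refill
            self._refill()
        self.available -= tokens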

Content Filtering

policies:
  contentFilter:
    enabled: true
    rules:
      - type: "pii-detection"
        action: "redact"
      - type: "toxicity"
        action: "block"
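
Each rule pairs a detector with an action. As a rough illustration of the "redact" action only, here is a regex-based scrubber; a real pii-detection rule would use a proper detector rather than these two patterns:

import re

# Illustrative patterns only; production PII detection covers far more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Apply the 'redact' action from the contentFilter rules."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text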

Cost Control

policies:
  costControl:
    dailyLimit: 50.00
    warningThreshold: 0.8
    onLimit: "alert-and-block"
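
The policy reads as: warn when spend crosses 80% of the daily cap, alert and stop serving when the cap is reached. A sketch of that accounting (the alert hook is a placeholder):

class DailyCostTracker:
    """Track per-day spend against the costControl policy above."""

    def __init__(self, daily_limit: float = 50.00, warning_threshold: float = 0.8):
        self.daily_limit = daily_limit
        self.warning_threshold = warning_threshold
        self.spent_today = 0.0  # real code would reset this daily

    def record(self, cost_usd: float) -> None:
        self.spent_today += cost_usd
        if self.spent_today >= self.daily_limit:
            # "alert-and-block": notify operators, then refuse further calls
            raise RuntimeError(f"daily LLM cost limit ${self.daily_limit:.2f} reached")
        if self.spent_today >= self.daily_limit * self.warning_threshold:
            print(f"warning: {self.spent_today / self.daily_limit:.0%} of daily LLM budget used")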

Observability

LLM calls are traced via OpenTelemetry:

{
  "traceId": "abc123",
  "spanName": "llm.complete",
  "attributes": {
    "llm.provider": "anthropic",
    "llm.model": "claude-sonnet",
    "llm.input_tokens": 500,
    "llm.output_tokens": 200,
    "llm.duration_ms": 1500,
    "llm.cost_usd": 0.0035
  }
}
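
Those attributes map directly onto the standard OpenTelemetry Python API. The wrapper below is a sketch: the attribute keys are the ones from the trace above, while trace.get_tracer, start_as_current_span, and set_attribute are real opentelemetry-api calls.

import time

from opentelemetry import trace

tracer = trace.get_tracer("llm-provider")

async def traced_complete(provider, **kwargs):
    """Wrap an LLM call in a span carrying the attributes shown above."""
    with tracer.start_as_current_span("llm.complete") as span:
        start = time.monotonic()
        response = await provider.complete(**kwargs)
        span.set_attribute("llm.model", kwargs.get("model", ""))
        span.set_attribute("llm.duration_ms", int((time.monotonic() - start) * 1000))
        # provider name, token counts, and cost would be read from the
        # routing decision and the provider response
        return response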

Local Development

Using Ollama

# Start Ollama
ollama serve

# Pull a model
ollama pull llama3

# Configure to use local
export LLM_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
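
Before pointing the provider at the local endpoint, it is worth confirming Ollama is reachable. GET /api/tags is Ollama's model-listing endpoint; the rest of this snippet is just a convenience check:

import os

import requests

base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

# /api/tags lists locally pulled models; "llama3" should appear after
# the `ollama pull llama3` step above.
resp = requests.get(f"{base_url}/api/tags", timeout=5)
resp.raise_for_status()
print("local models:", [m["name"] for m in resp.json().get("models", [])])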

Fake Adapter for Testing

from llm_provider.fake import FakeLLMProvider

provider = FakeLLMProvider(responses={
    "test-query": "Predetermined response"
})
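
A typical use in a test, assuming FakeLLMProvider matches an incoming prompt against its responses map and exposes the same complete() signature as the Direct Call example (both are assumptions; this page only shows the constructor):

import asyncio

from llm_provider.fake import FakeLLMProvider

async def test_complete_returns_canned_response():
    provider = FakeLLMProvider(responses={"test-query": "Predetermined response"})
    # Assumed to return the canned text for a matching prompt.
    response = await provider.complete(
        model="claude-sonnet",
        messages=[{"role": "user", "content": "test-query"}],
    )
    assert "Predetermined response" in str(response)

asyncio.run(test_complete_returns_canned_response())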