Provider abstraction for language models.
The LLM Provider Service (SDS-049) abstracts away specific model providers, giving agents a single unified interface with provider routing and automatic fallback, while a policy gateway enforces rate limits, content filtering, and cost tracking. Every LLM call flows through the following pipeline:
┌────────────────────────────────────────────────┐
│           Agent Requesting LLM Call            │
└───────────────┬────────────────────────────────┘
                │
                ▼
┌────────────────────────────────────────────────┐
│                 Policy Gateway                 │
│   • Rate limiting                              │
│   • Content filtering                          │
│   • Cost tracking                              │
└───────────────┬────────────────────────────────┘
                │
                ▼
┌────────────────────────────────────────────────┐
│              LiteLLM Abstraction               │
│   • Unified interface                          │
│   • Provider routing                           │
│   • Automatic retry                            │
└───────────────┬────────────────────────────────┘
                │
     ┌──────────┼──────────┬──────────┐
     ▼          ▼          ▼          ▼
┌─────────┐┌─────────┐┌─────────┐┌─────────┐
│ OpenAI  ││Anthropic││ Ollama  ││  Azure  │
└─────────┘└─────────┘└─────────┘└─────────┘
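The LiteLLM layer is what gives every backend the same call shape. As a rough illustration (the model identifiers below are examples rather than this service's configured aliases, and the relevant API keys must be set in the environment), calling two different providers through LiteLLM differs only in the model string:

import litellm

messages = [{"role": "user", "content": "Summarize this document..."}]

# Same call shape for every backend; only the model string changes.
anthropic_resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620", messages=messages
)
openai_resp = litellm.completion(model="gpt-4o", messages=messages)

print(anthropic_resp.choices[0].message.content)
print(openai_resp.choices[0].message.content)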
# Primary provider
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Fallback providers
export OPENAI_API_KEY=sk-...
export OLLAMA_BASE_URL=http://localhost:11434

# Policy settings
export LLM_MAX_TOKENS_PER_MINUTE=100000
export LLM_COST_LIMIT_DAILY=50.00
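How these variables are consumed is up to the service; a minimal sketch of resolving them into a settings dictionary (the names and defaults here are illustrative, not the actual loader) could look like:

import os

primary = os.environ.get("LLM_PROVIDER", "anthropic")
settings = {
    "anthropic_api_key": os.environ.get("ANTHROPIC_API_KEY"),
    "openai_api_key": os.environ.get("OPENAI_API_KEY"),
    "ollama_base_url": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    "max_tokens_per_minute": int(os.environ.get("LLM_MAX_TOKENS_PER_MINUTE", "100000")),
    "daily_cost_limit_usd": float(os.environ.get("LLM_COST_LIMIT_DAILY", "50.00")),
}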
# llm-config.yaml
providers:
  - name: "anthropic"
    priority: 1
    models:
      - "claude-sonnet"
      - "claude-haiku"
  - name: "openai"
    priority: 2
    models:
      - "gpt-4o"
    fallback: true
  - name: "ollama"
    priority: 3
    models:
      - "llama3"
    localOnly: true

routing:
  strategy: "cost-optimized"  # or "latency-optimized", "quality-optimized"
  fallbackEnabled: true
  retries: 3
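The routing section drives provider selection. A simplified sketch of priority-ordered routing with retries and fallback, using illustrative names rather than the service's real classes:

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ProviderEntry:
    name: str
    priority: int
    models: List[str] = field(default_factory=list)
    fallback: bool = False
    local_only: bool = False

def route(model: str, providers: List[ProviderEntry],
          call: Callable[[ProviderEntry, str], str], retries: int = 3) -> str:
    """Try each provider that serves `model`, highest priority (lowest number) first."""
    candidates = sorted((p for p in providers if model in p.models),
                        key=lambda p: p.priority)
    last_error = None
    for provider in candidates:
        for _ in range(retries):
            try:
                return call(provider, model)  # caller supplies the actual transport
            except Exception as exc:          # any failure counts as retryable in this sketch
                last_error = exc
    raise RuntimeError(f"all providers failed for {model}") from last_error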
from llm_provider import LLMProvider

provider = LLMProvider()
response = await provider.complete(
    model="claude-sonnet",
    messages=[
        {"role": "user", "content": "Analyze this document..."}
    ],
    max_tokens=1000
)
# Agent uses model from config
agentId: "semantic-mapper"
model: "claude-sonnet" # Resolved via LLM Provider
maxOutputTokens: 500
policies:
  rateLimit:
    tokensPerMinute: 100000
    requestsPerMinute: 100
    onLimit: "queue"  # or "reject"
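A sketch of how the tokensPerMinute limit and the two onLimit behaviours could work inside the gateway, using a fixed one-minute window (the real implementation may differ):

import asyncio
import time

class TokenRateLimiter:
    """Fixed one-minute window over total tokens; illustrative only."""

    def __init__(self, tokens_per_minute: int = 100_000, on_limit: str = "queue"):
        self.capacity = tokens_per_minute
        self.on_limit = on_limit
        self.window_start = time.monotonic()
        self.used = 0

    async def acquire(self, tokens: int) -> None:
        while True:
            now = time.monotonic()
            if now - self.window_start >= 60:      # roll the window every minute
                self.window_start, self.used = now, 0
            if self.used + tokens <= self.capacity:
                self.used += tokens
                return
            if self.on_limit == "reject":
                raise RuntimeError("rate limit exceeded")
            await asyncio.sleep(1)                 # "queue": wait until the window rolls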
policies:
  contentFilter:
    enabled: true
    rules:
      - type: "pii-detection"
        action: "redact"
      - type: "toxicity"
        action: "block"
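Conceptually, the gateway applies these rules to a prompt before it leaves for a provider. The sketch below uses a placeholder email regex for pii-detection and a stub classifier for toxicity; neither reflects the actual rule engine:

import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def looks_toxic(text: str) -> bool:
    # Placeholder classifier; a real deployment would call a moderation model.
    return False

def apply_content_filter(text: str) -> str:
    # pii-detection -> redact: rewrite the prompt instead of rejecting it
    text = EMAIL_RE.sub("[REDACTED]", text)
    # toxicity -> block: abort the request entirely
    if looks_toxic(text):
        raise PermissionError("blocked by content filter: toxicity")
    return text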
policies:
  costControl:
    dailyLimit: 50.00
    warningThreshold: 0.8
    onLimit: "alert-and-block"
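The alert-and-block behaviour can be pictured as a running daily total checked against the warning threshold and the hard limit. CostTracker and alert below are illustrative only:

def alert(message: str) -> None:
    print(f"[cost-alert] {message}")   # stand-in for the real alerting hook

class CostTracker:
    def __init__(self, daily_limit: float = 50.00, warning_threshold: float = 0.8):
        self.daily_limit = daily_limit
        self.warning_threshold = warning_threshold
        self.spent_today = 0.0          # a real service would persist and reset this daily

    def record(self, cost_usd: float) -> None:
        self.spent_today += cost_usd
        if self.spent_today >= self.daily_limit:
            alert("daily LLM cost limit reached")        # alert ...
            raise RuntimeError("LLM budget exhausted")   # ... and block further calls
        if self.spent_today >= self.daily_limit * self.warning_threshold:
            alert("80% of the daily LLM budget used")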
LLM calls are traced via OpenTelemetry:
{
  "traceId": "abc123",
  "spanName": "llm.complete",
  "attributes": {
    "llm.provider": "anthropic",
    "llm.model": "claude-sonnet",
    "llm.input_tokens": 500,
    "llm.output_tokens": 200,
    "llm.duration_ms": 1500,
    "llm.cost_usd": 0.0035
  }
}
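A span with these attributes can be emitted with the standard OpenTelemetry Python API. The sketch below mirrors the attribute names from the trace example, leaves exporter configuration out, and relies on the span's own timestamps for the duration:

from opentelemetry import trace

tracer = trace.get_tracer("llm_provider")

with tracer.start_as_current_span("llm.complete") as span:
    span.set_attribute("llm.provider", "anthropic")
    span.set_attribute("llm.model", "claude-sonnet")
    # ... perform the provider call here ...
    span.set_attribute("llm.input_tokens", 500)
    span.set_attribute("llm.output_tokens", 200)
    span.set_attribute("llm.cost_usd", 0.0035)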
# Start Ollama
ollama serve

# Pull a model
ollama pull llama3

# Configure to use local
export LLM_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
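With those variables set, the same complete() call shown in the usage example should resolve to the local llama3 model. A sketch, assuming the configuration above and the provider's documented complete() signature:

import asyncio
from llm_provider import LLMProvider

async def main() -> None:
    provider = LLMProvider()   # picks up LLM_PROVIDER / OLLAMA_BASE_URL from the environment
    response = await provider.complete(
        model="llama3",
        messages=[{"role": "user", "content": "Say hello from the local model."}],
        max_tokens=100,
    )
    print(response)

asyncio.run(main())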
from llm_provider.fake import FakeLLMProvider

provider = FakeLLMProvider(responses={
    "test-query": "Predetermined response"
})
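A test can then exercise an agent path without network calls. The sketch below assumes the fake keys its canned responses on the user message content and exposes the same complete() signature as the real provider, and it relies on the pytest-asyncio plugin for the async test:

import pytest
from llm_provider.fake import FakeLLMProvider

@pytest.mark.asyncio               # requires the pytest-asyncio plugin
async def test_agent_gets_canned_response():
    provider = FakeLLMProvider(responses={"test-query": "Predetermined response"})
    response = await provider.complete(
        model="claude-sonnet",
        messages=[{"role": "user", "content": "test-query"}],
        max_tokens=100,
    )
    assert "Predetermined response" in str(response)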