Provider abstraction for language models.
The LLM Provider Service (SDS-049) abstracts away specific model providers, giving agents a single unified interface with provider routing and automatic fallback, while a policy gateway enforces rate limits, content filtering, and cost tracking. Every LLM call flows through the following pipeline:
┌────────────────────────────────────────────────┐
│           Agent Requesting LLM Call            │
└───────────────┬────────────────────────────────┘
                │
                ▼
┌────────────────────────────────────────────────┐
│                 Policy Gateway                 │
│   • Rate limiting                              │
│   • Content filtering                          │
│   • Cost tracking                              │
└───────────────┬────────────────────────────────┘
                │
                ▼
┌────────────────────────────────────────────────┐
│              LiteLLM Abstraction               │
│   • Unified interface                          │
│   • Provider routing                           │
│   • Automatic retry                            │
└───────────────┬────────────────────────────────┘
                │
     ┌──────────┼──────────┬──────────┐
     ▼          ▼          ▼          ▼
┌─────────┐┌─────────┐┌─────────┐┌─────────┐
│ OpenAI  ││Anthropic││ Ollama  ││  Azure  │
└─────────┘└─────────┘└─────────┘└─────────┘
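The LiteLLM layer is what gives every backend the same call shape. As a rough illustration (the model identifiers below are examples rather than this service's configured aliases, and the relevant API keys must be set in the environment), calling two different providers through LiteLLM differs only in the model string:

import litellm

messages = [{"role": "user", "content": "Summarize this document..."}]

# Same call shape for every backend; only the model string changes.
anthropic_resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620", messages=messages
)
openai_resp = litellm.completion(model="gpt-4o", messages=messages)

print(anthropic_resp.choices[0].message.content)
print(openai_resp.choices[0].message.content)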
# Primary provider
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Fallback providers
export OPENAI_API_KEY=sk-...
export OLLAMA_BASE_URL=http://localhost:11434

# Policy settings
export LLM_MAX_TOKENS_PER_MINUTE=100000
export LLM_COST_LIMIT_DAILY=50.00
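How these variables are consumed is up to the service; a minimal sketch of resolving them into a settings dictionary (the names and defaults here are illustrative, not the actual loader) could look like:

import os

primary = os.environ.get("LLM_PROVIDER", "anthropic")
settings = {
    "anthropic_api_key": os.environ.get("ANTHROPIC_API_KEY"),
    "openai_api_key": os.environ.get("OPENAI_API_KEY"),
    "ollama_base_url": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    "max_tokens_per_minute": int(os.environ.get("LLM_MAX_TOKENS_PER_MINUTE", "100000")),
    "daily_cost_limit_usd": float(os.environ.get("LLM_COST_LIMIT_DAILY", "50.00")),
}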
# llm-config.yaml
providers:
  - name: "anthropic"
    priority: 1
    models:
      - "claude-sonnet"
      - "claude-haiku"
  - name: "openai"
    priority: 2
    models:
      - "gpt-4o"
    fallback: true
  - name: "ollama"
    priority: 3
    models:
      - "llama3"
    localOnly: true

routing:
  strategy: "cost-optimized"  # or "latency-optimized", "quality-optimized"
  fallbackEnabled: true
  retries: 3
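The routing section drives provider selection. A simplified sketch of priority-ordered routing with retries and fallback, using illustrative names rather than the service's real classes:

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ProviderEntry:
    name: str
    priority: int
    models: List[str] = field(default_factory=list)
    fallback: bool = False
    local_only: bool = False

def route(model: str, providers: List[ProviderEntry],
          call: Callable[[ProviderEntry, str], str], retries: int = 3) -> str:
    """Try each provider that serves `model`, highest priority (lowest number) first."""
    candidates = sorted((p for p in providers if model in p.models),
                        key=lambda p: p.priority)
    last_error = None
    for provider in candidates:
        for _ in range(retries):
            try:
                return call(provider, model)  # caller supplies the actual transport
            except Exception as exc:          # any failure counts as retryable in this sketch
                last_error = exc
    raise RuntimeError(f"all providers failed for {model}") from last_error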
from llm_provider import LLMProvider

provider = LLMProvider()
response = await provider.complete(
    model="claude-sonnet",
    messages=[
        {"role": "user", "content": "Analyze this document..."}
    ],
    max_tokens=1000
)
# Agent uses model from config
agentId: "semantic-mapper"
model: "claude-sonnet" # Resolved via LLM Provider
maxOutputTokens: 500
policies:
  rateLimit:
    tokensPerMinute: 100000
    requestsPerMinute: 100
    onLimit: "queue"  # or "reject"
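A sketch of how the tokensPerMinute limit and the two onLimit behaviours could work inside the gateway, using a fixed one-minute window (the real implementation may differ):

import asyncio
import time

class TokenRateLimiter:
    """Fixed one-minute window over total tokens; illustrative only."""

    def __init__(self, tokens_per_minute: int = 100_000, on_limit: str = "queue"):
        self.capacity = tokens_per_minute
        self.on_limit = on_limit
        self.window_start = time.monotonic()
        self.used = 0

    async def acquire(self, tokens: int) -> None:
        while True:
            now = time.monotonic()
            if now - self.window_start >= 60:      # roll the window every minute
                self.window_start, self.used = now, 0
            if self.used + tokens <= self.capacity:
                self.used += tokens
                return
            if self.on_limit == "reject":
                raise RuntimeError("rate limit exceeded")
            await asyncio.sleep(1)                 # "queue": wait until the window rolls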
policies:
  contentFilter:
    enabled: true
    rules:
      - type: "pii-detection"
        action: "redact"
      - type: "toxicity"
        action: "block"
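Conceptually, the gateway applies these rules to a prompt before it leaves for a provider. The sketch below uses a placeholder email regex for pii-detection and a stub classifier for toxicity; neither reflects the actual rule engine:

import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def looks_toxic(text: str) -> bool:
    # Placeholder classifier; a real deployment would call a moderation model.
    return False

def apply_content_filter(text: str) -> str:
    # pii-detection -> redact: rewrite the prompt instead of rejecting it
    text = EMAIL_RE.sub("[REDACTED]", text)
    # toxicity -> block: abort the request entirely
    if looks_toxic(text):
        raise PermissionError("blocked by content filter: toxicity")
    return text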
policies:
  costControl:
    dailyLimit: 50.00
    warningThreshold: 0.8
    onLimit: "alert-and-block"
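The alert-and-block behaviour can be pictured as a running daily total checked against the warning threshold and the hard limit. CostTracker and alert below are illustrative only:

def alert(message: str) -> None:
    print(f"[cost-alert] {message}")   # stand-in for the real alerting hook

class CostTracker:
    def __init__(self, daily_limit: float = 50.00, warning_threshold: float = 0.8):
        self.daily_limit = daily_limit
        self.warning_threshold = warning_threshold
        self.spent_today = 0.0          # a real service would persist and reset this daily

    def record(self, cost_usd: float) -> None:
        self.spent_today += cost_usd
        if self.spent_today >= self.daily_limit:
            alert("daily LLM cost limit reached")        # alert ...
            raise RuntimeError("LLM budget exhausted")   # ... and block further calls
        if self.spent_today >= self.daily_limit * self.warning_threshold:
            alert("80% of the daily LLM budget used")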
LLM calls are traced via OpenTelemetry:
{
  "traceId": "abc123",
  "spanName": "llm.complete",
  "attributes": {
    "llm.provider": "anthropic",
    "llm.model": "claude-sonnet",
    "llm.input_tokens": 500,
    "llm.output_tokens": 200,
    "llm.duration_ms": 1500,
    "llm.cost_usd": 0.0035
  }
}
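A span with these attributes can be emitted with the standard OpenTelemetry Python API. The sketch below mirrors the attribute names from the trace example, leaves exporter configuration out, and relies on the span's own timestamps for the duration:

from opentelemetry import trace

tracer = trace.get_tracer("llm_provider")

with tracer.start_as_current_span("llm.complete") as span:
    span.set_attribute("llm.provider", "anthropic")
    span.set_attribute("llm.model", "claude-sonnet")
    # ... perform the provider call here ...
    span.set_attribute("llm.input_tokens", 500)
    span.set_attribute("llm.output_tokens", 200)
    span.set_attribute("llm.cost_usd", 0.0035)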
# Start Ollama
ollama serve

# Pull a model
ollama pull llama3

# Configure to use local
export LLM_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
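With those variables set, the same complete() call shown in the usage example should resolve to the local llama3 model. A sketch, assuming the configuration above and the provider's documented complete() signature:

import asyncio
from llm_provider import LLMProvider

async def main() -> None:
    provider = LLMProvider()   # picks up LLM_PROVIDER / OLLAMA_BASE_URL from the environment
    response = await provider.complete(
        model="llama3",
        messages=[{"role": "user", "content": "Say hello from the local model."}],
        max_tokens=100,
    )
    print(response)

asyncio.run(main())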
from llm_provider.fake import FakeLLMProvider

provider = FakeLLMProvider(responses={
    "test-query": "Predetermined response"
})
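A test can then exercise an agent path without network calls. The sketch below assumes the fake keys its canned responses on the user message content and exposes the same complete() signature as the real provider, and it relies on the pytest-asyncio plugin for the async test:

import pytest
from llm_provider.fake import FakeLLMProvider

@pytest.mark.asyncio               # requires the pytest-asyncio plugin
async def test_agent_gets_canned_response():
    provider = FakeLLMProvider(responses={"test-query": "Predetermined response"})
    response = await provider.complete(
        model="claude-sonnet",
        messages=[{"role": "user", "content": "test-query"}],
        max_tokens=100,
    )
    assert "Predetermined response" in str(response)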