LLMOps Reference Runbook for SEA™ Forge

This runbook provides operational procedures for managing LLM-based services within SEA™ Forge’s governed architecture.


1. Architecture Overview

┌───────────────────────────────────────────────────────────────────────────────┐
│                         SEA™ FORGE LLMOPS ARCHITECTURE                        │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│   User Request → Policy Gateway → LLM Service → Policy Gateway → Response    │
│                       ↓                              ↓                        │
│               Input Filtering                 Output Filtering                │
│                       ↓                              ↓                        │
│               Evidence Service ← ─ ─ ─ ─ ─ → Evidence Service                 │
│                                                                               │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│   │ SEA™ DSL    │  │ Knowledge   │  │ Semantic    │  │   OpenTelemetry     │ │
│   │ Policies    │→ │ Graph       │→ │ Debt Ledger │→ │   Observability     │ │
│   └─────────────┘  └─────────────┘  └─────────────┘  └─────────────────────┘ │
│                                                                               │
└───────────────────────────────────────────────────────────────────────────────┘

2. Deployment Procedures

2.1 Pre-Deployment Checklist

# 1. Validate all specifications
just spec-guard
# Expected: 0 errors

# 2. Verify SEA™ DSL policies
just sea-validate docs/specs/<context>/policies.sea
# Expected: Validation passed

# 3. Run full CI
just ci
# Expected: All checks passing

# 4. Verify drift detection
git diff --exit-code
# Expected: No differences
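Because the four checks gate a deployment, it helps to run them as one step that stops at the first failure. A minimal sketch, assuming just and git are on the PATH; run_checks and run_preflight are hypothetical helper names, not part of SEA™ Forge:

```bash
# Hypothetical pre-deployment gate: runs each check in order and stops at
# the first failure. Each argument is a command string run via sh -c.
run_checks() {
  local n=0 check
  for check in "$@"; do
    n=$((n + 1))
    echo "[$n/$#] $check"
    if ! sh -c "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "All $# checks passed"
}

# The checklist from this section; $1 is your <context> name.
run_preflight() {
  run_checks \
    "just spec-guard" \
    "just sea-validate docs/specs/$1/policies.sea" \
    "just ci" \
    "git diff --exit-code"
}
```

For example, run_preflight my-context && just pipeline my-context only regenerates when every check passes (my-context is a placeholder).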

2.2 Deployment Steps

# 1. Generate latest code
just pipeline <context>

# 2. Build containers
docker-compose build

# 3. Deploy with governance
docker-compose up -d

# 4. Verify Policy Gateway
curl http://localhost:8080/health
# Expected: {"status": "healthy", "gateway": "active"}

# 5. Verify Evidence Service
curl http://localhost:8083/health
# Expected: {"status": "healthy", "logging": "active"}
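Containers can take several seconds to become ready after docker-compose up -d, so a single curl in steps 4 and 5 may report a failure that clears on its own. A minimal polling sketch, assuming curl and grep are available; wait_healthy and is_healthy are hypothetical helpers, and the retry count and delay are illustrative defaults:

```bash
# Succeeds if a JSON body reports "status": "healthy" (whitespace-tolerant).
# Only the status field is checked; this is not a full JSON parse.
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Polls a health URL until it reports healthy or attempts run out.
# Usage: wait_healthy <url> [attempts=10] [delay_seconds=3]
wait_healthy() {
  local url=$1 attempts=${2:-10} delay=${3:-3} body i=1
  while [ "$i" -le "$attempts" ]; do
    body=$(curl -fsS --max-time 5 "$url" 2>/dev/null) && is_healthy "$body" && return 0
    echo "waiting for $url ($i/$attempts)" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "unhealthy: $url" >&2
  return 1
}
```

For example: wait_healthy http://localhost:8080/health && wait_healthy http://localhost:8083/health. Matching with grep avoids a jq dependency at the cost of checking only the status field.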

3. Monitoring Procedures

3.1 Key Metrics

Metric                   Source            Threshold  Action
-----------------------  ----------------  ---------  -----------
llm_pass_at_5            OpenTelemetry     > 0.8      Monitor
policy_violation_rate    Policy Gateway    > 0.01     Investigate
harmful_output_rate      Policy Gateway    > 0        Alert
fairness_subgroup_delta  Evidence Service  < 0.1      Review
latency_p99              OpenTelemetry     < 5 s      Scale
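The Threshold column mixes two readings. For llm_pass_at_5, fairness_subgroup_delta and latency_p99 it reads as the healthy range. For policy_violation_rate and harmful_output_rate it reads as the trigger. The sketch below encodes one plausible reading as explicit breach rules. The directions, the helper names, and the "metric value" input format (latency in seconds) are assumptions, not SEA™ Forge behaviour:

```bash
# Exit 0 if "<value> <op> <limit>" holds. awk does the comparison because
# shell arithmetic is integer-only. op is one of: gt, ge, lt, le.
metric_breached() {
  awk -v v="$1" -v op="$2" -v l="$3" 'BEGIN {
    v += 0; l += 0
    if (op == "gt") exit !(v > l)
    if (op == "ge") exit !(v >= l)
    if (op == "lt") exit !(v < l)
    if (op == "le") exit !(v <= l)
    exit 2
  }'
}

# One reading of the table: sets the breach condition and the action.
# Returns non-zero for unknown (or empty) metric names.
rule_for() {
  case $1 in
    llm_pass_at_5)           op=lt limit=0.8  action=Monitor ;;
    policy_violation_rate)   op=gt limit=0.01 action=Investigate ;;
    harmful_output_rate)     op=gt limit=0    action=Alert ;;
    fairness_subgroup_delta) op=ge limit=0.1  action=Review ;;
    latency_p99)             op=ge limit=5    action=Scale ;;
    *)                       return 1 ;;
  esac
}

# Reads "metric value" lines on stdin; prints "metric value action" per breach.
evaluate_metrics() {
  local name value op limit action
  while read -r name value || [ -n "$name" ]; do
    rule_for "$name" || continue   # skip blank lines and unknown metrics
    if metric_breached "$value" "$op" "$limit"; then
      printf '%s %s %s\n' "$name" "$value" "$action"
    fi
  done
}
```

For example, printf 'harmful_output_rate 0.002\n' | evaluate_metrics prints harmful_output_rate 0.002 Alert.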

3.2 Dashboard Access

# OpenObserve (Metrics + Logs). open is macOS-only; on Linux use xdg-open
open http://localhost:5080/dashboards

# Grafana (if configured)
open http://localhost:3000/d/llmops

3.3 Alert Escalation

Severity  Condition                     Response Time  Action
--------  ----------------------------  -------------  ------------------
Critical  Policy violation (high-risk)  Immediate      Kill switch
High      Output filtering triggered    15 minutes     Review & remediate
Medium    Drift detected                1 hour         Spec update
Low       Performance degradation       4 hours        Investigate
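To track response deadlines in tooling, the escalation table can be encoded as data. A minimal sketch; escalation_for and respond_by are hypothetical helpers, and times are Unix epoch seconds (15 minutes = 900 s, 1 hour = 3600 s, 4 hours = 14400 s):

```bash
# Sets respond_secs and action for a severity from the escalation table.
escalation_for() {
  case $1 in
    Critical) respond_secs=0     action="Kill switch" ;;
    High)     respond_secs=900   action="Review & remediate" ;;
    Medium)   respond_secs=3600  action="Spec update" ;;
    Low)      respond_secs=14400 action="Investigate" ;;
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Prints the respond-by time (epoch seconds) for a severity detected at $2.
respond_by() {
  local respond_secs action
  escalation_for "$1" || return 1
  echo $(( $2 + respond_secs ))
}
```

For example, respond_by High "$(date +%s)" gives the deadline for a High alert raised now; GNU date -d @<epoch> converts it back to a readable time.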

4. Incident Response

4.1 Policy Violation Detected

# 1. Assess severity
curl http://localhost:8083/api/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"event_type": "policy_violation", "last": "10"}'

# 2. If critical, activate kill switch
curl -X POST http://localhost:8080/admin/kill-switch

# 3. Gather evidence
curl http://localhost:8083/api/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"start_time": "<incident_time>"}' > incident_evidence.json

# 4. Document in Semantic Debt Ledger
# (if policy needs update)
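Step 3 silently sends the literal placeholder if <incident_time> is not filled in, and reusing one output file overwrites earlier evidence. A minimal sketch that guards against both. The ISO-8601 UTC format is an assumption about what the Evidence Service accepts; EVIDENCE_URL, valid_timestamp and gather_evidence are hypothetical names:

```bash
EVIDENCE_URL=${EVIDENCE_URL:-http://localhost:8083}   # default from this runbook

# Accepts only YYYY-MM-DDTHH:MM:SSZ, so "<incident_time>" is rejected.
valid_timestamp() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}

# Saves logs since $1 to a per-incident file and prints the file name.
# Returns 2 for a bad timestamp, 1 if the query fails.
gather_evidence() {
  local since=$1 out
  if ! valid_timestamp "$since"; then
    echo "invalid timestamp: $since" >&2
    return 2
  fi
  out="incident_evidence_$(printf '%s' "$since" | tr ':' '-').json"
  curl -fsS "$EVIDENCE_URL/api/v1/logs" \
    -H 'Content-Type: application/json' \
    -d "{\"start_time\": \"$since\"}" > "$out" ||
    { echo "query failed; partial output kept in $out" >&2; return 1; }
  echo "$out"
}
```

For example, gather_evidence 2026-01-15T10:30:00Z writes incident_evidence_2026-01-15T10-30-00Z.json (colons are replaced so the name is safe on all filesystems).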

4.2 Harmful Output Detected

# 1. Immediate block
# (Automatic via Policy Gateway)

# 2. Review output
curl http://localhost:8083/api/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"event_type": "harmful_output"}'

# 3. Update Policy Gateway filters
# Edit policy configuration
vim infra/policy-gateway/config.yaml

# 4. Redeploy
docker-compose restart policy-gateway
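Steps 3 and 4 replace a live filter configuration, so a faulty edit can take the gateway down during an incident. A minimal safe-edit sketch: back up config.yaml first, then restart and restore the backup if the gateway does not report healthy. The helper names and the 5-second grace period are illustrative assumptions; the path, service name and health URL come from this runbook:

```bash
# Copies a file to <file>.bak.<UTC timestamp> and prints the backup path.
backup_file() {
  local src=$1 dst
  dst="$src.bak.$(date -u +%Y%m%dT%H%M%SZ)"
  cp -p -- "$src" "$dst" && echo "$dst"
}

# Run after editing config.yaml; $1 is the path printed by backup_file.
apply_gateway_change() {
  local backup=$1 config=infra/policy-gateway/config.yaml
  docker-compose restart policy-gateway
  sleep 5   # illustrative grace period before the health check
  if ! curl -fsS --max-time 5 http://localhost:8080/health >/dev/null; then
    echo "gateway unhealthy; restoring $backup" >&2
    cp -p -- "$backup" "$config"
    docker-compose restart policy-gateway
    return 1
  fi
}
```

Typical flow: bak=$(backup_file infra/policy-gateway/config.yaml), edit the file, then apply_gateway_change "$bak". A healthy gateway confirms it restarted, not that the new filters behave as intended.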

4.3 Drift Detected

# 1. Identify drift source
git diff

# 2. If spec change needed
# Update spec first
vim docs/specs/<context>/sds/<file>.md

# 3. Regenerate
just pipeline <context>

# 4. Verify resolution
git diff --exit-code
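Step 2 depends on whether the drift touches specs or generated output. A minimal triage sketch. Treating everything outside docs/specs/ as generated output is an assumption about the repository layout, and classify_drift is a hypothetical helper:

```bash
# Reads changed paths (one per line) and labels each "spec" or "generated".
classify_drift() {
  local path
  while IFS= read -r path || [ -n "$path" ]; do
    case $path in
      '')           ;;   # ignore blank lines
      docs/specs/*) printf 'spec\t%s\n' "$path" ;;
      *)            printf 'generated\t%s\n' "$path" ;;
    esac
  done
}
```

Usage: { git diff --name-only; git ls-files --others --exclude-standard; } | classify_drift. The second command also lists new untracked files, which git diff alone does not report.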

5. Model Lifecycle Management

5.1 Model Registration

All LLMs must be registered in SDS documentation:

# In SDS document
models:
  - id: model-001
    name: CustomerServiceLLM
    version: "1.2.0"
    provider: internal
    risk_tier: limited
    governance:
      policy_gateway: enabled
      evidence_logging: enabled
      human_review: optional
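A quick way to catch an incomplete registration before validation is to check that each key from the example above is present. This sketch greps for key: lines rather than parsing YAML, so it is a sanity check only; check_model_entry is a hypothetical helper:

```bash
# Reports any key from the registration example that is missing from $1.
# Returns 1 if at least one key is missing.
check_model_entry() {
  local file=$1 key missing=0
  for key in id name version provider risk_tier \
             policy_gateway evidence_logging human_review; do
    if ! grep -Eq "^[[:space:]]*(- )?${key}:" "$file"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_model_entry docs/specs/<context>/sds/<model-sds>.md before running just sds-validate.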

5.2 Model Updates

# 1. Update model version in SDS
vim docs/specs/<context>/sds/<model-sds>.md

# 2. Validate spec
just sds-validate docs/specs/<context>/sds/<model-sds>.md

# 3. Regenerate
just pipeline <context>

# 4. Deploy (rebuild images so the regenerated code is included)
docker-compose up -d --build
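A common slip in step 1 is forgetting to bump the version, or bumping it backwards. A minimal guard, assuming sort -V is available (GNU coreutils and recent BSD sort). version_gt is a hypothetical helper, and it does not apply SemVer pre-release precedence:

```bash
# Succeeds only if version $1 is strictly greater than version $2.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

For example, version_gt 1.3.0 1.2.0 succeeds, while version_gt 1.2.0 1.2.0 fails. Version-aware sorting also orders 1.10.0 after 1.9.0, which a plain string comparison gets wrong.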

5.3 Model Decommissioning

# 1. Mark model as deprecated in SDS
# status: deprecated

# 2. Set removal date
# removal_date: 2026-06-01

# 3. Enable sunset monitoring
# Monitor usage until removal

# 4. Remove after transition period
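Sunset monitoring (step 3) usually needs to know how long remains before removal_date. A minimal sketch, assuming GNU date (its -d option is not available in BSD date); days_until_removal is a hypothetical helper:

```bash
# Whole days from $2 (default: today, UTC) until the removal date $1.
# Negative results mean the removal date has passed.
days_until_removal() {
  local removal=$1 today=${2:-$(date -u +%F)}
  echo $(( ( $(date -u -d "$removal" +%s) - $(date -u -d "$today" +%s) ) / 86400 ))
}
```

For example, days_until_removal 2026-06-01 prints the days left for the removal date shown above. Computing in UTC keeps every day exactly 86400 seconds long, so daylight-saving changes do not skew the count.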

6. Performance Tuning

6.1 Policy Gateway Optimization

# config.yaml optimizations
performance:
  cache_ttl: 300s          # Cache policy decisions
  batch_size: 10           # Batch similar requests
  timeout: 5s              # Request timeout
  max_concurrent: 100      # Concurrent requests
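Little's law (L = λW: average concurrency equals arrival rate times time in system) shows how timeout and max_concurrent interact. In the worst case, every request holds a slot until the 5 s timeout, and the gateway can then complete at most:

```latex
\lambda_{\max} = \frac{L}{W}
  = \frac{\texttt{max\_concurrent}}{\texttt{timeout}}
  = \frac{100}{5\,\mathrm{s}}
  = 20\ \text{requests/s}
```

At an illustrative 0.5 s average latency, the same 100 slots allow about 200 requests/s. A longer timeout lowers the worst-case figure. If sustained traffic approaches it, the settings here that add capacity are a higher max_concurrent or lower latency (for example, through cache hits).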

6.2 Evidence Service Optimization

# config.yaml optimizations
storage:
  buffer_size: 1000        # Buffer before flush
  flush_interval: 5s       # Regular flush
  compression: true        # Compress logs
  retention_days: 2555     # 7 × 365 days; 7 calendar years span 2556-2557 days
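buffer_size and flush_interval together bound how many events sit unflushed at any moment. This assumes the service flushes when either the buffer fills or the interval elapses, whichever comes first; the runbook does not state this. At an ingest rate of r events/s:

```latex
N_{\text{unflushed}} \le \min\bigl(\texttt{buffer\_size},\; r \cdot \texttt{flush\_interval}\bigr)
```

At r = 100 events/s, this is min(1000, 500) = 500 events, and the 5 s interval drives flushing. At r = 500 events/s, it is min(1000, 2500) = 1000, and the buffer fills roughly every 1000 / 500 = 2 s.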

7. Backup and Recovery

7.1 Evidence Service Backup

# Daily backup
curl -X POST http://localhost:8083/admin/backup \
  -H 'Content-Type: application/json' \
  -d '{"destination": "s3://backup-bucket/evidence/"}'

# Verify backup
curl http://localhost:8083/admin/backup/status
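The recovery procedure restores from a dated path (s3://backup-bucket/evidence/YYYY-MM-DD/), but the backup call above targets the undated root. The runbook does not say whether the service creates dated sub-prefixes itself. If it does not, one option is to pass a dated destination explicitly. A sketch under that assumption; BACKUP_ROOT, backup_prefix and run_backup are hypothetical names:

```bash
BACKUP_ROOT=${BACKUP_ROOT:-s3://backup-bucket/evidence}

# Prints the dated destination for a day (default: today, UTC).
backup_prefix() {
  local day=${1:-$(date -u +%F)}
  printf '%s/%s/\n' "${BACKUP_ROOT%/}" "$day"
}

# Triggers a backup into today's dated prefix (or the day given as $1).
run_backup() {
  curl -fsS -X POST http://localhost:8083/admin/backup \
    -H 'Content-Type: application/json' \
    -d "{\"destination\": \"$(backup_prefix "$@")\"}"
}
```

Calling run_backup from the daily scheduler then produces exactly the path layout the restore step expects.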

7.2 Policy Configuration Backup

# All policies are in Git
git push origin main

# Verify the push reached the remote (git log alone shows only local history)
git fetch origin
git log --oneline -5 origin/main

7.3 Recovery Procedures

# 1. Restore Evidence Service
curl -X POST http://localhost:8083/admin/restore \
  -H 'Content-Type: application/json' \
  -d '{"source": "s3://backup-bucket/evidence/YYYY-MM-DD/"}'

# 2. Verify policies
just spec-guard

# 3. Regenerate code
just pipeline <context>

# 4. Redeploy (rebuild images so the regenerated code is included)
docker-compose up -d --build
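Step 1 needs a concrete date in place of YYYY-MM-DD. With the AWS CLI, aws s3 ls s3://backup-bucket/evidence/ lists sub-prefixes as lines of the form "PRE <name>/". A minimal sketch that picks the newest one. It assumes backups are stored under YYYY-MM-DD/ prefixes, which sort chronologically as plain strings; latest_backup_prefix is a hypothetical helper:

```bash
# Reads `aws s3 ls` output on stdin; prints the newest YYYY-MM-DD/ prefix.
latest_backup_prefix() {
  awk '$1 == "PRE" && $2 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\/$/ { print $2 }' |
    sort | tail -n 1
}
```

For example: day=$(aws s3 ls s3://backup-bucket/evidence/ | latest_backup_prefix), then restore with "s3://backup-bucket/evidence/$day" as the source.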

8. Security Procedures

8.1 Key Rotation

# Rotate Evidence Service keys
curl -X POST http://localhost:8083/admin/rotate-keys

# Verify new keys active
curl http://localhost:8083/health

8.2 Access Review

# Monthly access review
# 1. Export access logs
curl http://localhost:8083/api/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"event_type": "access"}' > access_review.json

# 2. Review with security team

# 3. Remove stale access
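Step 3 depends on knowing which principals have not used their access recently. The schema of access_review.json is not documented here, so this sketch assumes the export has already been flattened to one "principal<TAB>epoch-seconds" line per access event. Both that format and stale_principals are hypothetical:

```bash
# Prints principals whose most recent access is older than the cutoff
# epoch ($1), sorted by name. Input: "principal<TAB>epoch" lines on stdin.
stale_principals() {
  awk -F '\t' -v cutoff="$1" '
    NF >= 2 { if (!($1 in last) || $2 + 0 > last[$1]) last[$1] = $2 + 0 }
    END { for (p in last) if (last[p] < cutoff + 0) print p }
  ' | sort
}
```

For a 90-day window (an illustrative choice), pass cutoff=$(( $(date +%s) - 90 * 86400 )). Review the resulting list with the security team before removing anything.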

Last Updated: January 2026
Version: 1.0.0