- Created: 2026-01-23
- Status: Draft (pending review)
- Dependencies: P0.2 Audit Trail Persistence, P3.1 Provenance Tracking System, P3.2 Automatic Drift Remediation
- Source: Last-Mile Plan P3.3 (`docs/workdocs/last-mile-plan.md`)
Deliver a production-quality runtime behavior correlation system that links OTLP traces, logs, and metrics to the spec truth chain (ADR/PRD/SDS/SEA + manifests), detects behavioral drift, and surfaces actionable insights in Workbench UI and CI. The system must be spec-first, privacy-aware, and high-performance, with zero gaps or technical debt.

    flowchart TD
      subgraph Ingest[Telemetry Ingest]
        OTLP[OTLP Receiver] --> NORM[Behavior Normalizer]
        OTLP --> OTLP_ERR[OTLP Parse Error / DLQ]
        NORM --> NORM_ERR[Normalization Error / DLQ]
      end
      subgraph Correlate[Correlation]
        NORM --> CORR[Correlation Engine]
        CORR --> CLASS[Drift Classifier]
        CORR --> NOMATCH[No-match / Correlation Failure]
      end
      subgraph Store[Persistence]
        CLASS --> KG[KG Writer]
        CLASS --> PG[Postgres Summary Index]
        KG --> KG_RETRY[Storage Failure / Retry]
        PG --> PG_RETRY[Storage Failure / Retry]
        KG_RETRY --> DLQ[Dead-letter Queue]
        PG_RETRY --> DLQ
      end
      subgraph Surfaces[Surfaces]
        PG --> UI[Workbench Runtime Correlation]
        PG --> CI[CI Drift Gate]
        KG --> UI
      end

Do not patch generated code. If behavior is missing, update specs → generators → regenerate.
- `/behavior/summary`: P95 < 500ms
- `/behavior/node/{node_id}`: P95 < 2s with max 1000 evidence items
- `sea.*` attributes

New module in `services/workbench-bff/src/adapters/`:

`behavior_normalizer.py`

- (`sea.domain`, `sea.concept`, `sea.regime_id`, `sea.flow`)
- `sea.domain`, `sea.concept`, `sea.regime_id`
- `sea.flow`, `sea.policy`
- `BehaviorEvidence` with `correlation_status="failed"` + `error_reason`
- `correlation_status="partial"` with null placeholders
- `BehaviorEvidence` records and logs validation failures with trace/span context

Input: OTLP via OTel Collector export

Output: Normalized `BehaviorEvidence` records
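
The normalizer behavior listed above can be illustrated with a minimal sketch. It assumes that `sea.domain`, `sea.concept`, and `sea.regime_id` form the required attribute group and that `sea.flow` and `sea.policy` are optional, which the list suggests but does not state outright. `NormalizedSpan` and `normalize_span` are hypothetical names used only for illustration; they are not the actual `BehaviorEvidence` implementation.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: the first group is required and the second optional.
# The source lists both groups without labelling them explicitly.
REQUIRED_ATTRS = ("sea.domain", "sea.concept", "sea.regime_id")
OPTIONAL_ATTRS = ("sea.flow", "sea.policy")


@dataclass(frozen=True)
class NormalizedSpan:
    """Hypothetical stand-in for a normalized BehaviorEvidence record."""
    trace_id: str
    span_id: Optional[str]
    attributes: dict            # sea.* name -> value, None when missing
    correlation_status: str     # "ok" | "partial" | "failed"
    error_reason: Optional[str] = None


def normalize_span(trace_id: str,
                   span_id: Optional[str],
                   attrs: Mapping[str, str]) -> NormalizedSpan:
    """Map raw span attributes to a record with a correlation status."""
    missing_required = [k for k in REQUIRED_ATTRS if not attrs.get(k)]
    missing_optional = [k for k in OPTIONAL_ATTRS if not attrs.get(k)]
    # Missing attributes are kept as explicit None placeholders.
    values = {k: attrs.get(k) or None for k in REQUIRED_ATTRS + OPTIONAL_ATTRS}

    if missing_required:
        reason = "missing required attributes: " + ", ".join(missing_required)
        return NormalizedSpan(trace_id, span_id, values, "failed", reason)

    status = "partial" if missing_optional else "ok"
    return NormalizedSpan(trace_id, span_id, values, status)
```

A span carrying only the three required attributes would come back with `correlation_status="partial"` and `None` placeholders for `sea.flow` and `sea.policy`, while a span missing `sea.concept` would come back as `"failed"` with an `error_reason` naming the missing attribute.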
New module `behavior_correlator.py`:

- `sea.flow` + `sea.context` → SEA flow node
- `sea.policy` → SDS policy node
- `CorrelationResult` with confidence score (see algorithm below)

Confidence Scoring Algorithm (SDS-0XX):
Classification mapping:

- HIGH/MEDIUM drift analysis only when confidence ≥ 0.70
- LOW when 0.30 ≤ confidence < 0.70
- NONE when confidence < 0.30

New module `behavior_drift_classifier.py`:

- `NONE`, `LOW`, `MEDIUM`, `HIGH`

New adapters:
- `behavior_kg_writer.py`: writes evidence and correlation edges to KG
- `behavior_indexer.py`: stores summary rows in Postgres

Data Model

- `SpecNode --observed_behavior--> EvidenceNode`
- `behavior_correlation_summary`
  - `context`, `spec_node_id`, `drift_score`, `confidence`, `last_seen_at`, `evidence_ref`

Storage Consistency Model

- `kg_version` + `last_synced_at`.
- `last_synced_at` exceeds threshold.

New `services/workbench-bff/src/api/behavior_routes.py`

| Method | Path | Description |
|---|---|---|
| GET | `/behavior/summary` | Summary by context/node with pagination |
| GET | `/behavior/node/{node_id}` | Evidence + correlation details (max 1000 evidence items) |
| POST | `/behavior/scan` | Trigger on-demand scan for context or node |
| GET | `/behavior/trends` | Drift trends with time window filters |

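
The pagination bounds for the two read endpoints (default 50 / max 200 for the summary, default 100 / max 1000 for node evidence, as given under API Details) could be enforced with a small helper. This is a sketch only: `PAGE_LIMITS` and `resolve_limit` are hypothetical names, and clamping over-limit requests instead of rejecting them is an assumption the plan does not settle.

```python
from typing import Optional

# (default, maximum) per endpoint, taken from the API Details list.
# The dictionary keys are illustrative labels, not route names.
PAGE_LIMITS = {
    "summary": (50, 200),
    "node": (100, 1000),
}


def resolve_limit(endpoint: str, requested: Optional[int] = None) -> int:
    """Return the effective page size for an endpoint."""
    default, maximum = PAGE_LIMITS[endpoint]
    if requested is None:
        return default
    if requested < 1:
        raise ValueError("limit must be a positive integer")
    # Assumption: requests above the maximum are clamped, not rejected.
    return min(requested, maximum)
```

Under these assumptions, `resolve_limit("summary")` yields the default of 50 and `resolve_limit("node", 5000)` is capped at 1000, which matches the stated evidence-item ceiling.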
Constraints
API Details
- `GET /behavior/summary`: `limit` (default 50, max 200), `offset` (default 0)
- `GET /behavior/node/{node_id}`: `limit` (default 100, max 1000), `cursor` optional
- `GET /behavior/trends`: `from`, `to`, `interval` (default `1d`), `context` optional
- `POST /behavior/scan` body:
  - `contextId` (optional)
  - `nodeId` (optional)
  - `requesterId` (required)
  - `mode` (default `summary`)
  - `maxEvidence` (default 1000)
- `{ scanId, accepted, estimatedCompletionMs }`

New UI:

- `RuntimeCorrelationDashboard.tsx` (summary, trends)
- `BehaviorDriftCard.tsx` (per node)
- `ProvenanceExplorer` (badges + drilldown)

UX Enhancements:
New script:
- `scripts/ci/behavior_drift_gate.sh`

Modes:

- `--warn` (default)
- `--fail` (fail if high drift)

Drift Severity Thresholds

- LOW: `drift_score` ≥ 0.20
- MEDIUM: `drift_score` ≥ 0.50
- HIGH: `drift_score` > 0.80

Gate Rules

- `--warn`: report all drifts ≥ LOW
- `--fail`: exit code 1 if any drift ≥ HIGH OR MEDIUM count > 5 (default, configurable)

Integration:

- `.github/workflows/ci.yml`

Add to `services/workbench-bff/src/models.py`:
BehaviorEvidenceModel

    # Imports used by the behavior models (skip any already present in models.py).
    from datetime import datetime
    from typing import Literal

    from pydantic import BaseModel, Field


    class BehaviorEvidenceModel(BaseModel):
        evidence_id: str = Field(..., min_length=8)
        trace_id: str = Field(..., min_length=16)
        span_id: str | None = Field(default=None)
        timestamp: datetime
        context: str
        flow: str | None = None
        policy_id: str | None = None
        drift_score: float = Field(..., ge=0.0, le=1.0)
        confidence: float = Field(..., ge=0.0, le=1.0)
        correlation_status: Literal["ok", "partial", "failed"]
        error_reason: str | None = None

CorrelationResultModel

    class CorrelationResultModel(BaseModel):
        spec_node_id: str | None = None
        evidence_id: str
        confidence: float = Field(..., ge=0.0, le=1.0)
        match_type: Literal["deterministic", "heuristic", "none"]
        rule_ids: list[str] = Field(default_factory=list)

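
The classification mapping given earlier (full HIGH/MEDIUM drift analysis at confidence ≥ 0.70, LOW for 0.30 ≤ confidence < 0.70, NONE below 0.30) can be read as a banding function over the `confidence` field of a correlation result. The sketch below is one possible reading, not the SDS-defined algorithm: the function name `classification_band` and the band labels it returns are hypothetical.

```python
def classification_band(confidence: float) -> str:
    """Map a correlation confidence in [0, 1] to a classification band.

    Band labels are illustrative:
      "full" -> eligible for HIGH/MEDIUM drift analysis (confidence >= 0.70)
      "low"  -> classified LOW (0.30 <= confidence < 0.70)
      "none" -> classified NONE (confidence < 0.30)
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0.0, 1.0]")
    if confidence >= 0.70:
        return "full"
    if confidence >= 0.30:
        return "low"
    return "none"
```

Both lower bounds are inclusive, so a confidence of exactly 0.70 qualifies for full analysis and exactly 0.30 falls in the LOW band.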
BehaviorDriftSummaryModel

    class BehaviorDriftSummaryModel(BaseModel):
        context: str
        spec_node_id: str
        drift_score: float = Field(..., ge=0.0, le=1.0)
        drift_level: Literal["none", "low", "medium", "high"]
        confidence: float = Field(..., ge=0.0, le=1.0)
        last_seen_at: datetime
        evidence_count: int = Field(..., ge=0)

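
To make the Drift Severity Thresholds and Gate Rules concrete, the following sketch derives a `drift_level` from a `drift_score` and then applies the `--warn` / `--fail` decision. It illustrates the decision logic only and is not the contents of `scripts/ci/behavior_drift_gate.sh`; the function names and the `max_medium` parameter name are hypothetical.

```python
from collections import Counter
from typing import Iterable


def drift_level(drift_score: float) -> str:
    """Apply the stated thresholds: HIGH > 0.80, MEDIUM >= 0.50, LOW >= 0.20."""
    if drift_score > 0.80:
        return "high"
    if drift_score >= 0.50:
        return "medium"
    if drift_score >= 0.20:
        return "low"
    return "none"


def gate_exit_code(drift_scores: Iterable[float],
                   mode: str = "warn",
                   max_medium: int = 5) -> int:
    """Return the exit code the gate would produce for a set of drift scores.

    "warn" only reports, so it always returns 0. "fail" returns 1 when any
    drift is HIGH or when the MEDIUM count exceeds max_medium (default 5).
    """
    counts = Counter(drift_level(score) for score in drift_scores)
    if mode == "fail" and (counts["high"] > 0 or counts["medium"] > max_medium):
        return 1
    return 0
```

Because the HIGH threshold is written as a strict inequality, a score of exactly 0.80 is classified as MEDIUM here. A run with four LOW drifts and nothing higher passes in `fail` mode, while a single HIGH drift or six MEDIUM drifts would fail it.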
Add to apps/workbench/src/types/behavior.ts:
BehaviorEvidence

    export interface BehaviorEvidence {
      evidenceId: string;
      traceId: string;
      spanId?: string | null;
      timestamp: string;
      context: string;
      flow?: string | null;
      policyId?: string | null;
      driftScore: number;
      confidence: number;
      correlationStatus: 'ok' | 'partial' | 'failed';
      errorReason?: string | null;
    }

CorrelationResult

    export interface CorrelationResult {
      specNodeId?: string | null;
      evidenceId: string;
      confidence: number;
      matchType: 'deterministic' | 'heuristic' | 'none';
      ruleIds: string[];
    }

BehaviorDriftSummary

    export interface BehaviorDriftSummary {
      context: string;
      specNodeId: string;
      driftScore: number;
      driftLevel: 'none' | 'low' | 'medium' | 'high';
      confidence: number;
      lastSeenAt: string;
      evidenceCount: number;
    }

- `ifl:hash` for evidence identity (SDS-050)
- `tests/test_behavior_normalizer.py`
- `tests/test_behavior_correlator.py`
- `tests/test_behavior_drift_classifier.py`
- `tests/test_behavior_routes.py`
- `RuntimeCorrelationDashboard` rendering
- `./scripts/ci/behavior_drift_gate.sh --warn`
- `/behavior/summary` and `/behavior/node/{node_id}` P95 latency targets
- `just spec-guard` (PASSED: 1421 checks, 0 errors)

Branch: `cycle/p3.3-c1a-behavior-models`
`services/workbench-bff/src/models.py`:

- `BehaviorEvidenceModel`
- `CorrelationResultModel`
- `BehaviorDriftSummaryModel`
- `BehaviorScanRequestModel`
- `BehaviorScanResponseModel`

`services/workbench-bff/src/adapters/behavior_normalizer.py`:

- (`sea.domain`, `sea.concept`, `sea.regime_id`, `sea.flow`)
- `tests/test_behavior_normalizer.py`
- `pytest tests/test_behavior_normalizer.py` passes

Branch: `cycle/p3.3-c1b-behavior-correlator`
`services/workbench-bff/src/adapters/behavior_correlator.py`:

- (`sea.flow` + `sea.context` → SEA flow node)
- `CorrelationResult` generation

`services/workbench-bff/src/adapters/behavior_drift_classifier.py`:

- `NONE`, `LOW`, `MEDIUM`, `HIGH`
- `tests/test_behavior_correlator.py`
- `tests/test_behavior_drift_classifier.py`

Branch: `cycle/p3.3-c1c-behavior-storage`
`services/workbench-bff/src/adapters/behavior_kg_writer.py`:

- (`SpecNode --observed_behavior--> EvidenceNode`)

`services/workbench-bff/src/adapters/behavior_indexer.py`:

- `behavior_correlation_summary` table operations
- (`kg_version` + `last_synced_at`)
- `alembic revision --autogenerate -m "add behavior_correlation_summary"`

Branch: `cycle/p3.3-c2a-behavior-api`
`services/workbench-bff/src/api/behavior_routes.py`:

- `GET /behavior/summaries`: summary by context/node with cursor + offset pagination
- `GET /behavior/nodes/{node_id}`: evidence + correlation details with OpenObserve links
- `POST /behavior/scan`: on-demand scan with rate limiting (10/min via slowapi) + WebSocket URL
- `GET /behavior/trends`: drift trends with time window filters
- `WS /behavior/scan/{scan_id}/progress`: WebSocket progress streaming

Branch: `cycle/p3.3-c2b-behavior-ui`
`apps/workbench/src/types/behavior.ts`:

- `BehaviorEvidence`
- `CorrelationResult`
- `BehaviorDriftSummary`

`apps/workbench/src/pages/RuntimeCorrelationDashboard.tsx`:

`apps/workbench/src/components/BehaviorDriftCard.tsx`:

`apps/workbench/src/components/CardErrorBoundary.tsx`: per-card error isolation

`apps/workbench/src/lib/behavior-api.ts`:

- `apiFetch()`: safe fetch with timeout, signal override prevention, error wrapping
- `fetchRuntimeCorrelationSummaries()`: fetch behavior summaries with proper error handling
- `getBehaviorNodeDetails()`: fetch detailed behavior for a spec node
- `getBehaviorTrends()`: fetch drift trends with time window filters
- `buildOpenObserveTraceLink()`, `buildOpenObserveLogLink()`, `buildOpenObserveMetricLink()`: safe deep link builders with try/catch and null returns

`ProvenanceExplorer.tsx`:

- `apps/workbench/src/components/__tests__/CardErrorBoundary.test.tsx` (1 test, passing)
- `apps/workbench/src/pages/__tests__/RuntimeCorrelationDashboard.test.tsx` (13 tests, all passing)
- `apps/workbench/src/pages/__tests__/DriftDashboard.test.tsx` (6 tests, all passing)
- (`pnpm exec nx lint workbench`)
- `pnpm exec nx test workbench` passes (20/20 tests), `pnpm exec nx lint workbench` passes (0 errors)

Branch: `cycle/p3.3-c2c-behavior-ci`
`scripts/ci/behavior_drift_gate.sh`:

- `--warn` mode (report all drifts ≥ LOW)
- `--fail` mode (exit 1 if HIGH drift OR MEDIUM count > 5)
- `.github/workflows/ci.yml`
- `scripts/tests/fixtures/behavior_drift_fixtures.json`
- `triggeredRules` array with realistic rule IDs (AUTH-001, PERF-002, etc.)

Debt Fixed:

- `--fail` mode with 0 HIGH, 0 MEDIUM, 4 LOW drifts

Branch: `cycle/p3.3-c3a-openobserve`
- `docs/runbooks/behavior-correlation-openobserve.md`

Branch: `cycle/p3.3-c3b-e2e`

- `/behavior/summary`: P95 < 500ms
- `/behavior/node/{node_id}`: P95 < 2s
- `just spec-guard`

| Cycle | Scope | Duration |
|---|---|---|
| Prerequisites | Spec validation + generator alignment | 1 day |
| C1A | Core Models + Normalizer | 1 day |
| C1B | Correlator + Classifier | 1–2 days |
| C1C | Storage Adapters | 1–2 days |
| C2A | API Routes | 1 day |
| C2B | Workbench UI | 2 days |
| C2C | CI Drift Gate | 0.5 day |
| C3A | OpenObserve Integration | 1 day |
| C3B | E2E Validation | 2–3 days |
| Review gates | Security, perf, stakeholder sign-off | 2–3 days |
| Contingency buffer (20%) | | 2–3 days |
| Total | | 15–19 days |