P3.3: Runtime Behavior Correlation — Implementation Plan

Created: 2026-01-23
Status: Draft (pending review)
Dependencies: P0.2 Audit Trail Persistence, P3.1 Provenance Tracking System, P3.2 Automatic Drift Remediation
Source: Last-Mile Plan P3.3 (docs/workdocs/last-mile-plan.md)


Goal

Deliver a production-quality runtime behavior correlation system that links OTLP traces, logs, and metrics to the spec truth chain (ADR/PRD/SDS/SEA + manifests), detects behavioral drift, and surfaces actionable insights in Workbench UI and CI. The system must be spec-first, privacy-aware, and high-performance, with zero gaps or technical debt.


User Review Required

Design Decisions

  1. Tri-signal correlation (traces + logs + metrics) is the default, per ADR-029 (observability stack) and SDS-030 (semantic observability envelope).
  2. Spec truth chain is ADR/PRD/SDS/SEA + manifests (full traceability), not generated code.
  3. Drift signal policy is balanced with explicit thresholds: alert when confidence ≥ 0.70 (MEDIUM/HIGH), summarize when 0.30 ≤ confidence < 0.70 (LOW), suppress when confidence < 0.30 (NONE). See SDS-0XX for the confidence scoring algorithm.
  4. Storage: Knowledge Graph is authoritative; Postgres stores summary/index for UI/CI queries with explicit consistency rules (see “Storage Consistency Model” below).
  5. Surfaces: Workbench UI, CI, and OpenObserve are first-class in v1; Slack deferred to v2.
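
The threshold policy in decision 3 can be sketched as a single mapping function. This is a hypothetical sketch; the function name is illustrative, but the threshold values match the plan exactly.

```python
# Sketch of the drift signal policy from design decision 3.
# Thresholds per the plan: >= 0.70 alert (MEDIUM/HIGH),
# 0.30 <= c < 0.70 summarize (LOW), < 0.30 suppress (NONE).

def drift_signal_action(confidence: float) -> str:
    """Map a correlation confidence score to a signal action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.70:
        return "alert"       # MEDIUM/HIGH drift signal
    if confidence >= 0.30:
        return "summarize"   # LOW drift signal
    return "suppress"        # NONE
```

Keeping the boundaries in one function ensures the UI, CI gate, and OpenObserve surfaces cannot drift apart on what "alert" means.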

Architecture Overview

flowchart TD
    subgraph Ingest[Telemetry Ingest]
        OTLP[OTLP Receiver] --> NORM[Behavior Normalizer]
        OTLP --> OTLP_ERR[OTLP Parse Error / DLQ]
        NORM --> NORM_ERR[Normalization Error / DLQ]
    end

    subgraph Correlate[Correlation]
        NORM --> CORR[Correlation Engine]
        CORR --> CLASS[Drift Classifier]
        CORR --> NOMATCH[No-match / Correlation Failure]
    end

    subgraph Store[Persistence]
        CLASS --> KG[KG Writer]
        CLASS --> PG[Postgres Summary Index]
        KG --> KG_RETRY[Storage Failure / Retry]
        PG --> PG_RETRY[Storage Failure / Retry]
        KG_RETRY --> DLQ[Dead-letter Queue]
        PG_RETRY --> DLQ
    end

    subgraph Surfaces[Surfaces]
        PG --> UI[Workbench Runtime Correlation]
        PG --> CI[CI Drift Gate]
        KG --> UI
    end

Spec Alignment (Must Use)

Do not patch generated code. If behavior is missing, update specs → generators → regenerate.


Functional Scope

Core Capabilities

  1. OTLP ingest for traces, logs, metrics (via OTel Collector pipeline)
  2. Behavior evidence normalization into spec-aligned envelope
  3. Correlation engine mapping evidence to provenance nodes
  4. Behavioral drift classification (semantic vs benign)
  5. KG persistence of evidence + correlation edges
  6. Postgres summary index for fast queries
  7. Workbench UI: Runtime Correlation dashboard + provenance integration
  8. CI drift gate: warn/fail thresholds for behavioral drift

Non-Functional Requirements

Easy-Lift Enhancements (High Impact)


Proposed Components

1) Telemetry Ingest & Normalizer (Python)

New module in services/workbench-bff/src/adapters/:

Input: OTLP via OTel Collector export
Output: Normalized BehaviorEvidence records
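
The normalizer's core transform can be sketched as follows. This is a hypothetical sketch: the attribute keys (`app.context`, `app.flow`, `app.policy_id`) and the evidence-ID derivation are assumptions, not the final envelope contract.

```python
# Hypothetical sketch: normalize one decoded OTLP span (as a dict, e.g. from
# the Collector's JSON export) into the BehaviorEvidence envelope. Attribute
# keys and the evidence_id scheme are illustrative assumptions.

from datetime import datetime, timezone

def normalize_span(span: dict) -> dict:
    attrs = span.get("attributes", {})
    # OTLP JSON encodes nanosecond timestamps as strings.
    ts_ns = int(span["startTimeUnixNano"])
    return {
        "evidence_id": f"ev-{span['traceId'][:16]}",
        "trace_id": span["traceId"],
        "span_id": span.get("spanId"),
        "timestamp": datetime.fromtimestamp(ts_ns / 1e9, tz=timezone.utc).isoformat(),
        "context": attrs.get("app.context", "unknown"),
        "flow": attrs.get("app.flow"),
        "policy_id": attrs.get("app.policy_id"),
    }
```

Spans missing the expected attributes fall back to `"unknown"`/`None` rather than raising, so they can still be routed to the no-match path downstream.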


2) Correlation Engine (Python)

New module behavior_correlator.py:

Confidence Scoring Algorithm (SDS-0XX):

Classification mapping:
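
A minimal sketch of the scoring and mapping, assuming a weighted-signal model; the real algorithm belongs in SDS-0XX, and the signal names and weights here are illustrative only.

```python
# Hypothetical weighted confidence score; the authoritative algorithm is
# specified in SDS-0XX. Signal names and weights are illustrative.

WEIGHTS = {
    "trace_id_match": 0.5,  # provenance-stamped trace links directly to a node
    "context_match": 0.3,   # bounded-context name agrees with the spec node
    "flow_match": 0.2,      # flow/operation name agrees with the spec node
}

def confidence_score(signals: dict) -> float:
    """Sum the weights of the signals that matched (0.0..1.0)."""
    return round(sum(w for name, w in WEIGHTS.items() if signals.get(name)), 2)

def classify_match(score: float) -> str:
    """One possible mapping to CorrelationResultModel.match_type."""
    if score >= 0.70:
        return "deterministic"
    if score >= 0.30:
        return "heuristic"
    return "none"
```

Note the thresholds deliberately mirror the drift signal policy in design decision 3, so a "deterministic" match is always strong enough to alert on.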


3) Drift Classifier (Python)

New module behavior_drift_classifier.py:
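
The semantic-vs-benign split can be sketched with a few rules. This is a hypothetical sketch: the rule set, field names (`policy_violation`, `declared_flows`), and the 5xx heuristic are illustrative assumptions.

```python
# Hypothetical sketch distinguishing semantic drift (runtime behavior
# contradicts the spec) from benign drift (noise such as latency jitter).
# Rules and field names are illustrative assumptions.

def classify_drift(evidence: dict) -> str:
    """Return 'semantic' or 'benign' for one normalized evidence record."""
    # Policy violations and unexpected server errors contradict spec intent.
    if evidence.get("policy_violation") or evidence.get("status_code", 200) >= 500:
        return "semantic"
    # An operation the spec does not declare for this context is semantic drift.
    declared = evidence.get("declared_flows", set())
    if evidence.get("flow") and evidence["flow"] not in declared:
        return "semantic"
    return "benign"
```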


4) KG Writer + Postgres Indexer

New adapters:

Data Model

Storage Consistency Model
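
The KG-authoritative rule from design decision 4 implies a write order: the Knowledge Graph write must succeed before the Postgres summary index is touched, and exhausted retries go to the dead-letter queue. The sketch below is hypothetical; all names (`write_kg`, `write_pg_index`, `dead_letter`) are illustrative placeholders for the real adapters.

```python
# Hypothetical KG-first dual-write sketch. The KG is authoritative; the
# Postgres index is best-effort and repairable from the KG. Adapter names
# are illustrative placeholders.

def persist_evidence(record, write_kg, write_pg_index, dead_letter, retries=3):
    for attempt in range(retries):
        try:
            write_kg(record)              # authoritative store first
            break
        except Exception:
            if attempt == retries - 1:
                dead_letter("kg", record)
                return False              # never index what the KG rejected
    try:
        write_pg_index(record)            # best-effort summary index
    except Exception:
        dead_letter("pg", record)         # KG stays consistent; index repairable
    return True
```

This ordering guarantees the summary index never references evidence the authoritative store rejected, at the cost of the index temporarily lagging the KG.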


5) API Routes (Workbench BFF)

New services/workbench-bff/src/api/behavior_routes.py

| Method | Path | Description |
|--------|------|-------------|
| GET | /behavior/summary | Summary by context/node with pagination |
| GET | /behavior/node/{node_id} | Evidence + correlation details (max 1000 evidence items) |
| POST | /behavior/scan | Trigger on-demand scan for context or node |
| GET | /behavior/trends | Drift trends with time window filters |
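
The request limits implied by the table can be sketched as plain helpers. The 1000-item cap comes from the table; the page-size defaults and bounds are assumptions pending the Constraints section.

```python
# Hypothetical sketch of the route constraints. MAX_EVIDENCE matches the
# 1000-item cap on /behavior/node/{node_id}; page-size values are assumed.

MAX_EVIDENCE = 1000      # hard cap on evidence items per node response
DEFAULT_PAGE_SIZE = 50   # assumed default for /behavior/summary pagination
MAX_PAGE_SIZE = 200      # assumed upper bound per page

def clamp_page(limit=None, offset=None):
    """Normalize pagination query params to safe values."""
    limit = DEFAULT_PAGE_SIZE if limit is None else max(1, min(limit, MAX_PAGE_SIZE))
    return limit, max(0, offset or 0)

def cap_evidence(items):
    """Enforce the evidence-item cap on node detail responses."""
    return items[:MAX_EVIDENCE]
```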

Constraints

API Details


6) Workbench UI (React)

New UI:

UX Enhancements:


7) CI Drift Gate

New script:

Modes:

Drift Severity Thresholds

Gate Rules
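
A minimal sketch of the gate decision, assuming the warn/fail split operates on the `drift_level` values from BehaviorDriftSummary; the actual thresholds belong to the Gate Rules above and the function name is illustrative.

```python
# Hypothetical CI drift gate decision. The fail/warn levels are assumed
# defaults; the authoritative thresholds live in the gate configuration.

def gate_decision(summaries, fail_level="high", warn_level="medium"):
    """Return 'fail', 'warn', or 'pass' from BehaviorDriftSummary records."""
    order = {"none": 0, "low": 1, "medium": 2, "high": 3}
    worst = max((order[s["drift_level"]] for s in summaries), default=0)
    if worst >= order[fail_level]:
        return "fail"   # non-zero exit: block the pipeline
    if worst >= order[warn_level]:
        return "warn"   # annotate the run, exit zero
    return "pass"
```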

Integration:


Data Models

Pydantic Models (BFF)

Add to services/workbench-bff/src/models.py:

BehaviorEvidenceModel

from datetime import datetime
from typing import Literal

from pydantic import BaseModel, Field


class BehaviorEvidenceModel(BaseModel):
    evidence_id: str = Field(..., min_length=8)
    trace_id: str = Field(..., min_length=16)
    span_id: str | None = Field(default=None)
    timestamp: datetime
    context: str
    flow: str | None = None
    policy_id: str | None = None
    drift_score: float = Field(..., ge=0.0, le=1.0)
    confidence: float = Field(..., ge=0.0, le=1.0)
    correlation_status: Literal["ok", "partial", "failed"]
    error_reason: str | None = None

CorrelationResultModel

class CorrelationResultModel(BaseModel):
    spec_node_id: str | None = None
    evidence_id: str
    confidence: float = Field(..., ge=0.0, le=1.0)
    match_type: Literal[\"deterministic\", \"heuristic\", \"none\"]
    rule_ids: list[str] = Field(default_factory=list)

BehaviorDriftSummaryModel

class BehaviorDriftSummaryModel(BaseModel):
    context: str
    spec_node_id: str
    drift_score: float = Field(..., ge=0.0, le=1.0)
    drift_level: Literal[\"none\", \"low\", \"medium\", \"high\"]
    confidence: float = Field(..., ge=0.0, le=1.0)
    last_seen_at: datetime
    evidence_count: int = Field(..., ge=0)

TypeScript Models (Workbench)

Add to apps/workbench/src/types/behavior.ts:

BehaviorEvidence

export interface BehaviorEvidence {
  evidenceId: string;
  traceId: string;
  spanId?: string | null;
  timestamp: string;
  context: string;
  flow?: string | null;
  policyId?: string | null;
  driftScore: number;
  confidence: number;
  correlationStatus: 'ok' | 'partial' | 'failed';
  errorReason?: string | null;
}

CorrelationResult

export interface CorrelationResult {
  specNodeId?: string | null;
  evidenceId: string;
  confidence: number;
  matchType: 'deterministic' | 'heuristic' | 'none';
  ruleIds: string[];
}

BehaviorDriftSummary

export interface BehaviorDriftSummary {
  context: string;
  specNodeId: string;
  driftScore: number;
  driftLevel: 'none' | 'low' | 'medium' | 'high';
  confidence: number;
  lastSeenAt: string;
  evidenceCount: number;
}

Security / Privacy / Governance


Testing Plan

Python Unit Tests

Integration Tests

UI Tests

CI Validation

Performance Testing

Load / Stress Testing

Security Testing

Chaos / Failure Testing

Coverage Targets


Prerequisites (Spec-First Gate)


TDD Cycle Plan

Wave 1: Core Backend

Cycle C1A: Core Models + Normalizer

Branch: cycle/p3.3-c1a-behavior-models

Cycle C1B: Correlator + Classifier

Branch: cycle/p3.3-c1b-behavior-correlator

Cycle C1C: Storage Adapters

Branch: cycle/p3.3-c1c-behavior-storage


Wave 2: API + Surfaces

Cycle C2A: API Routes

Branch: cycle/p3.3-c2a-behavior-api

Cycle C2B: Workbench UI

Branch: cycle/p3.3-c2b-behavior-ui


Cycle C2C: CI Drift Gate

Branch: cycle/p3.3-c2c-behavior-ci

Debt Fixed:


Wave 3: Integration + Observability

Cycle C3A: OpenObserve Integration

Branch: cycle/p3.3-c3a-openobserve

Cycle C3B: End-to-End Validation

Branch: cycle/p3.3-c3b-e2e


Verification Checklist


Out of Scope


Timeline Estimate

| Cycle | Scope | Duration |
|-------|-------|----------|
| Prerequisites | Spec validation + generator alignment | 1 day |
| C1A | Core Models + Normalizer | 1 day |
| C1B | Correlator + Classifier | 1–2 days |
| C1C | Storage Adapters | 1–2 days |
| C2A | API Routes | 1 day |
| C2B | Workbench UI | 2 days |
| C2C | CI Drift Gate | 0.5 day |
| C3A | OpenObserve Integration | 1 day |
| C3B | E2E Validation | 2–3 days |
| Review gates | Security, perf, stakeholder sign-off | 2–3 days |
| Contingency buffer (20%) | | 2–3 days |
| Total | | 15–19 days |

References