SDS-030: Semantic Observability Envelope

Type

Software Design Specification - Core Platform Thruway

Status

Draft

Purpose

Establishes the Semantic Observability Envelope for SEA-Forge™. This specification defines how semantic concepts, events, and execution contexts are mapped into standard observability signals (Traces, Logs, Metrics) while strictly enforcing privacy, cardinality limits, and anti-leakage constraints.

It bridges the gap between Semantic Concepts (what the business means) and Telemetry (what the system observes).

1. The Semantic Envelope

The “Envelope” is a standardized wrapper for all observability data emitted by the SEA™ Runtime.

1.1. Structure

{
  "traceId": "12345...",         // W3C Trace ID
  "spanId": "abcde...",          // W3C Span ID
  "timestamp": "2025-10-27T...", // ISO-8601
  "semanticContext": {
    "domain": "Sales",           // High-level domain
    "concept": "OrderPlacement", // The semantic concept active
    "regimeId": "regime:v1"      // Invariant Regime ID (from SDS-019)
  },
  "provenance": {
    "executor": "agent:shipping-optimizer-v2",
    "trigger": "event:order-received"
  },
  "payloadMode": "full_fidelity", // or "aggregated"
  "data": { ... }                 // Signal-specific payload
}
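
As a minimal sketch, the structure above could be typed and sanity-checked as follows. The type names and the checkEnvelope helper are hypothetical and not defined by this specification; the ID checks assume the W3C Trace Context shapes (32 hex characters for the trace ID, 16 for the span ID).

```typescript
// Hypothetical type names; the field names follow the envelope structure above.
type PayloadMode = "full_fidelity" | "aggregated";

interface SemanticEnvelope<T> {
  traceId: string;
  spanId: string;
  timestamp: string;
  semanticContext: { domain: string; concept: string; regimeId: string };
  provenance: { executor: string; trigger: string };
  payloadMode: PayloadMode;
  data: T;
}

// Returns the names of fields that look malformed; an empty list means no problem was found.
function checkEnvelope(envelope: SemanticEnvelope<unknown>): string[] {
  const problems: string[] = [];
  // Assumed W3C shapes: lowercase hex of fixed length, and not all zeros.
  if (!/^[0-9a-f]{32}$/.test(envelope.traceId) || /^0+$/.test(envelope.traceId)) {
    problems.push("traceId");
  }
  if (!/^[0-9a-f]{16}$/.test(envelope.spanId) || /^0+$/.test(envelope.spanId)) {
    problems.push("spanId");
  }
  // Date.parse returns NaN for strings it cannot interpret as a date.
  if (isNaN(Date.parse(envelope.timestamp))) {
    problems.push("timestamp");
  }
  // Runtime guard for envelopes that arrive as untyped JSON.
  if (envelope.payloadMode !== "full_fidelity" && envelope.payloadMode !== "aggregated") {
    problems.push("payloadMode");
  }
  return problems;
}
```

The generic parameter T stands in for the signal-specific payload, which this sketch deliberately leaves unconstrained.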

2. Signal Mapping Rules

We rigorously partition data into Metrics, Logs, and Traces to balance cost, performance, and utility.

2.1. Metrics (Aggregatable Indicators)

Purpose: Health, Trends, SLAs.
Constraint: Strict High Cardinality Blocking.

Field	Inclusion	Rule
`domain`	✅ Yes	Low cardinality grouping.
`concept`	✅ Yes	Key for “Business Activity Monitoring”.
`entityId`	❌ NO	Cardinality explosion risk.
`userId`	❌ NO	Privacy + Cardinality risk.
`errorType`	✅ Yes	Aggregate semantic error types (e.g., “PolicyViolation”).

Cardinality Guardrail: If the number of distinct concept values exceeds 1000 per hour, dynamic aggregation kicks in and emits concept="other".
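
The inclusion table and the guardrail can be sketched together as below. This is an illustration under assumptions, not the runtime's implementation: the class and function names are hypothetical, the hour is treated as a fixed window that resets, and the first 1000 distinct concepts in a window keep their own label while later ones collapse to "other".

```typescript
// Labels permitted on metrics, per the inclusion table above.
const METRIC_LABEL_ALLOWLIST = ["domain", "concept", "errorType"];
const MAX_CONCEPTS_PER_WINDOW = 1000; // threshold from the guardrail
const GUARD_WINDOW_MS = 60 * 60 * 1000; // one hour, assumed to be a fixed window

class ConceptCardinalityGuard {
  private seen = new Set<string>();
  private windowStart: number;
  private limit: number;

  constructor(nowMs: number, limit: number = MAX_CONCEPTS_PER_WINDOW) {
    this.windowStart = nowMs;
    this.limit = limit;
  }

  // Returns the concept label to emit for this observation.
  label(concept: string, nowMs: number): string {
    if (nowMs - this.windowStart >= GUARD_WINDOW_MS) {
      this.seen.clear(); // start a fresh window
      this.windowStart = nowMs;
    }
    if (this.seen.has(concept)) return concept;
    if (this.seen.size >= this.limit) return "other"; // over budget: aggregate
    this.seen.add(concept);
    return concept;
  }
}

// Drops every field that is not allowlisted (entityId, userId, ...) and guards concept.
function metricLabels(
  fields: Record<string, string>,
  guard: ConceptCardinalityGuard,
  nowMs: number
): Record<string, string> {
  const labels: Record<string, string> = {};
  for (const key of METRIC_LABEL_ALLOWLIST) {
    const value = fields[key];
    if (value === undefined) continue;
    labels[key] = key === "concept" ? guard.label(value, nowMs) : value;
  }
  return labels;
}
```

An allowlist is used rather than a blocklist so that a newly introduced high-cardinality field is excluded by default.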

2.2. Traces (Causal Flows)

Purpose: Latency analysis, dependency graphing, root cause.
Constraint: Flow Fidelity.

Field	Inclusion	Rule
`entityId`	✅ Yes	Essential for correlating specific flows.
`decisionTree`	⚠️ Hash	If decision logic exposes strategy, use hash/ID only.
`causality`	✅ Yes	`sea-caused-by` links are primary.

2.3. Logs (Discrete Events)

Purpose: Detailed debugging, audit, replay.
Constraint: Volume Sampling & Privacy.

Field	Inclusion	Rule
`payload`	✅ Yes	Full request/response (subject to PII scrubbing).
`stackTrace`	✅ Yes	Only on error.
`stateSnapshot`	⚠️ Mode Dependent	See “Payload Modes”.

2.4. Governance Runtime Metrics

Purpose: Operational visibility into policy enforcement reliability.
Constraint: Low-cardinality attributes only.

Metric Name	Type	Attributes	Purpose
`policy_gateway.circuit_breaker.state_change`	Event (Log)	`state`, `failure_count`, `timestamp`	Tracks circuit breaker transitions

Emission Rule: Emit as an OpenTelemetry span event or log record on state transitions only (open, half_open, closed).
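
A minimal sketch of the "transitions only" rule follows. The reporter class is hypothetical, and it assumes the breaker starts in the closed state, so an initial closed observation emits nothing. Handing the returned record to an OpenTelemetry span event or log record is left to the caller.

```typescript
type BreakerState = "open" | "half_open" | "closed";

interface BreakerStateChange {
  name: "policy_gateway.circuit_breaker.state_change";
  state: BreakerState;
  failure_count: number;
  timestamp: string; // ISO-8601
}

class BreakerStateReporter {
  private last: BreakerState;

  // Assumption: a breaker begins life closed unless told otherwise.
  constructor(initial: BreakerState = "closed") {
    this.last = initial;
  }

  // Returns a record only when the state differs from the previous observation.
  observe(state: BreakerState, failureCount: number, now: Date): BreakerStateChange | undefined {
    if (state === this.last) return undefined; // repeated state: nothing to emit
    this.last = state;
    return {
      name: "policy_gateway.circuit_breaker.state_change",
      state: state,
      failure_count: failureCount,
      timestamp: now.toISOString(),
    };
  }
}
```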

3. Privacy & Sensitive Provenance

Provenance data (who did what, and for what reason) is highly sensitive in a semantic system because it reveals intent.

3.1. PII Constraints

Direct PII: (Email, Phone) -> REDACT or Zero-Knowledge Token.
Indirect PII: (User Behavior Pattern) -> Differential Privacy noise added to Metrics.
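
For the Direct PII rule, a REDACT pass over free text might look like the sketch below. The patterns and placeholder strings are illustrative assumptions only: they are heuristic, will miss some formats, and can over-match digit runs that are not phone numbers. Zero-knowledge tokens and differential privacy noise are out of scope for this sketch.

```typescript
// Illustrative patterns only; a real scrubber would need a vetted PII detector.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_CANDIDATE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;
const MIN_PHONE_DIGITS = 10; // assumed cutoff so that short digit runs such as dates survive

function redactDirectPii(text: string): string {
  const withoutEmails = text.replace(EMAIL_PATTERN, "[REDACTED:email]");
  return withoutEmails.replace(PHONE_CANDIDATE_PATTERN, (candidate) => {
    // Count only the digits; separators do not make something a phone number.
    const digitCount = candidate.replace(/\D/g, "").length;
    return digitCount >= MIN_PHONE_DIGITS ? "[REDACTED:phone]" : candidate;
  });
}
```

Applying this only to free-text fields, rather than to whole serialized envelopes, reduces the risk of mangling IDs and timestamps.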

3.2. Sensitive Provenance Guard

Problem: Tracing a specific Agent’s reasoning path might reveal proprietary trading strategies or confidential business logic.
Rule: Mark sensitive semantic concepts with meta:sensitivity="high".
Enforcement:
- Low Sensitivity: Log full reasoning chain: Input -> Rule A -> Rule B -> Output.
- High Sensitivity: Log opaque checkpoint: Input -> [PolicyHash:xyz] -> Output.
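
The two enforcement branches can be sketched as a single rendering function. The type and function names are hypothetical, and the policy hash is assumed to be computed upstream and passed in, since this specification does not define how it is derived.

```typescript
type Sensitivity = "low" | "high"; // value of meta:sensitivity

interface ReasoningTrace {
  input: string;
  rules: string[]; // the reasoning chain, in order
  output: string;
  policyHash: string; // assumed precomputed; derivation is not specified here
}

// Renders the full chain for low sensitivity and an opaque checkpoint for high.
function renderProvenance(trace: ReasoningTrace, sensitivity: Sensitivity): string {
  const middle =
    sensitivity === "high" ? ["[PolicyHash:" + trace.policyHash + "]"] : trace.rules;
  return [trace.input, ...middle, trace.output].join(" -> ");
}
```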

4. Payload Modes

The envelope supports two modes aligning with SDS-016 (Debt) and SDS-003 (KG).

4.1. Mode: `full_fidelity`

Use Case: Audit Logs, Semantic Debt Ledger, Debugging (Sampled).
Content: Complete snapshot of Entity state before and after operation.
Storage: Hot/Warm Object Storage (S3/Blob).
Retention: Short (7 days) unless flagged for Audit (7 years).

4.2. Mode: `aggregated`

Use Case: Knowledge Graph Projection, Metric Streams.
Content: Summary statistics only (e.g., “Order Value > $10k”, “Risk Score: High”).
Mechanism: The Translation Loss vector (from SDS-019) is explicitly logged here to indicate what detail was dropped.

5. Anti-Leakage Rules (Side-Channel Prevention)

Prevent observers from reconstructing sensitive behavior/data through telemetry patterns.

5.1. Timing Attacks

Risk: Observer infers UserType=VIP because processing time is consistently 50ms faster (different code path).
Mitigation: Bucketed Granularity for high-sensitivity contexts. Report duration in milliseconds, rounded to the nearest 10 ms or 100 ms interval.
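
A minimal sketch of the bucketing step, assuming hypothetical function names and a default bucket of 100 ms for high-sensitivity contexts:

```typescript
// Rounds a duration to the nearest multiple of bucketMs (for example 10 or 100).
function bucketDurationMs(durationMs: number, bucketMs: number): number {
  if (!(bucketMs > 0)) throw new RangeError("bucketMs must be positive");
  return Math.round(durationMs / bucketMs) * bucketMs;
}

// Hypothetical policy: only high-sensitivity contexts are coarsened.
function reportedDurationMs(
  durationMs: number,
  sensitivity: "low" | "high",
  bucketMs: number = 100
): number {
  return sensitivity === "high" ? bucketDurationMs(durationMs, bucketMs) : durationMs;
}
```

Bucketing coarsens the timing signal rather than removing it. A consistent gap that is larger than the bucket size can still show up as different buckets, so the bucket should be chosen with the expected gap in mind.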

5.2. Provenance Query Leaks

Risk: Querying “Show me all actions by Agent X” reveals a stealth feature launch pattern.
Mitigation:
- Read-Time Access Control: Observability query engines must enforce SEA™ RBAC.
- Pattern Masking: If a query matches < 5 entities (Micro-segment), return “Insufficient Data” (k-anonymity).
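
Pattern Masking could be applied as a post-filter on query results, as in this sketch. The names are hypothetical, and it assumes the threshold applies to the number of distinct entities rather than the number of rows.

```typescript
const K_ANONYMITY_THRESHOLD = 5; // from the rule above: fewer than 5 entities is masked

type MaskedResult<T> =
  | { outcome: "ok"; rows: T[] }
  | { outcome: "insufficient_data" };

// Masks the result when it covers fewer than k distinct entities.
function maskSmallResult<T>(
  rows: T[],
  entityIdOf: (row: T) => string,
  k: number = K_ANONYMITY_THRESHOLD
): MaskedResult<T> {
  const distinctEntities = new Set(rows.map(entityIdOf));
  if (distinctEntities.size < k) return { outcome: "insufficient_data" };
  return { outcome: "ok", rows: rows };
}
```

Read-time access control is assumed to have run before this step; masking is a second line of defence, not a replacement for RBAC.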

6. Implementation Guidelines

6.1. The Observability Agent

A sidecar/middleware that:

- Intercepts SEA™ Events.
- Checks meta:sensitivity.
- Applies Scrubbing (PII).
- Applies Hashing (Provenance).
- Routes to OpenTelemetry collectors (Traces/Metrics) and Audit Log (Ledger).
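
The agent's steps can be sketched as one function with its collaborators injected. Everything here is a hypothetical shape for illustration: the event fields, the dependency names, and the choice to send every processed event to both sinks are assumptions, not part of this specification.

```typescript
interface RawSeaEvent {
  sensitivity: "low" | "high"; // value of meta:sensitivity
  payload: string;
  reasoning: string[];
  policyHash: string; // assumed to be precomputed upstream
}

interface ProcessedSeaEvent {
  payload: string;
  reasoning: string[];
}

interface AgentDeps {
  scrubPii: (text: string) => string;
  toCollector: (processed: ProcessedSeaEvent) => void; // OpenTelemetry collector
  toAuditLedger: (processed: ProcessedSeaEvent) => void; // Audit Log (Ledger)
}

function handleSeaEvent(raw: RawSeaEvent, deps: AgentDeps): ProcessedSeaEvent {
  // Check meta:sensitivity.
  const highSensitivity = raw.sensitivity === "high";
  // Apply PII scrubbing to the payload.
  const payload = deps.scrubPii(raw.payload);
  // Replace the reasoning chain with an opaque checkpoint when sensitive.
  const reasoning = highSensitivity
    ? ["[PolicyHash:" + raw.policyHash + "]"]
    : raw.reasoning.map((step) => deps.scrubPii(step));
  const processed: ProcessedSeaEvent = { payload: payload, reasoning: reasoning };
  // Route to both sinks.
  deps.toCollector(processed);
  deps.toAuditLedger(processed);
  return processed;
}
```

Injecting the scrubber and sinks keeps the ordering of the steps testable without a running collector.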

6.2. Configuration Example

observability:
  defaults:
    mode: "aggregated"
    sampling: 0.1 # 10%
  overrides:
    - context: "Finance.HighValueParams"
      mode: "full_fidelity"
      sampling: 1.0
      retention: "audit_7yr"
      sensitivity: "high" # Trigger hashing
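
How an agent might resolve this configuration for a given context is sketched below. The types and functions are hypothetical, overrides are assumed to match on an exact context string, and the first matching override wins.

```typescript
interface ObservabilitySettings {
  mode: "aggregated" | "full_fidelity";
  sampling: number; // fraction in [0, 1]
  retention?: string;
  sensitivity?: "low" | "high";
}

interface ObservabilityOverride {
  context: string;
  mode?: "aggregated" | "full_fidelity";
  sampling?: number;
  retention?: string;
  sensitivity?: "low" | "high";
}

interface ObservabilityConfig {
  defaults: ObservabilitySettings;
  overrides: ObservabilityOverride[];
}

// Overlays the first override whose context matches exactly onto the defaults.
function resolveSettings(config: ObservabilityConfig, context: string): ObservabilitySettings {
  const match = config.overrides.find((o) => o.context === context);
  if (!match) return { ...config.defaults };
  return {
    mode: match.mode ?? config.defaults.mode,
    sampling: match.sampling ?? config.defaults.sampling,
    retention: match.retention ?? config.defaults.retention,
    sensitivity: match.sensitivity ?? config.defaults.sensitivity,
  };
}

// The random draw is passed in so the decision is deterministic under test.
function shouldSample(settings: ObservabilitySettings, randomDraw: number): boolean {
  return randomDraw < settings.sampling;
}
```

In production the draw would typically come from Math.random(); a sampling value of 1.0 then keeps every event.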

7. Correlation ID Propagation

To ensure full causality tracking across distributed boundaries (including non-HTTP transports like NATS), we enforce strict propagation rules.

7.1. Standard Headers

Header	Purpose	Format	OTel Mapping
`traceparent`	W3C Trace Context	`00-{traceId}-{spanId}-{flags}`	Automatically handled by OTel Propagators
`tracestate`	Vendor/System specific state	Key-value pairs	Automatically handled by OTel Propagators
`sea-correlation-id`	Business Transaction ID	UUID v7	`attributes.sea_correlation_id`
`sea-causation-id`	Direct Causal Parent	UUID v7	`attributes.sea_causation_id`

7.2. Propagation Rules (INV-OBS-05)

Inbound Request:
- If sea-correlation-id is present, MUST use it as the Trace ID or link to it.
- If missing, MUST generate a new UUID v7 and set as sea-correlation-id.
Domain Events:
- Every event emitted MUST include:
  - correlation_id (from current context)
  - causation_id (ID of the command/event that triggered this)
- These MUST map to OTel Span attributes when the event is processed.
Background Jobs:
- Jobs spawned from a user request MUST inherit the sea-correlation-id.
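
The inbound and domain-event rules might be implemented along these lines. This is a sketch under assumptions: header names are assumed to be normalized to lower case, the helper names are hypothetical, and the UUID v7 generator uses Math.random(), which is not cryptographically secure and stands in for a proper library.

```typescript
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) out += Math.floor(Math.random() * 16).toString(16);
  return out;
}

// UUID v7 layout: 48-bit millisecond timestamp, version nibble 7, variant bits 10, random rest.
function uuidV7(nowMs: number = Date.now()): string {
  const ts = Math.floor(nowMs).toString(16).padStart(12, "0");
  const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16); // 8, 9, a or b
  return (
    ts.slice(0, 8) + "-" + ts.slice(8, 12) + "-7" + randomHex(3) + "-" +
    variantNibble + randomHex(3) + "-" + randomHex(12)
  );
}

interface CorrelationContext {
  correlationId: string;
}

// Inbound rule: reuse sea-correlation-id when present, otherwise mint a new UUID v7.
function resolveCorrelation(headers: Record<string, string | undefined>): CorrelationContext {
  const inbound = headers["sea-correlation-id"];
  return { correlationId: inbound !== undefined && inbound.length > 0 ? inbound : uuidV7() };
}

// Domain-event rule: stamp correlation_id from context and causation_id from the trigger.
function stampDomainEvent<T extends object>(
  domainEvent: T,
  ctx: CorrelationContext,
  triggerId: string
): T & { correlation_id: string; causation_id: string } {
  return { ...domainEvent, correlation_id: ctx.correlationId, causation_id: triggerId };
}
```

A background job spawned from a request would receive the same CorrelationContext, which is how it inherits the sea-correlation-id.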

8. Pattern Weighting Telemetry (Plan P010)

Added 2025-12-30: Defines the governance and performance signals required by MetricIndexer (SDS-015) to compute pattern success rates for PatternOracle recommendations.

8.1. Pattern Execution Metrics

These metrics are emitted by the runtime and ingested by MetricIndexer to compute pattern success_rate and combined_score:

Metric Name	Type	Labels	Purpose	SDS Reference
`sea_pattern_invocation_total`	Counter	`pattern_id`, `domain`, `outcome`	Total pattern uses by outcome (success/failure/timeout)	SDS-015 IndexTelemetry
`sea_pattern_latency_seconds`	Histogram	`pattern_id`, `domain`	Execution latency distribution per pattern	SDS-015 RefreshStats
`sea_pattern_governance_score`	Gauge	`pattern_id`, `domain`, `policy_id`	Governance compliance score (0-1) from policy evaluations	SDS-047 GovernedSpeed™ Governance Runtime
`sea_pattern_drift_psi`	Gauge	`pattern_id`, `domain`	Population Stability Index detecting pattern drift	SDS-016 Drift Detection

8.2. Combined Score Formula

The PatternOracle computes combined_score from these signals:

combined_score = (
  weight_similarity * semantic_similarity +
  weight_performance * success_rate +
  weight_governance * governance_score
)

where:
  semantic_similarity = cosine_similarity(query_embedding, pattern_embedding)
  success_rate = success_count / total_count (from sea_pattern_invocation_total)
  governance_score = avg(sea_pattern_governance_score) per pattern_id

Default weights (configurable):

weight_similarity: 0.4
weight_performance: 0.35
weight_governance: 0.25
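
As a worked sketch, the formula and default weights translate to the functions below. The function names are hypothetical, and returning 0 for a zero-length vector norm or for a pattern with no invocations is an assumption; this specification does not say how those cases are handled.

```typescript
interface ScoreWeights {
  similarity: number;
  performance: number;
  governance: number;
}

const DEFAULT_SCORE_WEIGHTS: ScoreWeights = { similarity: 0.4, performance: 0.35, governance: 0.25 };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // assumed convention for zero vectors
  return dot / Math.sqrt(normA * normB);
}

// success_count / total_count, with an assumed 0 when nothing has been invoked yet.
function successRate(successCount: number, totalCount: number): number {
  return totalCount === 0 ? 0 : successCount / totalCount;
}

function combinedScore(
  semanticSimilarity: number,
  rate: number,
  governanceScore: number,
  weights: ScoreWeights = DEFAULT_SCORE_WEIGHTS
): number {
  return (
    weights.similarity * semanticSimilarity +
    weights.performance * rate +
    weights.governance * governanceScore
  );
}
```

For example, with the default weights, a similarity of 0.5, a success rate of 0.8, and a governance score of 1.0 give 0.2 + 0.28 + 0.25 = 0.73.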

8.3. Telemetry Envelope for Pattern Metrics

Pattern-related telemetry MUST include these semantic context fields:

{
  "semanticContext": {
    "domain": "semantic-core",
    "concept": "PatternExecution",
    "regimeId": "regime:pattern-oracle-v1"
  },
  "provenance": {
    "executor": "agent:vibes-pro-generator",
    "trigger": "event:generation-planning"
  },
  "patternContext": {
    "pattern_id": "ifl:hash:abc123...",
    "pattern_name": "hexagonal-adapter",
    "pattern_version": "1.2.0",
    "invocation_context": "code-generation"
  }
}

8.4. Privacy Constraints (LocalFirstPrivacy)

Per SDS-015 Policy LocalFirstPrivacy:

- Pattern execution metrics MUST be computed locally before aggregation.
- Raw code snippets MUST NOT be transmitted in observability data.
- Embeddings are pre-computed and stored with ifl:hash identity tokens only.

8.5. OTel Collector Pattern Processing

Add to OTel Collector configuration for pattern metrics routing:

# Append to otel-collector-config.yaml
processors:
  # Pattern metric enrichment
  resource/patterns:
    attributes:
      - key: sea.concept
        value: PatternExecution
        action: insert
      - key: sea.domain
        value: semantic-core
        action: insert

  # Cardinality protection for pattern_id
  groupbyattrs:
    keys:
      - pattern_id
      - domain
      - outcome

exporters:
  # Route to MetricIndexer via OpenObserve
  otlp/metric-indexer:
    endpoint: ${OPENOBSERVE_ENDPOINT}
    headers:
      Authorization: ${OPENOBSERVE_TOKEN}
      X-SEA-Routing: metric-indexer

service:
  pipelines:
    metrics/patterns:
      receivers: [otlp]
      processors: [batch, resource/patterns, groupbyattrs]
      exporters: [otlp/metric-indexer, otlp/openobserve]

8.6. Flow Contract: GovernedSpeed™ → MetricIndexer

The IndexTelemetry flow (SDS-015) expects this event structure from GovernedSpeed™:

interface PatternTelemetryEvent {
  // Identity
  pattern_id: string;         // ifl:hash of the pattern
  event_id: string;           // UUID v7
  correlation_id: string;     // From sea-correlation-id header

  // Metrics
  outcome: 'success' | 'failure' | 'timeout';
  latency_ms: number;
  governance_score: number;   // 0-1 from policy evaluation

  // Temporal
  timestamp: string;          // ISO-8601
  window_start: string;       // Aggregation window start
  window_end: string;         // Aggregation window end

  // Context
  domain: string;
  invocation_context: string;
}
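
One way a consumer might project such an event onto the metrics listed in 8.1 is sketched below. The MetricSample shape and function name are hypothetical. The governance gauge is omitted because its policy_id label is not carried by this event structure.

```typescript
// Subset of PatternTelemetryEvent needed for the projection.
interface PatternEventFields {
  pattern_id: string;
  domain: string;
  outcome: "success" | "failure" | "timeout";
  latency_ms: number;
}

interface MetricSample {
  metric: string;
  labels: Record<string, string>;
  value: number;
}

function toPatternMetricSamples(e: PatternEventFields): MetricSample[] {
  return [
    {
      // Counter: one increment per invocation, split by outcome.
      metric: "sea_pattern_invocation_total",
      labels: { pattern_id: e.pattern_id, domain: e.domain, outcome: e.outcome },
      value: 1,
    },
    {
      // Histogram observation: the metric is in seconds, the event in milliseconds.
      metric: "sea_pattern_latency_seconds",
      labels: { pattern_id: e.pattern_id, domain: e.domain },
      value: e.latency_ms / 1000,
    },
  ];
}
```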

8.7. Invariants: INV-OBS-06 to INV-OBS-08

INV-ID	Invariant	Type	Enforcement
INV-OBS-06	Pattern metrics MUST include `pattern_id` label	System	OTel Collector processor validation
INV-OBS-07	Governance scores MUST be [0,1] normalized	Semantic	Policy Gateway normalization
INV-OBS-08	Pattern telemetry MUST respect LocalFirstPrivacy	Privacy	Collector PII scrubbing + no raw code
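
The first two invariants lend themselves to a mechanical check, sketched here with hypothetical names. INV-OBS-08 is not covered, since detecting raw code or PII in a payload needs more than a field check.

```typescript
interface PatternInvariantInput {
  pattern_id?: string;
  governance_score: number;
}

// Returns the IDs of violated invariants; an empty list means both checks passed.
function checkPatternInvariants(sample: PatternInvariantInput): string[] {
  const violations: string[] = [];
  // INV-OBS-06: pattern_id must be present and non-empty.
  if (sample.pattern_id === undefined || sample.pattern_id.length === 0) {
    violations.push("INV-OBS-06");
  }
  // INV-OBS-07: score must lie in [0, 1]; the negated form also rejects NaN.
  if (!(sample.governance_score >= 0 && sample.governance_score <= 1)) {
    violations.push("INV-OBS-07");
  }
  return violations;
}
```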

Updated: 2025-12-30 - Added Pattern Weighting Telemetry section (Plan P010 C1B)

References

SDS-050: Semantic Identity Provenance