Task 14 Execution Packet: Knowledge Graph Reasoning Integration (Aligned to 2026-01-25 Plan)

This packet replaces the old Task 14 notes. It is aligned to Phase 14 / Task 14 in docs/plans/2026-01-25-end-state.md and to the current knowledge-graph service code in this repo.

Use this as the exact execution guide for an agent implementing Task 14.

Objective

Enable bounded, deterministic reasoning for the Knowledge Graph service with explicit profile selection and separable inferred data.

Key outcomes:

Reasoning is off by default (explicit-only results unless requested).
When enabled, inferred triples are materialized into a named graph: urn:sea:inferred:<snapshot_id>.
Each inferred triple is annotated with snapshot hash, reasoning profile, and rule-set hash.
Behavior is deterministic for a given snapshot + profile.

Non-negotiables

Backwards compatible: default behavior is unchanged unless a profile is explicitly requested.
Deterministic: same snapshot + profile yields identical inferred graph content.
Separable: explicit graph remains truth; inferred graph is queryable but isolated.
No generated edits: do not touch generated zones or files with generated banners.

Step 0 — Ground the current code reality (must confirm before editing)

Current state in this repo (verify quickly before coding):

Adapter: services/knowledge-graph/src/adapters/oxigraph_adapter.py
- Uses rdflib.Graph, not pyoxigraph.
- Stores explicit triples in _graph.
- query_sparql and query_sparql_graph query the single graph.
- Snapshots are metadata-only content IDs (hash + timestamp), not stored graph versions.
API: services/knowledge-graph/src/api/routes.py
- POST /kg/sparql takes SparqlRequest { query, format }.
Config: services/knowledge-graph/src/config.py is SHACL-only.
Artifacts: load_precomputed_artifacts can load explicit.nq + inferred.nq or merged.nq.

If any of the above differs, adjust this packet to match the repo before coding.

Step 1 — Introduce a reasoning module with explicit profiles

Create:

services/knowledge-graph/src/reasoner.py
services/knowledge-graph/src/rules/custom_ruleset_v1.rq

Required profile list (exact)

none
rdfs
owlrl-lite
custom_ruleset_v1

Reasoner contract (stable API)

Define a small, stable interface:

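from dataclasses import dataclass
from typing import Literal

from rdflib import Graph

ReasoningProfile = Literal["none", "rdfs", "owlrl-lite", "custom_ruleset_v1"]

@dataclass(frozen=True)
class ReasoningOutput:
    profile: ReasoningProfile
    snapshot_id: str
    explicit_triples: int  # triple count of the explicit graph
    inferred_triples: int  # triple count of inferred_graph
    rule_set_hash: str
    duration_ms: float
    inferred_graph: Graph  # inferred triples only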

Profile implementations

none: no inference, inferred graph is empty.
rdfs: use the RDFS closure from the owlrl package (the owlrl.RDFSClosure module).
owlrl-lite: use OWL-RL with axiomatic triples and datatype axioms disabled to keep it bounded.
- Verify the exact owlrl API in the venv before coding (do not guess the signature).
custom_ruleset_v1:
- Load and execute SPARQL CONSTRUCT rules from custom_ruleset_v1.rq.
- The file must exist and be deterministic; one minimal rule is enough.

Ruleset hashing

Compute rule_set_hash as the SHA-256 of the ruleset file contents, or of a fixed profile string for rdfs and owlrl-lite.

Example:

rdfs: sha256("profile:rdfs")
owlrl-lite: sha256("profile:owlrl-lite")
custom_ruleset_v1: sha256(file_bytes)
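
The hashing rule above can be sketched with the standard library only. The function name compute_rule_set_hash and the ruleset_path parameter are illustrative, not names from the repo. The packet does not define a hash for the none profile, so this sketch raises for it; that choice is an assumption.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Profiles with built-in semantics hash a fixed label instead of a file.
BUILTIN_PROFILE_LABELS = {
    "rdfs": "profile:rdfs",
    "owlrl-lite": "profile:owlrl-lite",
}


def compute_rule_set_hash(profile: str, ruleset_path: Path | None = None) -> str:
    """Return a hex SHA-256 digest identifying the rules behind a profile."""
    if profile in BUILTIN_PROFILE_LABELS:
        data = BUILTIN_PROFILE_LABELS[profile].encode("utf-8")
    elif profile == "custom_ruleset_v1":
        if ruleset_path is None:
            raise ValueError("custom_ruleset_v1 requires a ruleset file path")
        # Hash raw bytes so the digest does not depend on text decoding.
        data = ruleset_path.read_bytes()
    else:
        # Assumption: "none" and unknown profiles have no rule-set hash.
        raise ValueError(f"no rule-set hash defined for profile {profile!r}")
    return hashlib.sha256(data).hexdigest()
```

Hashing the file bytes means any edit to custom_ruleset_v1.rq, including whitespace, changes rule_set_hash, which keeps stale inferred artifacts detectable.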

Step 2 — Switch adapter storage to support named graphs

Why: Named graphs are required to keep inferred data separable.

Required structural change

In OxigraphAdapter:

Replace single Graph usage with a dataset that supports named graphs.
- Use rdflib.Dataset (preferred) or ConjunctiveGraph.
Keep explicit triples in the default graph.
Store inferred triples in a named graph:
- urn:sea:inferred:<snapshot_id>

Snapshot ID derivation

Use the existing canonical hash logic to derive a snapshot ID without mutating state:

content_hash = sha256(canonicalize(explicit_graph))
snapshot_id = f"ifl:snap:{content_hash[:16]}"

Snapshot model decision (resolved): metadata-only content IDs. The system does not store historical graph versions; snapshot_id is a deterministic label for the current explicit graph only. Do not rely on _current_snapshot_id for correctness; compute from graph content.
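
A minimal, library-free sketch of the derivation above. The canonicalize function here is a stand-in, not the adapter's existing canonical hash logic: it assumes triples are already serialized as N-Triples terms and that the graph contains no blank nodes. The name compute_snapshot_id is hypothetical.

```python
import hashlib
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # terms already serialized in N-Triples syntax


def canonicalize(triples: Iterable[Triple]) -> bytes:
    """Stand-in canonical form: sorted, de-duplicated N-Triples lines.

    Assumption: no blank nodes. The adapter's real canonicalization is
    expected to handle those.
    """
    lines = sorted({f"{s} {p} {o} ." for s, p, o in triples})
    return "\n".join(lines).encode("utf-8")


def compute_snapshot_id(triples: Iterable[Triple]) -> Tuple[str, str]:
    """Return (content_hash, snapshot_id) without mutating any state."""
    content_hash = hashlib.sha256(canonicalize(triples)).hexdigest()
    return content_hash, f"ifl:snap:{content_hash[:16]}"
```

Because the ID is a pure function of the explicit triples, two adapters holding the same explicit graph compute the same snapshot_id regardless of insertion order.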

Step 3 — Annotate each inferred triple

Each inferred triple must be annotated with:

source snapshot hash
reasoning profile
rule-set hash

Required mechanism

Use RDF reification (standard, deterministic):

For each inferred triple (s, p, o) in the inferred named graph:

Create a reified statement node _:stmt with:
- rdf:subject s
- rdf:predicate p
- rdf:object o
Add metadata triples:
- sea:inferredFromSnapshot "<content_hash>"
- sea:reasoningProfile "<profile>"
- sea:ruleSetHash "<rule_set_hash>"

Use SEA = Namespace("http://sea-forge.com/schema/core#") (already present in adapter).
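
The annotation shape can be sketched without rdflib by treating terms as N-Triples strings. The helper name reify_inferred_triple is hypothetical. Deriving the blank node label from a hash of the triple is an assumption made here so that repeated runs produce identical output; the packet only requires a _:stmt node per inferred triple.

```python
from __future__ import annotations

import hashlib

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SEA = "http://sea-forge.com/schema/core#"


def reify_inferred_triple(
    triple: tuple[str, str, str],
    content_hash: str,
    profile: str,
    rule_set_hash: str,
) -> list[tuple[str, str, str]]:
    """Return the six annotation triples for one inferred triple."""
    s, p, o = triple
    # Assumption: a label derived from the triple keeps output deterministic.
    digest = hashlib.sha256(" ".join(triple).encode("utf-8")).hexdigest()[:16]
    stmt = f"_:stmt{digest}"
    return [
        (stmt, f"<{RDF}subject>", s),
        (stmt, f"<{RDF}predicate>", p),
        (stmt, f"<{RDF}object>", o),
        (stmt, f"<{SEA}inferredFromSnapshot>", f'"{content_hash}"'),
        (stmt, f"<{SEA}reasoningProfile>", f'"{profile}"'),
        (stmt, f"<{SEA}ruleSetHash>", f'"{rule_set_hash}"'),
    ]
```

In the adapter, the same six statements would be added to the inferred named graph as rdflib terms rather than strings.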

Step 4 — Wire reasoning into query paths (explicit per query)

API surface

Update SparqlRequest in services/knowledge-graph/src/api/routes.py:

reasoning_profile: str = "none"
snapshot_id: str | None = None (optional; must match current computed snapshot)

Adapter signatures

Update adapter methods to accept these:

query_sparql(query: str, reasoning_profile: ReasoningProfile = "none", snapshot_id: str | None = None) -> dict
query_sparql_graph(query: str, format: str = "turtle", reasoning_profile: ReasoningProfile = "none", snapshot_id: str | None = None) -> str

Query execution rules

If reasoning_profile == "none" → query explicit graph only (default behavior).
If reasoning_profile != "none":
- Materialize inferred graph via Reasoner.
- Store inferred triples in named graph urn:sea:inferred:<snapshot_id>.
- Execute the query against a union graph (explicit + inferred).
- Return metadata block in response (see below).

Snapshot handling: if snapshot_id is provided, it must equal the current computed snapshot ID. If it does not match, return HTTP 400. Ignoring the mismatch and logging a warning is possible, but 400 is preferred because it avoids ambiguity.
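
The decision logic in the execution rules and the snapshot check can be sketched as a pure function that runs before any query. QueryPlan, plan_query, and SnapshotMismatchError are hypothetical names; the route handler is assumed to translate SnapshotMismatchError into a 400 response.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_PROFILES = ("none", "rdfs", "owlrl-lite", "custom_ruleset_v1")


class SnapshotMismatchError(ValueError):
    """The caller pinned a snapshot_id that is not the current one."""


@dataclass(frozen=True)
class QueryPlan:
    snapshot_id: str
    materialize: bool  # True when the Reasoner must run
    inferred_graph_uri: str | None  # None for explicit-only queries


def plan_query(
    reasoning_profile: str,
    requested_snapshot_id: str | None,
    current_snapshot_id: str,
) -> QueryPlan:
    """Decide how a query runs, before touching the graph."""
    if reasoning_profile not in VALID_PROFILES:
        raise ValueError(f"unknown reasoning profile: {reasoning_profile!r}")
    if requested_snapshot_id is not None and requested_snapshot_id != current_snapshot_id:
        raise SnapshotMismatchError(
            f"snapshot_id {requested_snapshot_id!r} does not match "
            f"current snapshot {current_snapshot_id!r}"
        )
    if reasoning_profile == "none":
        # Default path: explicit graph only, no materialization.
        return QueryPlan(current_snapshot_id, False, None)
    return QueryPlan(
        current_snapshot_id, True, f"urn:sea:inferred:{current_snapshot_id}"
    )
```

Keeping this check separate from query execution makes the default path easy to verify: with reasoning_profile="none" the plan never references an inferred graph.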

Response metadata (required)

Include inferred in response when profile != none:

"inferred": {
  "enabled": true,
  "profile": "rdfs",
  "snapshot_id": "ifl:snap:...",
  "graph_uri": "urn:sea:inferred:<snapshot_id>",
  "explicit_triples": 123,
  "inferred_triples": 45,
  "rule_set_hash": "<sha256>",
  "duration_ms": 12.3
}

If profile is none, omit inferred or set enabled=false.

Format-specific metadata handling (required)

For JSON SPARQL results, include the inferred object in the response body.
For non-JSON SPARQL result formats (XML/CSV/TSV) and RDF serializations (e.g., turtle), the service must emit equivalent metadata via HTTP response headers:
- X-Inferred-Enabled
- X-Inferred-Profile
- X-Inferred-Snapshot-Id
- X-Inferred-Graph-Uri
- X-Inferred-Explicit-Triples
- X-Inferred-Inferred-Triples
- X-Inferred-Rule-Set-Hash
- X-Inferred-Duration-Ms

Optional (opt-in): allow embedding metadata as triples in a named metadata graph when include_metadata=true; do not reject reasoning requests for non-JSON formats.
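
A minimal sketch of mapping the inferred metadata object onto the headers listed above. The function name inferred_headers is hypothetical, and the value formatting (lowercase true/false, plain str() for numbers) is an assumption, since the packet names the headers but not their encoding.

```python
from __future__ import annotations


def inferred_headers(inferred: dict) -> dict[str, str]:
    """Translate the JSON 'inferred' block into X-Inferred-* headers."""
    return {
        "X-Inferred-Enabled": "true" if inferred["enabled"] else "false",
        "X-Inferred-Profile": str(inferred["profile"]),
        "X-Inferred-Snapshot-Id": str(inferred["snapshot_id"]),
        "X-Inferred-Graph-Uri": str(inferred["graph_uri"]),
        "X-Inferred-Explicit-Triples": str(inferred["explicit_triples"]),
        "X-Inferred-Inferred-Triples": str(inferred["inferred_triples"]),
        "X-Inferred-Rule-Set-Hash": str(inferred["rule_set_hash"]),
        "X-Inferred-Duration-Ms": str(inferred["duration_ms"]),
    }
```

Building both the JSON block and the headers from the same dictionary keeps the two representations from drifting apart.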

Step 5 — Precomputed artifacts must not pollute explicit truth

Update load_precomputed_artifacts to keep explicit/inferred separate:

If explicit.nq + inferred.nq exist:
- Load explicit.nq into default graph.
- Validate inferred artifacts using a manifest or metadata file (required fields):
  - snapshot hash
  - explicit source identifier or hash
  - reasoning profile
  - rule-set hash
- Verify the inferred metadata matches the current snapshot hash derived from explicit.nq.
- If validation fails, discard inferred and log a clear warning/error.
- Load inferred.nq into named inferred graph using the current snapshot hash only after validation.
- Annotate inferred triples as in Step 3 (required).
If only merged.nq exists:
- Load into explicit graph only.
- Log a warning that inferred separation is unavailable for merged input.
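
The manifest check can be sketched as follows. The packet names the required fields but not their keys or the file format, so the key names below (snapshot_hash, explicit_source, reasoning_profile, rule_set_hash) and the function name validate_inferred_manifest are assumptions.

```python
from __future__ import annotations

# Hypothetical key names for the four required manifest fields.
REQUIRED_MANIFEST_FIELDS = (
    "snapshot_hash",
    "explicit_source",
    "reasoning_profile",
    "rule_set_hash",
)


def validate_inferred_manifest(manifest: dict, current_snapshot_hash: str) -> list[str]:
    """Return a list of problems; an empty list means inferred.nq may be loaded."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_MANIFEST_FIELDS
        if not manifest.get(name)
    ]
    recorded = manifest.get("snapshot_hash")
    if recorded and recorded != current_snapshot_hash:
        problems.append(
            f"snapshot hash mismatch: manifest={recorded} current={current_snapshot_hash}"
        )
    return problems
```

A caller would compute the current hash from explicit.nq first, then load inferred.nq only when the returned list is empty, logging each problem and discarding the inferred artifact otherwise.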

Step 6 — Dependencies

Update services/knowledge-graph/pyproject.toml:

Ensure owlrl is listed in dependencies.
rdflib already exists.

Step 7 — Tests (must prove real inference + separation)

Create services/knowledge-graph/tests/test_reasoning.py with at least these tests:

Test 1: RDFS subclass inference works

Explicit triples:
- :A rdfs:subClassOf :B
- :x rdf:type :A
Query: ?s a :B
Expectation:
- No result with reasoning_profile="none".
- :x appears with reasoning_profile="rdfs".
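
To make the expected result of Test 1 concrete, here is a library-free illustration of the two RDFS entailment rules involved (rdfs9 for type propagation, rdfs11 for subclass transitivity). It is not the owlrl implementation and covers only these two rules; a full RDFS closure infers more. The function name infer_rdfs_types and the prefixed-name strings are illustrative.

```python
from __future__ import annotations

RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def infer_rdfs_types(explicit: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply rdfs9 and rdfs11 to a fixed point; return inferred-only triples."""
    known = set(explicit)
    while True:
        subclass_pairs = {(s, o) for s, p, o in known if p == RDFS_SUBCLASS_OF}
        new = set()
        for s, p, o in known:
            if p not in (RDF_TYPE, RDFS_SUBCLASS_OF):
                continue
            for sub, sup in subclass_pairs:
                if sub == o:
                    # rdfs9 when p is rdf:type, rdfs11 when p is rdfs:subClassOf.
                    new.add((s, p, sup))
        new -= known
        if not new:
            # Only triples that were not explicit count as inferred.
            return known - set(explicit)
        known |= new


explicit = {(":A", RDFS_SUBCLASS_OF, ":B"), (":x", RDF_TYPE, ":A")}
inferred = infer_rdfs_types(explicit)
```

With the two explicit triples from Test 1, the only inferred triple is (":x", "rdf:type", ":B"), which is why the query ?s a :B returns :x only when reasoning is enabled.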

Test 2: Inferred named graph + annotations exist

Request reasoning with rdfs.
Assert the named graph urn:sea:inferred:<snapshot_id> exists.
Assert reification triples exist with:
- sea:inferredFromSnapshot
- sea:reasoningProfile
- sea:ruleSetHash

Test 3: Custom ruleset v1 works

Add one rule in custom_ruleset_v1.rq that produces a deterministic inference.
Assert the inferred triple exists when reasoning_profile="custom_ruleset_v1".

Test 4: Determinism

Run the same reasoning twice and assert identical inferred graph canonical form (use the adapter’s canonicalization logic or rdflib.compare.to_isomorphic).

Step 8 — API docs update (avoid doc drift)

Update:

services/knowledge-graph/README.md
docs/howto/use-knowledge-graph-service.md

Add:

reasoning_profile + snapshot_id request fields
inferred response metadata
the inferred graph URI format

Step 9 — Verification

Run:

pytest services/knowledge-graph/tests/test_reasoning.py -v

Optional:

pytest services/knowledge-graph/tests -v

Only mark Task 14 complete after the tests pass, then open a pull request.

Completion Status

Status: ✅ COMPLETED (2026-01-27)

All 8 tests pass, including the four required tests:

Test 1: RDFS subclass inference works ✓
Test 2: Inferred named graph + annotations exist ✓
Test 3: Custom ruleset v1 works ✓
Test 4: Determinism ✓

Implementation summary:

Created src/reasoner.py with Reasoner class and 4 profiles
Created src/rules/custom_ruleset_v1.rq with SPARQL CONSTRUCT rules
Switched adapter to use rdflib.Dataset for named graph support
Implemented per-triple reification annotations
Wired reasoning into query paths with reasoning_profile and snapshot_id parameters
Updated API routes to support reasoning parameters
Updated load_precomputed_artifacts for explicit/inferred separation
Added owlrl>=6.0.2 to dependencies
Updated README and HOWTO documentation