spec_id: SDS-011
title: PET Prompt Judge Service
bounded_context: cognitive-extension
status: Draft
version: 1.1.0
date_created: 2025-12-21
last_updated: 2026-01-03
implements:
Defines the Prompt Judge Service, the core backend component of the PET App responsible for evaluating user prompts, inferring intent, detecting weaknesses, and recommending improvements.
The Prompt Judge Service is a stateless processing module (likely a microservice or serverless function) that accepts a prompt context and returns a structured evaluation.
It implements the Modular Pipeline pattern defined in ADR-018.
graph TD
A[Client Request] --> B[Judge Service]
B --> C{Pipeline Orchestrator}
C --> D[Intent Detector]
C --> E[Structure Evaluator]
C --> F[Agentic Viability]
C --> G[Constraint Checker]
D & E & F & G --> H[Aggregator]
H --> I[Suggestion Engine]
I --> J[Final Evaluation JSON]
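The fan-out and aggregation in the diagram can be sketched roughly as follows. This is a minimal illustration, not the service's implementation: `StageResult`, `Stage`, `runPipeline`, and both stub stages are hypothetical names, and the stub scoring logic is only there to keep the example runnable.

```typescript
// Hypothetical types: the spec names the stages but not their interfaces.
interface StageResult {
  stage: string;
  score: number; // assumed 1-5 scale, as in the sample response
  feedback: string;
}

type Stage = (prompt: string) => Promise<StageResult>;

// Stub standing in for the Intent Detector.
const intentDetector: Stage = async (prompt) => ({
  stage: "intent",
  score: prompt.trim().length > 0 ? 5 : 1,
  feedback: prompt.trim().length > 0 ? "Intent is clear." : "Prompt is empty.",
});

// Stub standing in for the Structure Evaluator.
const structureEvaluator: Stage = async (prompt) => ({
  stage: "structure",
  score: /\d/.test(prompt) ? 4 : 3,
  feedback: "Stub: checks only whether any numeric parameter is present.",
});

// Orchestrator: run every stage concurrently, then collect results by stage name.
async function runPipeline(
  prompt: string,
  stages: Stage[],
): Promise<Record<string, StageResult>> {
  const results = await Promise.all(stages.map((stage) => stage(prompt)));
  const sections: Record<string, StageResult> = {};
  for (const result of results) {
    sections[result.stage] = result;
  }
  return sections;
}
```

Because the stages do not depend on one another in the diagram, running them with `Promise.all` keeps latency close to that of the slowest stage; the Aggregator and Suggestion Engine would then consume the collected `sections`.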
POST /v1/judge/evaluate
> [!NOTE]
> API versioning follows semantic versioning. Breaking changes require version bump.
Request:
{
  "prompt": "Summarize this article and send it to Slack.",
  "context": {
    "userLevel": "intermediate",
    "mode": "agentic",
    "domainRules": ["strict-no-pii"]
  }
}
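A request body of this shape could be validated before it enters the pipeline. The sketch below is an assumption-laden illustration: the field names come from the sample request, but `JudgeRequest`, `parseJudgeRequest`, and the rule that every field is required are hypothetical and not stated in this spec.

```typescript
// Field names mirror the sample request; treating all of them as required is an assumption.
interface JudgeRequest {
  prompt: string;
  context: {
    userLevel: string;
    mode: string;
    domainRules: string[];
  };
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical helper: narrows an unknown JSON body to JudgeRequest or throws.
function parseJudgeRequest(body: unknown): JudgeRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be a JSON object");
  }
  const { prompt, context } = body as Record<string, unknown>;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt must be a non-empty string");
  }
  if (typeof context !== "object" || context === null) {
    throw new Error("context must be an object");
  }
  const { userLevel, mode, domainRules } = context as Record<string, unknown>;
  if (typeof userLevel !== "string" || typeof mode !== "string") {
    throw new Error("context.userLevel and context.mode must be strings");
  }
  if (!isStringArray(domainRules)) {
    throw new Error("context.domainRules must be an array of strings");
  }
  return { prompt, context: { userLevel, mode, domainRules } };
}
```

Rejecting malformed input at the boundary keeps the downstream stages free of defensive checks.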
Response:
{
  "evaluationId": "eval-123456",
  "score": 85,
  "summary": "Good clear intent, but missing specific constraints and tool parameters.",
  "sections": {
    "intent": {
      "inferred": "User wants a summary of text and a Slack notification.",
      "score": 5,
      "feedback": "Intent is clear."
    },
    "structure": {
      "score": 3,
      "feedback": "Lacks specific tool parameters (channel, length)."
    },
    "agentic": {
      "score": 3,
      "feedback": "Implies tool use but doesn't define error handling or format."
    }
  },
  "suggestions": [
    {
      "type": "add_constraint",
      "text": "Specify summary length (e.g., '3 sentences')."
    },
    {
      "type": "add_parameter",
      "text": "Specify Slack channel (e.g., '#general')."
    }
  ],
  "improvedPrompt": "Summarize this article in 3 sentences. Then, send the summary to the #general Slack channel. usage: slack.post(channel='#general', text=summary). Handle errors if Slack is down.",
  "flags": ["#missing_constraints", "#agentic_ambiguity"]
}
mode=agentic):
improvedPrompt by applying fixes to the original.
The Judge uses a hybrid scoring model:
| Component | Method | Weight |
|---|---|---|
| Intent | LLM extraction | 0.2 |
| Structure | Rule-based + LLM | 0.3 |
| Agentic | LLM evaluation | 0.5 |
Rule-based checks run first because they are fast and deterministic; the LLM then evaluates the remaining aspects.
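One way the weights in the table could combine section scores is a weighted average normalized to 0-100. This is only a sketch under stated assumptions: the spec does not define the aggregation formula, and `overallScore`, the 1-5 section scale, and the linear normalization are all hypothetical. With the sample section scores (5, 3, 3) this formula yields 68 rather than the 85 in the sample response, so the service's actual aggregation evidently involves more than this.

```typescript
type Component = "intent" | "structure" | "agentic";

// Weights taken from the table above.
const WEIGHTS: Record<Component, number> = {
  intent: 0.2,
  structure: 0.3,
  agentic: 0.5,
};

// Assumption: section scores are on a 1-5 scale and the overall score is 0-100.
function overallScore(
  scores: Record<Component, number>,
  weights: Record<Component, number> = WEIGHTS,
  maxSectionScore = 5,
): number {
  const components = Object.keys(weights) as Component[];
  const weighted = components.reduce(
    (sum, key) => sum + weights[key] * scores[key],
    0,
  );
  return Math.round((weighted / maxSectionScore) * 100);
}

// Section scores from the sample response.
const sampleScore = overallScore({ intent: 5, structure: 3, agentic: 3 }); // 68
```

Passing `weights` as a parameter leaves room for per-organization rubrics to override the defaults.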
The following invariants MUST be maintained:
mode=test, evaluation results MUST be identical.
The service loads Rubric Configurations based on the user’s Organization ID. Enterprises can publish versioned “Best Practice” documents that become part of the system prompt context.
Example rubric.yaml:
org_id: "default"
weights:
  intent: 0.2
  structure: 0.3
  agentic: 0.5
priorities:
  - no_jailbreaks
  - clear_tool_use
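Resolving a rubric by Organization ID might look like the sketch below. It assumes the YAML has already been parsed into objects, and it makes two guesses the spec does not confirm: unknown organizations fall back to the `"default"` rubric, and weights are expected to sum to 1. `Rubric`, `DEFAULT_RUBRIC`, and `resolveRubric` are hypothetical names.

```typescript
// Shape mirrors the example rubric.yaml.
interface Rubric {
  org_id: string;
  weights: { intent: number; structure: number; agentic: number };
  priorities: string[];
}

const DEFAULT_RUBRIC: Rubric = {
  org_id: "default",
  weights: { intent: 0.2, structure: 0.3, agentic: 0.5 },
  priorities: ["no_jailbreaks", "clear_tool_use"],
};

// Assumption: weights should sum to 1 (they do in the example rubric).
function assertWeightsSumToOne(rubric: Rubric, tolerance = 1e-9): void {
  const { intent, structure, agentic } = rubric.weights;
  const total = intent + structure + agentic;
  if (Math.abs(total - 1) > tolerance) {
    throw new Error(`rubric ${rubric.org_id}: weights sum to ${total}, expected 1`);
  }
}

// Assumption: organizations without their own rubric use the "default" one.
function resolveRubric(orgId: string, rubrics: Map<string, Rubric>): Rubric {
  const rubric = rubrics.get(orgId) ?? rubrics.get("default");
  if (!rubric) {
    throw new Error("no rubric found and no default rubric configured");
  }
  assertWeightsSumToOne(rubric);
  return rubric;
}
```

Validating weights at load time surfaces a misconfigured enterprise rubric before it can skew any evaluation.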
#missing_constraints to trigger lessons)
LLMJudgePort) with strict Zod schemas + JSON Mode to guarantee output format.
libs/domain-judge as pure TypeScript entities.