SDS-011: PET Prompt Judge Service


---
spec_id: SDS-011
title: PET Prompt Judge Service
bounded_context: cognitive-extension
status: Draft
version: 1.1.0
date_created: 2025-12-21
last_updated: 2026-01-03
implements:
---

Purpose

Defines the Prompt Judge Service, the core backend component of the PET App responsible for evaluating user prompts, inferring intent, detecting weaknesses, and recommending improvements.


1. Architecture Overview

The Prompt Judge Service is a stateless processing module (likely a microservice or serverless function) that accepts a prompt and its context and returns a structured evaluation.

It implements the Modular Pipeline pattern defined in ADR-018.

```mermaid
graph TD
    A[Client Request] --> B[Judge Service]
    B --> C{Pipeline Orchestrator}
    C --> D[Intent Detector]
    C --> E[Structure Evaluator]
    C --> F[Agentic Viability]
    C --> G[Constraint Checker]
    D & E & F & G --> H[Aggregator]
    H --> I[Suggestion Engine]
    I --> J[Final Evaluation JSON]
```
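The orchestrator fans sub-judge work out in parallel and hands the results to the Aggregator. A minimal Python sketch of that fan-out/fan-in flow, using illustrative stub judges (none of these function names are mandated by this spec):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Stub sub-judges for illustration only; real implementations call rules/LLMs.
def detect_intent(prompt: str, context: dict) -> dict:
    return {"inferred": "...", "score": 5, "feedback": "Intent is clear."}

def evaluate_structure(prompt: str, context: dict) -> dict:
    return {"score": 3, "feedback": "Lacks specific tool parameters."}

SUB_JUDGES: dict[str, Callable[[str, dict], dict]] = {
    "intent": detect_intent,
    "structure": evaluate_structure,
}

def run_pipeline(prompt: str, context: dict) -> dict:
    """Fan sub-judges out in parallel, then collect their section results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(judge, prompt, context)
                   for name, judge in SUB_JUDGES.items()}
        sections = {name: f.result() for name, f in futures.items()}
    # The Aggregator (3.3) and Suggestion Engine (3.2) consume `sections`.
    return sections
```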

2. API Specification

2.1 Endpoint: POST /v1/judge/evaluate

> [!NOTE]
> API versioning follows semantic versioning. Breaking changes require a version bump.

Request:

```json
{
  "prompt": "Summarize this article and send it to Slack.",
  "context": {
    "userLevel": "intermediate",
    "mode": "agentic",
    "domainRules": ["strict-no-pii"]
  }
}
```

Response:

```json
{
  "evaluationId": "eval-123456",
  "score": 85,
  "summary": "Good clear intent, but missing specific constraints and tool parameters.",
  "sections": {
    "intent": {
      "inferred": "User wants a summary of text and a Slack notification.",
      "score": 5,
      "feedback": "Intent is clear."
    },
    "structure": {
      "score": 3,
      "feedback": "Lacks specific tool parameters (channel, length)."
    },
    "agentic": {
      "score": 3,
      "feedback": "Implies tool use but doesn't define error handling or format."
    }
  },
  "suggestions": [
    {
      "type": "add_constraint",
      "text": "Specify summary length (e.g., '3 sentences')."
    },
    {
      "type": "add_parameter",
      "text": "Specify Slack channel (e.g., '#general')."
    }
  ],
  "improvedPrompt": "Summarize this article in 3 sentences. Then, send the summary to the #general Slack channel. usage: slack.post(channel='#general', text=summary). Handle errors if Slack is down.",
  "flags": ["#missing_constraints", "#agentic_ambiguity"]
}
```
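For reference, a client-side call sketch using Python's requests library; the base URL is a placeholder, since this spec does not define the deployment host:

```python
import requests

# Placeholder host; the real deployment URL is not part of this spec.
BASE_URL = "https://pet.example.com"

payload = {
    "prompt": "Summarize this article and send it to Slack.",
    "context": {
        "userLevel": "intermediate",
        "mode": "agentic",
        "domainRules": ["strict-no-pii"],
    },
}

# timeout=5 mirrors the latency bound in section 4.
resp = requests.post(f"{BASE_URL}/v1/judge/evaluate", json=payload, timeout=5)
resp.raise_for_status()
evaluation = resp.json()
print(evaluation["score"], evaluation["flags"])
```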

3. Component Logic

3.1 Sub-Judges (Logic Pipeline)

The pipeline runs the following sub-judges (a shared interface sketch follows this list):

  1. Language Detector: identifies the prompt's language so downstream judges evaluate it in the right linguistic context.
  2. Intent Detector: infers the user's underlying goal (e.g., "a summary of text and a Slack notification").
  3. Structure Evaluator: checks for clarity and required specifics such as tool parameters and output constraints.
  4. Agentic Viability Evaluator (active if mode=agentic): assesses whether implied tool use defines error handling and output format.
  5. Constraint Checker: verifies the prompt against context.domainRules (e.g., strict-no-pii).
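A minimal sketch of such a shared interface; the class names and the crude PII heuristic are illustrative only, not mandated by this spec:

```python
from abc import ABC, abstractmethod

class SubJudge(ABC):
    """Common interface so the orchestrator treats all sub-judges uniformly."""

    #: Key under which this judge's result appears in the response "sections".
    name: str

    @abstractmethod
    def evaluate(self, prompt: str, context: dict) -> dict:
        """Return {"score": int (0-5), "feedback": str, ...}."""

class ConstraintChecker(SubJudge):
    name = "constraints"

    def evaluate(self, prompt: str, context: dict) -> dict:
        # Illustrative rule-based pass: a toy check that flags possible
        # email addresses when the strict-no-pii rule is active.
        violations = [
            rule for rule in context.get("domainRules", [])
            if rule == "strict-no-pii" and "@" in prompt
        ]
        score = 5 if not violations else 1
        return {"score": score, "feedback": f"Violations: {violations or 'none'}"}
```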

3.2 Suggestion Engine

Transforms aggregated sub-judge feedback into typed suggestions (e.g., add_constraint, add_parameter) and a rewritten improvedPrompt, as shown in the response example in section 2.1.

3.3 Hybrid Scoring

The Judge uses a hybrid scoring model:

| Component | Method            | Weight |
|-----------|-------------------|--------|
| Intent    | LLM extraction    | 0.2    |
| Structure | Rule-based + LLM  | 0.3    |
| Agentic   | LLM evaluation    | 0.5    |

Rule-based checks run first (fast and deterministic); the LLM then evaluates the remaining aspects.
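A minimal sketch of the weighted aggregation. The mapping from the 0-5 sub-judge scale to the 0-100 top-level score is not specified above, so the normalization here is an assumption:

```python
# Weights from the default rubric (section 5); sub-judge scores are 0-5.
WEIGHTS = {"intent": 0.2, "structure": 0.3, "agentic": 0.5}

def aggregate(sections: dict) -> int:
    """Weighted average of sub-judge scores, normalized to 0-100."""
    weighted = sum(WEIGHTS[name] * sections[name]["score"] for name in WEIGHTS)
    return round(weighted / 5 * 100)

sections = {
    "intent": {"score": 5},
    "structure": {"score": 3},
    "agentic": {"score": 3},
}
print(aggregate(sections))  # 68 under this assumed normalization
```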


4. Invariants

The following invariants MUST be maintained:

  1. Determinism in Test Mode: Given identical inputs and mode=test, evaluation results MUST be identical.
  2. Privacy First: User prompts MUST NOT be persisted without explicit consent.
  3. Latency Bound: Evaluation MUST complete within 5 seconds (streaming allowed).
  4. Schema Compliance: All responses MUST validate against the evaluation JSON Schema.
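Invariant 4 can be enforced at the service boundary. A sketch using the jsonschema package against a pared-down schema (the full evaluation JSON Schema is defined elsewhere):

```python
from jsonschema import validate, ValidationError

# Pared-down illustration; the real evaluation JSON Schema is broader.
EVALUATION_SCHEMA = {
    "type": "object",
    "required": ["evaluationId", "score", "sections", "suggestions"],
    "properties": {
        "evaluationId": {"type": "string"},
        "score": {"type": "integer", "minimum": 0, "maximum": 100},
    },
}

def ensure_schema_compliance(response: dict) -> dict:
    """Reject any response that would violate the schema invariant."""
    try:
        validate(instance=response, schema=EVALUATION_SCHEMA)
    except ValidationError as err:
        raise RuntimeError(f"Invariant violated: {err.message}") from err
    return response
```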

5. Configuration & Rubrics

The service loads Rubric Configurations based on the user’s Organization ID. Enterprises can publish versioned “Best Practice” documents that become part of the system prompt context.

Example rubric.yaml:

```yaml
org_id: "default"
weights:
  intent: 0.2
  structure: 0.3
  agentic: 0.5
priorities:
  - no_jailbreaks
  - clear_tool_use
```
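A sketch of per-org rubric resolution, assuming rubrics live on disk as rubrics/<org_id>.yaml with default.yaml as the fallback (a layout this spec does not mandate):

```python
from pathlib import Path
import yaml

RUBRIC_DIR = Path("rubrics")  # Assumed layout: rubrics/<org_id>.yaml

def load_rubric(org_id: str) -> dict:
    """Load the org's rubric, falling back to the default rubric."""
    path = RUBRIC_DIR / f"{org_id}.yaml"
    if not path.exists():
        path = RUBRIC_DIR / "default.yaml"
    rubric = yaml.safe_load(path.read_text())
    # Guard: section 3.3 relies on the weights forming a weighted average.
    assert abs(sum(rubric["weights"].values()) - 1.0) < 1e-9, "weights must sum to 1"
    return rubric
```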

6. Integration


7. Implementation Strategy (v1)