Document Operations Handbook Epic

User Journey

The Document Operations bounded context provides unified capabilities for working with structured documents (DOCX, XLSX, PDF) in SEA-Forge™. All operations preserve semantic structure, support traceability to Knowledge Graph concepts, and enable pattern-based document generation for ADR exports, traceability matrices, RAG indexing, and metrics dashboards.

Jobs to be Done & EARS Requirements

Job: Create DOCX Document with Structure

User Story: As a developer, I want to create Word documents with styles, sections, tables, and headers/footers, so that I can generate formatted specifications and reports.

EARS Requirement:

While generating DOCX documents, when a CreateDOCX command is received with document structure, the document operations context shall:
1. Initialize python-docx Document with default styles
2. Add headings with hierarchy (level 0-6) for document structure
3. Create paragraphs with text, runs, and formatting (bold, italic, font size, color)
4. Insert tables with specified rows, columns, and styles
5. Support merged cells and table styles (for example, Light Grid Accent 1)
6. Add headers and footers with page numbers and metadata
7. Generate table of contents from heading hierarchy
8. Set document properties (title, author, subject, keywords, version)
9. Write output to specified path with .docx extension
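
The steps above can be sketched with python-docx roughly as follows. This is a minimal illustration, not the context's actual implementation: `DocxSpec`, `validate`, and `create_docx` are hypothetical names, and the payload shape is an assumption. The table of contents and page-number fields (steps 6 and 7) are left out because, as far as I know, python-docx has no high-level call for them.

```python
from dataclasses import dataclass, field


@dataclass
class DocxSpec:
    """Hypothetical payload for a CreateDOCX command (not defined in the handbook)."""
    title: str
    author: str
    headings: list = field(default_factory=list)    # [(level, text), ...]
    table_rows: list = field(default_factory=list)  # [[cell, ...], ...], first row is the header


def validate(spec: DocxSpec, path: str) -> None:
    """Reject specs that break the stated rules before touching python-docx."""
    if not path.lower().endswith(".docx"):
        raise ValueError("output path must end with .docx")
    for level, _text in spec.headings:
        if not 0 <= level <= 6:
            raise ValueError(f"heading level {level} is outside 0-6")


def create_docx(spec: DocxSpec, path: str) -> None:
    # Imported here so validate() stays usable without python-docx installed.
    from docx import Document
    from docx.shared import Pt

    validate(spec, path)
    doc = Document()                            # step 1: default styles
    doc.core_properties.title = spec.title      # step 8: document properties
    doc.core_properties.author = spec.author
    doc.add_heading(spec.title, level=0)        # level 0 maps to the Title style
    for level, text in spec.headings:           # step 2: heading hierarchy
        doc.add_heading(text, level=level)
    run = doc.add_paragraph().add_run("Generated document")  # step 3: formatted run
    run.bold = True
    run.font.size = Pt(11)
    if spec.table_rows:                         # steps 4-5: styled table
        table = doc.add_table(rows=len(spec.table_rows), cols=len(spec.table_rows[0]))
        table.style = "Light Grid Accent 1"
        for r, row in enumerate(spec.table_rows):
            for c, value in enumerate(row):
                table.cell(r, c).text = str(value)
    doc.sections[0].footer.paragraphs[0].text = spec.title   # step 6: footer text only
    doc.save(path)                              # step 9
```

Validating before creating the document keeps a bad path or heading level from producing a half-written file.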

Job: Create XLSX Spreadsheet with Formulas and Charts

User Story: As an analyst, I want to create Excel spreadsheets with formulas, charts, pivot tables, and conditional formatting, so that I can build traceability matrices and metrics dashboards.

EARS Requirement:

While generating XLSX documents, when a CreateXLSX command is received with spreadsheet data, the document operations context shall:
1. Initialize openpyxl Workbook with active worksheet
2. Write data to cells with proper formatting (font, fill, alignment)
3. Add formulas (=SUM, =AVERAGE, =IF) with explicit data type ‘f’
4. Create charts (bar, line, pie, scatter) from data ranges
5. Insert pivot tables for data aggregation
6. Apply conditional formatting rules
7. Add data validation and named ranges
8. Support multiple worksheets with custom titles
9. Write output to specified path with .xlsx extension
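
A minimal openpyxl sketch of several of these steps is shown below. The function names and the two-column layout are assumptions made for illustration. Pivot tables (step 5) are not sketched: to my knowledge openpyxl preserves pivot tables already present in a workbook but does not create new ones, so that step may need a different approach. Data validation and named ranges are also omitted for brevity.

```python
def range_formula(func: str, column: str, first_row: int, last_row: int) -> str:
    """Build a formula string such as '=SUM(B2:B4)'."""
    return f"={func}({column}{first_row}:{column}{last_row})"


def create_xlsx(rows, path, sheet_title="Data"):
    """rows: a header row followed by (label, number) rows. Illustrative layout only."""
    # Imported here so range_formula() stays usable without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.chart import BarChart, Reference
    from openpyxl.formatting.rule import CellIsRule
    from openpyxl.styles import Alignment, Font, PatternFill

    wb = Workbook()                       # step 1: workbook with an active sheet
    ws = wb.active
    ws.title = sheet_title
    for row in rows:                      # step 2: write data
        ws.append(list(row))
    for cell in ws[1]:                    # format the header row
        cell.font = Font(bold=True)
        cell.alignment = Alignment(horizontal="center")
    last = ws.max_row
    ws.cell(row=last + 1, column=1, value="Total")
    # step 3: a string starting with "=" is stored by openpyxl as a formula cell
    ws.cell(row=last + 1, column=2, value=range_formula("SUM", "B", 2, last))

    chart = BarChart()                    # step 4: chart from data ranges
    chart.add_data(Reference(ws, min_col=2, min_row=1, max_row=last), titles_from_data=True)
    chart.set_categories(Reference(ws, min_col=1, min_row=2, max_row=last))
    ws.add_chart(chart, "D2")

    highlight = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
    ws.conditional_formatting.add(        # step 6: highlight values above 10
        f"B2:B{last}", CellIsRule(operator="greaterThan", formula=["10"], fill=highlight)
    )
    wb.create_sheet(title="Notes")        # step 8: additional worksheet
    wb.save(path)                         # step 9
```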

Job: Read and Extract PDF Content

User Story: As a knowledge worker, I want to read PDF documents and extract text with metadata, so that I can index content for RAG and reference archival material.

EARS Requirement:

While reading PDF documents, when a ReadPDF command is received with file path, the document operations context shall:
1. Open PDF using pypdf.PdfReader
2. Extract metadata (author, title, subject, creator, producer, creation date)
3. Iterate through pages and extract text content
4. Preserve page-level access with page numbers
5. Extract hyperlinks and annotations if available
6. Maintain layout information where possible
7. Return structured data with metadata and pages array
8. Handle encrypted PDFs with password prompt
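
One way to sketch the read path with pypdf is shown below. `metadata_to_dict` and `read_pdf` are hypothetical names. The sketch takes the password as an argument and leaves the actual prompt to the caller (for example via `getpass`). Layout preservation, hyperlinks, and annotations are not covered here.

```python
from typing import Any, Optional

METADATA_FIELDS = ("author", "title", "subject", "creator", "producer", "creation_date")


def metadata_to_dict(meta: Any) -> dict:
    """Copy the known fields from a metadata object, tolerating missing ones."""
    result = {}
    for name in METADATA_FIELDS:
        value = getattr(meta, name, None) if meta is not None else None
        # creation_date is a datetime in pypdf, so store it as ISO text.
        result[name] = value.isoformat() if hasattr(value, "isoformat") else value
    return result


def read_pdf(path: str, password: Optional[str] = None) -> dict:
    # Imported here so metadata_to_dict() stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)                       # step 1
    if reader.is_encrypted:                        # step 8
        if password is None:
            raise PermissionError("PDF is encrypted; a password is required")
        if not reader.decrypt(password):           # a falsy result means not decrypted
            raise PermissionError("password did not decrypt the PDF")
    pages = [                                      # steps 3-4: text with 1-based page numbers
        {"page_num": number, "text": page.extract_text() or ""}
        for number, page in enumerate(reader.pages, start=1)
    ]
    return {"metadata": metadata_to_dict(reader.metadata), "pages": pages}  # step 7
```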

Job: Generate ADR Export to DOCX

User Story: As an architect, I want to export Architecture Decision Records from markdown to formatted DOCX, so that I can distribute professional ADR documents.

EARS Requirement:

While exporting ADR documents, when an ExportADR command is received with markdown path, the document operations context shall:
1. Read markdown content from ADR file
2. Parse sections (title, status, context, decision, consequences)
3. Create DOCX with ADR template heading
4. Format each section with appropriate heading levels
5. Apply professional styling (fonts, spacing, alignment)
6. Include decision status badge (Accepted, Deprecated, Superseded)
7. Add metadata section with date, author, decision ID
8. Write formatted DOCX to output path
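
The parse-then-render flow above might look like the sketch below. It assumes the ADR uses a single `#` title line and `##` headings named Status, Context, Decision, and Consequences; the handbook does not specify the markdown layout, so treat that as an assumption. `parse_adr` and `export_adr` are hypothetical names, and the status badge is rendered as a plain bold line rather than a graphic.

```python
import re

ADR_SECTIONS = ("status", "context", "decision", "consequences")


def parse_adr(markdown: str) -> dict:
    """Split ADR markdown into a title plus the sections named in step 2."""
    adr = {"title": "", **{name: "" for name in ADR_SECTIONS}}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^(#{1,6})\s+(.*)$", line)
        if heading:
            level, text = len(heading.group(1)), heading.group(2).strip()
            if level == 1 and not adr["title"]:
                adr["title"] = text
                current = None
            else:
                key = text.lower()
                # Headings outside the known set end the current section.
                current = key if key in ADR_SECTIONS else None
        elif current:
            adr[current] += line + "\n"
    return {key: value.strip() for key, value in adr.items()}


def export_adr(markdown_path: str, output_path: str) -> None:
    # Imported here so parse_adr() stays usable without python-docx installed.
    from docx import Document

    with open(markdown_path, encoding="utf-8") as handle:   # step 1
        adr = parse_adr(handle.read())                       # step 2
    doc = Document()
    doc.add_heading(adr["title"], level=0)                   # step 3
    badge = doc.add_paragraph().add_run(f"Status: {adr['status'] or 'Unknown'}")
    badge.bold = True                                        # step 6, as plain text
    for name in ("context", "decision", "consequences"):     # step 4
        doc.add_heading(name.capitalize(), level=1)
        doc.add_paragraph(adr[name])
    doc.save(output_path)                                    # step 8
```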

Job: Generate Traceability Matrix to XLSX

User Story: As a requirements analyst, I want to generate traceability matrices in Excel with coverage formulas, so that I can track spec-to-code relationships.

EARS Requirement:

While generating traceability matrices, when a GenerateMatrix command is received with mapping data, the document operations context shall:
1. Create XLSX with “Traceability” worksheet
2. Add header row (ADR ID, PRD IDs, SDS IDs, Coverage)
3. Apply bold font and blue fill to header cells
4. Write mapping data rows with ADR, PRD, and SDS references
5. Add coverage formula: =IF(C<row><>"", "✓", "") for each row
6. Format as table with TableStyleMedium2
7. Auto-fit column widths
8. Write to output path
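
A possible openpyxl sketch of the matrix steps follows. The function names, the table display name `TraceabilityMatrix`, and the fill colour `4472C4` are assumptions. openpyxl has no auto-fit call that I am aware of, so step 7 is approximated by sizing each column to its longest cell value.

```python
def coverage_formula(row: int) -> str:
    """Step 5 formula for a worksheet row: a check mark when the SDS cell is filled."""
    return f'=IF(C{row}<>"", "✓", "")'


def generate_matrix(mappings, path):
    """mappings: iterable of (adr_id, prd_ids, sds_ids) string tuples."""
    # Imported here so coverage_formula() stays usable without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.styles import Font, PatternFill
    from openpyxl.utils import get_column_letter
    from openpyxl.worksheet.table import Table, TableStyleInfo

    wb = Workbook()
    ws = wb.active
    ws.title = "Traceability"                                  # step 1
    ws.append(["ADR ID", "PRD IDs", "SDS IDs", "Coverage"])     # step 2
    for cell in ws[1]:                                          # step 3
        cell.font = Font(bold=True)
        cell.fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
    for adr_id, prd_ids, sds_ids in mappings:                   # steps 4-5
        row = ws.max_row + 1
        ws.append([adr_id, prd_ids, sds_ids, coverage_formula(row)])
    table = Table(displayName="TraceabilityMatrix", ref=f"A1:D{ws.max_row}")
    table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium2", showRowStripes=True)
    ws.add_table(table)                                         # step 6
    for index, column in enumerate(ws.iter_cols(min_row=1, max_row=ws.max_row), start=1):
        longest = max(len(str(cell.value or "")) for cell in column)
        ws.column_dimensions[get_column_letter(index)].width = longest + 2   # step 7, approximate
    wb.save(path)                                               # step 8
```

Because the Coverage cell holds a formula, its width is derived from the formula text rather than the rendered check mark, which is one of the limits of this approximation.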

Job: Extract PDF for RAG Indexing

User Story: As a knowledge engineer, I want to extract PDF content with metadata for Knowledge Graph indexing, so that I can enable semantic search on PDF documents.

EARS Requirement:

While extracting PDF for RAG, when an ExtractForIndexing command is received with PDF path, the document operations context shall:
1. Open PDF with pypdf.PdfReader
2. Extract metadata (author, title, creation date)
3. Iterate pages and extract text with page numbers
4. Extract hyperlinks and annotations for additional context
5. Build structured output with:
  - metadata: Document metadata object
  - pages: Array of {page_num, text, links}
6. Return JSON structure ready for chunking and embedding
7. Preserve document structure for accurate retrieval
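
The structured output in steps 5 and 6 could be assembled along these lines. `build_index_payload`, `page_links`, and `extract_for_indexing` are hypothetical names. The link extraction only looks at URI actions on annotations and is a sketch rather than a complete treatment of PDF links.

```python
import json


def build_index_payload(metadata: dict, pages: list) -> str:
    """Serialize metadata and per-page records into the JSON shape from step 5."""
    records = [
        {"page_num": page["page_num"], "text": page["text"], "links": list(page.get("links", []))}
        for page in pages
    ]
    # default=str turns values such as datetimes into text instead of failing.
    return json.dumps({"metadata": metadata, "pages": records}, ensure_ascii=False, default=str)


def page_links(page) -> list:
    """Collect URI targets from a pypdf page's annotations, if any."""
    links = []
    if "/Annots" in page:
        for annotation in page["/Annots"]:
            obj = annotation.get_object()
            if "/A" in obj and "/URI" in obj["/A"]:
                links.append(str(obj["/A"]["/URI"]))
    return links


def extract_for_indexing(path: str) -> str:
    # Imported here so the helpers above stay usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)                       # step 1
    meta = reader.metadata                         # may be None for some files
    metadata = {                                   # step 2
        "author": meta.author if meta else None,
        "title": meta.title if meta else None,
        "creation_date": meta.creation_date if meta else None,
    }
    pages = [                                      # steps 3-4
        {"page_num": number, "text": page.extract_text() or "", "links": page_links(page)}
        for number, page in enumerate(reader.pages, start=1)
    ]
    return build_index_payload(metadata, pages)    # steps 5-6
```

Keeping one record per page means a downstream chunker can still cite the page number for each chunk.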

Job: Generate Metrics Dashboard to XLSX

User Story: As a project manager, I want to generate metrics dashboards in Excel with charts and trend lines, so that I can visualize sprint performance over time.

EARS Requirement:

While generating metrics dashboards, when a GenerateDashboard command is received with metrics data, the document operations context shall:
1. Create XLSX with “Metrics” worksheet
2. Add header row (Date, Velocity, Quality, Coverage)
3. Write metrics data rows with date and values
4. Create line chart with:
  - Title: “Sprint Metrics”
  - Data series for Velocity, Quality, Coverage
  - Categories from Date column
5. Position chart at cell F5
6. Format chart with legend and axis labels
7. Write to output path
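
The dashboard steps can be sketched with openpyxl's `LineChart` as below. `metrics_rows` and `generate_dashboard` are hypothetical names, the input record keys are an assumption, and the y-axis label "Value" is illustrative. Trend lines from the user story are not included.

```python
HEADER = ["Date", "Velocity", "Quality", "Coverage"]


def metrics_rows(metrics):
    """Flatten metric records (dicts keyed by lowercase header names) into sheet rows."""
    return [HEADER] + [[m["date"], m["velocity"], m["quality"], m["coverage"]] for m in metrics]


def generate_dashboard(metrics, path):
    # Imported here so metrics_rows() stays usable without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.chart import LineChart, Reference

    wb = Workbook()
    ws = wb.active
    ws.title = "Metrics"                         # step 1
    for row in metrics_rows(metrics):            # steps 2-3
        ws.append(row)
    last = ws.max_row

    chart = LineChart()                          # step 4
    chart.title = "Sprint Metrics"
    chart.x_axis.title = "Date"                  # step 6: axis labels
    chart.y_axis.title = "Value"
    chart.legend.position = "r"                  # legend on the right
    # Columns B-D hold the three series; row 1 supplies the series titles.
    chart.add_data(
        Reference(ws, min_col=2, max_col=4, min_row=1, max_row=last), titles_from_data=True
    )
    chart.set_categories(Reference(ws, min_col=1, min_row=2, max_row=last))
    ws.add_chart(chart, "F5")                    # step 5
    wb.save(path)                                # step 7
```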

Job: Apply Semantic Anchoring to Documents

User Story: As a knowledge architect, I want to link document content to Knowledge Graph concepts, so that documents are traceable and semantically grounded.

EARS Requirement:

While creating documents, when semantic anchoring is applied, the document operations context shall:
1. Embed ConceptId in document properties:
  - doc.core_properties.subject = 'sea:BoundedContext'
  - doc.core_properties.keywords = 'ADR-021, SDS-012, PRD-026'
2. Add cross-references to related documents
3. Include hyperlinks to Knowledge Graph entities
4. Enable traceability queries from document to concepts
5. Support reverse lookup (concept → documents)
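
As a rough illustration of steps 1, 4, and 5, the sketch below writes an anchor into document properties and keeps a two-way index. `SemanticAnchor` mirrors the value object named in this handbook, but its fields are assumed, and `AnchorIndex` is a hypothetical in-memory stand-in for the Knowledge Graph, not its real interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticAnchor:
    concept_id: str          # e.g. 'sea:BoundedContext'
    related_ids: tuple = ()  # e.g. ('ADR-021', 'SDS-012', 'PRD-026')


def apply_anchor(core_properties, anchor: SemanticAnchor) -> None:
    """Write the anchor into any object exposing .subject and .keywords,
    such as python-docx's doc.core_properties."""
    core_properties.subject = anchor.concept_id
    core_properties.keywords = ", ".join(anchor.related_ids)


class AnchorIndex:
    """In-memory stand-in for Knowledge Graph lookups (hypothetical)."""

    def __init__(self) -> None:
        self._docs_by_concept = defaultdict(set)
        self._concepts_by_doc = defaultdict(set)

    def register(self, document_path: str, anchor: SemanticAnchor) -> None:
        for concept in (anchor.concept_id, *anchor.related_ids):
            self._docs_by_concept[concept].add(document_path)
            self._concepts_by_doc[document_path].add(concept)

    def documents_for(self, concept: str) -> list:
        """Reverse lookup: concept -> documents."""
        return sorted(self._docs_by_concept.get(concept, ()))

    def concepts_for(self, document_path: str) -> list:
        """Forward lookup: document -> concepts."""
        return sorted(self._concepts_by_doc.get(document_path, ()))
```

With python-docx, usage would be along the lines of `apply_anchor(doc.core_properties, anchor)` before `doc.save(...)`, followed by `index.register(path, anchor)`.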

Job: Use Templates for Consistent Document Generation

User Story: As a document author, I want to use templates for consistent formatting, so that generated documents follow organizational standards.

EARS Requirement:

While generating documents, when a template is specified, the document operations context shall:
1. Load template from templates/ directory (e.g., adr-template.docx)
2. Preserve template styles, headers, footers, and layout
3. Replace placeholder variables with actual content
4. Maintain template structure and formatting
5. Write output to specified path
6. Support named template variables as placeholders
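
Placeholder replacement could be sketched as follows. The placeholder syntax is not given in this handbook, so the `{{name}}` form used here is purely an assumption, as are the names `fill_placeholders` and `render_template`. The sketch covers body paragraphs and table cells only, not headers or footers.

```python
import re

# Assumed syntax: {{name}}, with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(text: str, values: dict) -> str:
    """Replace {{name}} markers; names missing from values are left untouched."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


def render_template(template_path: str, output_path: str, values: dict) -> None:
    # Imported here so fill_placeholders() stays usable without python-docx installed.
    from docx import Document

    doc = Document(template_path)            # step 1: styles and layout come with the template
    paragraphs = list(doc.paragraphs)
    for table in doc.tables:                 # include text inside table cells
        for row in table.rows:
            for cell in row.cells:
                paragraphs.extend(cell.paragraphs)
    for paragraph in paragraphs:
        for run in paragraph.runs:
            # Editing run text in place keeps the run's formatting (steps 2 and 4).
            # A placeholder that Word split across several runs will not be matched.
            if "{{" in run.text:
                run.text = fill_placeholders(run.text, values)   # step 3
    doc.save(output_path)                    # step 5
```

Replacing at the run level rather than the paragraph level is a deliberate trade-off: it preserves character formatting at the cost of missing placeholders that span runs.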

Domain Entities Summary

Root Aggregates

DOCXDocument: Word document with content, styles, tables, headers/footers, and properties
XLSXSpreadsheet: Excel workbook with worksheets, formulas, charts, and pivot tables
PDFDocument: PDF reader with pages, metadata, text content, and hyperlinks

Value Objects

DOCXStyle: Formatting definition with font, size, color, bold, italic, alignment
XLSXChart: Chart definition with type, data ranges, categories, and position
XLSXTable: Table definition with display name, reference range, and style
SemanticAnchor: ConceptId reference linking document to Knowledge Graph

Policy Rules

SemanticStructurePreserved: Document structure maintained through read/write operations
TraceabilityRequired: All generated documents include semantic anchors to Knowledge Graph
TemplateConsistency: Template-based generation ensures consistent formatting

Integration Points

python-docx: DOCX creation, reading, and manipulation
openpyxl: XLSX creation, reading, formulas, and charts
pypdf/pdfplumber: PDF text extraction and metadata reading
Knowledge Graph: Semantic anchoring and concept linking
docx Skill: Claude Code skill for DOCX operations
xlsx Skill: Claude Code skill for XLSX operations

Success Metrics

Document Generation Accuracy: 100% of templates render correctly
Extraction Completeness: >95% of PDF text extracted accurately
Formula Accuracy: 100% of XLSX formulas calculate correctly
Semantic Linking: All spec documents have ConceptId anchors

Non-Functional Requirements

NFR-001: DOCX generation completes in <5 seconds for a typical 20-page document
NFR-002: XLSX generation completes in <3 seconds for a typical 100-row spreadsheet
NFR-003: PDF extraction handles 100-page documents in <10 seconds
NFR-004: All document operations preserve semantic structure and formatting