Document Operations Handbook Epic
User Journey
The Document Operations bounded context provides unified capabilities for working with structured documents (DOCX, XLSX, PDF) in SEA-Forge™. All operations preserve semantic structure, support traceability to Knowledge Graph concepts, and enable pattern-based document generation for ADR exports, traceability matrices, RAG indexing, and metrics dashboards.
Jobs to be Done & EARS Requirements
Job: Create DOCX Document with Structure
User Story: As a developer, I want to create Word documents with styles, sections, tables, and headers/footers, so that I can generate formatted specifications and reports.
EARS Requirement:
- While generating DOCX documents, when a CreateDOCX command is received with document structure, the document operations context shall:
- Initialize python-docx Document with default styles
- Add headings with hierarchy (level 0-6) for document structure
- Create paragraphs with text, runs, and formatting (bold, italic, font size, color)
- Insert tables with specified rows, columns, and styles
- Support merged cells, table styles (Light Grid Accent 1, etc.)
- Add headers and footers with page numbers and metadata
- Generate table of contents from heading hierarchy
- Set document properties (title, author, subject, keywords, version)
- Write output to specified path with .docx extension
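The CreateDOCX steps above can be sketched with python-docx roughly as follows. This is a minimal illustration, not the handbook's implementation: `DocSpec`, `validate_spec`, and `create_docx` are hypothetical names, and the table of contents step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DocSpec:
    """Hypothetical CreateDOCX payload (the handbook does not define one)."""
    title: str
    author: str
    headings: list[tuple[int, str]] = field(default_factory=list)  # (level, text)
    table_rows: list[list[str]] = field(default_factory=list)


def validate_spec(spec: DocSpec) -> None:
    """Reject heading levels outside the 0-6 range named in the requirement."""
    for level, text in spec.headings:
        if not 0 <= level <= 6:
            raise ValueError(f"heading level {level} out of range for {text!r}")


def create_docx(spec: DocSpec, path: str) -> None:
    # Checks run before the import so they work without python-docx installed.
    if not path.endswith(".docx"):
        raise ValueError("output path must end with .docx")
    validate_spec(spec)

    from docx import Document
    from docx.shared import Pt

    doc = Document()  # default template with built-in styles
    doc.core_properties.title = spec.title
    doc.core_properties.author = spec.author

    for level, text in spec.headings:
        doc.add_heading(text, level=level)

    # One paragraph with a formatted run (bold, 11 pt).
    para = doc.add_paragraph("Generated by ")
    run = para.add_run("Document Operations")
    run.bold = True
    run.font.size = Pt(11)

    if spec.table_rows:
        table = doc.add_table(rows=0, cols=len(spec.table_rows[0]))
        table.style = "Light Grid Accent 1"
        for row in spec.table_rows:
            for cell, value in zip(table.add_row().cells, row):
                cell.text = value

    # Put the title in the footer of the first section.
    doc.sections[0].footer.paragraphs[0].text = spec.title
    doc.save(path)
```

Validation is kept separate from writing so a malformed command can be rejected before any file is touched.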
Job: Create XLSX Spreadsheet with Formulas and Charts
User Story: As an analyst, I want to create Excel spreadsheets with formulas, charts, pivot tables, and conditional formatting, so that I can build traceability matrices and metrics dashboards.
EARS Requirement:
- While generating XLSX documents, when a CreateXLSX command is received with spreadsheet data, the document operations context shall:
- Initialize openpyxl Workbook with active worksheet
- Write data to cells with proper formatting (font, fill, alignment)
- Add formulas (=SUM, =AVERAGE, =IF) with explicit data type ‘f’
- Create charts (bar, line, pie, scatter) from data ranges
- Insert pivot tables for data aggregation
- Apply conditional formatting rules
- Add data validation and named ranges
- Support multiple worksheets with custom titles
- Write output to specified path with .xlsx extension
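A minimal sketch of the CreateXLSX steps with openpyxl might look like this. The function names, the "Data" and "Notes" sheet titles, and the threshold of 10 are illustrative assumptions. Pivot tables are left out because openpyxl, as far as its documentation describes, only preserves existing pivot tables when reading files.

```python
def sum_formula(column: str, first_row: int, last_row: int) -> str:
    """Build an Excel SUM formula string for one column range."""
    return f"=SUM({column}{first_row}:{column}{last_row})"


def create_xlsx(rows: list[tuple[str, int]], path: str) -> None:
    # Checks run before the import so they work without openpyxl installed.
    if not path.endswith(".xlsx"):
        raise ValueError("output path must end with .xlsx")
    if not rows:
        raise ValueError("at least one data row is required")

    from openpyxl import Workbook
    from openpyxl.chart import BarChart, Reference
    from openpyxl.formatting.rule import CellIsRule
    from openpyxl.styles import Font, PatternFill

    wb = Workbook()
    ws = wb.active
    ws.title = "Data"
    ws.append(["Item", "Count"])
    for cell in ws[1]:
        cell.font = Font(bold=True)
    for row in rows:
        ws.append(list(row))

    last = ws.max_row
    # openpyxl stores strings starting with "=" as formulas (data_type "f").
    ws.cell(row=last + 1, column=2, value=sum_formula("B", 2, last))

    # Bar chart over the Count column, labelled by the Item column.
    chart = BarChart()
    chart.title = "Count by item"
    chart.add_data(Reference(ws, min_col=2, min_row=1, max_row=last),
                   titles_from_data=True)
    chart.set_categories(Reference(ws, min_col=1, min_row=2, max_row=last))
    ws.add_chart(chart, "D2")

    # Highlight counts above an arbitrary threshold of 10.
    fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
    ws.conditional_formatting.add(
        f"B2:B{last}", CellIsRule(operator="greaterThan", formula=["10"], fill=fill)
    )

    wb.create_sheet("Notes")  # a second worksheet with a custom title
    wb.save(path)
```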
Job: Read and Extract PDF Content
User Story: As a knowledge worker, I want to read PDF documents and extract text with metadata, so that I can index content for RAG and reference archival material.
EARS Requirement:
- While reading PDF documents, when a ReadPDF command is received with file path, the document operations context shall:
- Open PDF using pypdf.PdfReader
- Extract metadata (author, title, subject, creator, producer, creation date)
- Iterate through pages and extract text content
- Preserve page-level access with page numbers
- Extract hyperlinks and annotations if available
- Maintain layout information where possible
- Return structured data with metadata and pages array
- Handle encrypted PDFs with password prompt
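The ReadPDF flow above could be sketched with pypdf as shown below. `read_pdf` and `number_pages` are hypothetical names, and the sketch assumes the password is passed in by the caller rather than prompted for interactively.

```python
from typing import Optional


def number_pages(texts: list[str]) -> list[dict]:
    """Attach 1-based page numbers so callers keep page-level access."""
    return [{"page_num": i, "text": t} for i, t in enumerate(texts, start=1)]


def read_pdf(path: str, password: Optional[str] = None) -> dict:
    from pypdf import PdfReader  # third-party, imported lazily

    reader = PdfReader(path)
    if reader.is_encrypted:
        if password is None:
            raise PermissionError("PDF is encrypted; a password is required")
        if not reader.decrypt(password):  # falsy result means the password failed
            raise PermissionError("incorrect PDF password")

    meta = reader.metadata  # may be None when the file carries no metadata
    fields = ("author", "title", "subject", "creator", "producer", "creation_date")
    metadata = {name: getattr(meta, name, None) if meta else None for name in fields}

    # extract_text() can return an empty result for image-only pages.
    texts = [page.extract_text() or "" for page in reader.pages]
    return {"metadata": metadata, "pages": number_pages(texts)}
```

Hyperlink and layout extraction are not covered here; they depend on per-page annotations and on the extraction mode chosen.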
Job: Generate ADR Export to DOCX
User Story: As an architect, I want to export Architecture Decision Records from markdown to formatted DOCX, so that I can distribute professional ADR documents.
EARS Requirement:
- While exporting ADR documents, when an ExportADR command is received with markdown path, the document operations context shall:
- Read markdown content from ADR file
- Parse sections (title, status, context, decision, consequences)
- Create DOCX with ADR template heading
- Format each section with appropriate heading levels
- Apply professional styling (fonts, spacing, alignment)
- Include decision status badge (Accepted, Deprecated, Superseded)
- Add metadata section with date, author, decision ID
- Write formatted DOCX to output path
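As an illustration of the ExportADR steps, the sketch below parses an ADR and writes it with python-docx. It assumes an ADR layout with a single `# ` title line and `## ` section headings; that layout, and the names `parse_adr` and `export_adr`, are assumptions rather than something the handbook specifies.

```python
def parse_adr(markdown: str) -> dict:
    """Split an ADR into a title and its '## ' sections (assumed layout)."""
    title = ""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("# ") and not title:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip().lower()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {"title": title, "sections": {k: v.strip() for k, v in sections.items()}}


def export_adr(markdown_path: str, output_path: str) -> None:
    from docx import Document  # third-party, imported lazily

    with open(markdown_path, encoding="utf-8") as fh:
        adr = parse_adr(fh.read())

    doc = Document()
    doc.add_heading(adr["title"], level=0)

    # A bold run stands in for the status badge.
    status = adr["sections"].get("status", "Unknown")
    doc.add_paragraph().add_run(f"Status: {status}").bold = True

    for name in ("context", "decision", "consequences"):
        doc.add_heading(name.capitalize(), level=1)
        doc.add_paragraph(adr["sections"].get(name, ""))
    doc.save(output_path)
```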
Job: Generate Traceability Matrix to XLSX
User Story: As a requirements analyst, I want to generate traceability matrices in Excel with coverage formulas, so that I can track spec-to-code relationships.
EARS Requirement:
- While generating traceability matrices, when a GenerateMatrix command is received with mapping data, the document operations context shall:
- Create XLSX with “Traceability” worksheet
- Add header row (ADR ID, PRD IDs, SDS IDs, Coverage)
- Apply bold font and blue fill to header cells
- Write mapping data rows with ADR, PRD, and SDS references
- Add coverage formula: =IF(C<row><>"", "✓", "") for each row
- Format as table with TableStyleMedium2
- Auto-fit column widths
- Write to output path
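One possible shape for the GenerateMatrix steps with openpyxl is sketched below. The function names, the table display name, and the fill colour are hypothetical. openpyxl has no built-in auto-fit, so column widths are approximated from the longest cell value.

```python
def coverage_formula(row: int) -> str:
    """Coverage check for one row: a tick when the SDS column is non-empty."""
    return f'=IF(C{row}<>"", "✓", "")'


def generate_matrix(mappings: list[tuple[str, str, str]], path: str) -> None:
    if not mappings:
        raise ValueError("at least one mapping row is required")

    from openpyxl import Workbook
    from openpyxl.styles import Font, PatternFill
    from openpyxl.utils import get_column_letter
    from openpyxl.worksheet.table import Table, TableStyleInfo

    wb = Workbook()
    ws = wb.active
    ws.title = "Traceability"
    ws.append(["ADR ID", "PRD IDs", "SDS IDs", "Coverage"])

    blue = PatternFill(start_color="BDD7EE", end_color="BDD7EE", fill_type="solid")
    for cell in ws[1]:
        cell.font = Font(bold=True)
        cell.fill = blue

    for adr, prd, sds in mappings:
        row = ws.max_row + 1  # the row this append will occupy
        ws.append([adr, prd, sds, coverage_formula(row)])

    table = Table(displayName="TraceabilityTable", ref=f"A1:D{ws.max_row}")
    table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium2", showRowStripes=True)
    ws.add_table(table)

    # Approximate auto-fit: longest value per column, capped at 40 characters.
    for idx, column in enumerate(ws.columns, start=1):
        longest = max(len(str(c.value or "")) for c in column)
        ws.column_dimensions[get_column_letter(idx)].width = min(longest + 2, 40)

    wb.save(path)
```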
Job: Extract PDF Content for RAG Indexing
User Story: As a knowledge engineer, I want to extract PDF content with metadata for Knowledge Graph indexing, so that I can enable semantic search on PDF documents.
EARS Requirement:
- While extracting PDF for RAG, when an ExtractForIndexing command is received with PDF path, the document operations context shall:
- Open PDF with pypdf.PdfReader
- Extract metadata (author, title, creation date)
- Iterate pages and extract text with page numbers
- Extract hyperlinks and annotations for additional context
- Build structured output with:
- metadata: Document metadata object
- pages: Array of {page_num, text, links}
- Return JSON structure ready for chunking and embedding
- Preserve document structure for accurate retrieval
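The structured output described above can be sketched as follows. `build_payload` and `extract_for_indexing` are hypothetical names, and the link extraction assumes hyperlinks are stored as `/Link` annotations with a `/URI` action, which is common but not guaranteed for every PDF.

```python
import json


def build_payload(metadata: dict, page_texts: list[str],
                  page_links: list[list[str]]) -> dict:
    """Assemble the {metadata, pages} structure with 1-based page numbers."""
    pages = [
        {"page_num": num, "text": text, "links": links}
        for num, (text, links) in enumerate(zip(page_texts, page_links), start=1)
    ]
    return {"metadata": metadata, "pages": pages}


def extract_for_indexing(path: str) -> str:
    from pypdf import PdfReader  # third-party, imported lazily

    reader = PdfReader(path)
    meta = reader.metadata
    metadata = {}
    for name in ("author", "title", "creation_date"):
        value = getattr(meta, name, None) if meta else None
        metadata[name] = str(value) if value is not None else None  # JSON-safe

    texts, links = [], []
    for page in reader.pages:
        texts.append(page.extract_text() or "")
        uris = []
        if "/Annots" in page:
            for annot in page["/Annots"]:
                obj = annot.get_object()
                if obj.get("/Subtype") == "/Link" and "/A" in obj:
                    action = obj["/A"]
                    if "/URI" in action:
                        uris.append(str(action["/URI"]))
        links.append(uris)

    return json.dumps(build_payload(metadata, texts, links), ensure_ascii=False)
```

Keeping `page_num` on every entry lets a downstream chunker cite the source page for each retrieved passage.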
Job: Generate Metrics Dashboard to XLSX
User Story: As a project manager, I want to generate metrics dashboards in Excel with charts and trend lines, so that I can visualize sprint performance over time.
EARS Requirement:
- While generating metrics dashboards, when a GenerateDashboard command is received with metrics data, the document operations context shall:
- Create XLSX with “Metrics” worksheet
- Add header row (Date, Velocity, Quality, Coverage)
- Write metrics data rows with date and values
- Create line chart with:
- Title: “Sprint Metrics”
- Data series for Velocity, Quality, Coverage
- Categories from Date column
- Position chart at cell F5
- Format chart with legend and axis labels
- Write to output path
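A minimal sketch of the GenerateDashboard steps with openpyxl follows. It assumes metrics arrive as dictionaries keyed by lowercase header names; that input shape, the "Value" axis label, and the function names are illustrative assumptions.

```python
HEADERS = ["Date", "Velocity", "Quality", "Coverage"]


def to_rows(metrics: list[dict]) -> list[list]:
    """Order each metrics record to match the header row."""
    return [[m[h.lower()] for h in HEADERS] for m in metrics]


def generate_dashboard(metrics: list[dict], path: str) -> None:
    if not metrics:
        raise ValueError("at least one metrics record is required")

    from openpyxl import Workbook
    from openpyxl.chart import LineChart, Reference

    wb = Workbook()
    ws = wb.active
    ws.title = "Metrics"
    ws.append(HEADERS)
    for row in to_rows(metrics):
        ws.append(row)

    last = ws.max_row
    chart = LineChart()
    chart.title = "Sprint Metrics"
    chart.x_axis.title = "Date"
    chart.y_axis.title = "Value"

    # Columns B-D hold the series; row 1 supplies the series names.
    data = Reference(ws, min_col=2, max_col=4, min_row=1, max_row=last)
    chart.add_data(data, titles_from_data=True)
    # Column A supplies the category labels.
    chart.set_categories(Reference(ws, min_col=1, min_row=2, max_row=last))

    ws.add_chart(chart, "F5")
    wb.save(path)
```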
Job: Apply Semantic Anchoring to Documents
User Story: As a knowledge architect, I want to link document content to Knowledge Graph concepts, so that documents are traceable and semantically grounded.
EARS Requirement:
- While creating documents, when semantic anchoring is applied, the document operations context shall:
- Embed ConceptId in document properties:
doc.core_properties.subject = 'sea:BoundedContext'
doc.core_properties.keywords = 'ADR-021, SDS-012, PRD-026'
- Add cross-references to related documents
- Include hyperlinks to Knowledge Graph entities
- Enable traceability queries from document to concepts
- Support reverse lookup (concept → documents)
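To make the anchoring and reverse lookup concrete, here is a small sketch. It assumes related IDs are stored as a comma-separated string in the keywords property, as in the example above; `apply_anchor`, `split_keywords`, and `reverse_index` are hypothetical helpers, and a real deployment would presumably query the Knowledge Graph rather than an in-memory dictionary.

```python
from collections import defaultdict


def split_keywords(keywords: str) -> list[str]:
    """Split a comma-separated keywords string into clean IDs."""
    return [k.strip() for k in keywords.split(",") if k.strip()]


def reverse_index(doc_keywords: dict[str, str]) -> dict[str, list[str]]:
    """Map each referenced ID to the documents whose keywords mention it."""
    index: dict[str, list[str]] = defaultdict(list)
    for path, keywords in doc_keywords.items():
        for key in split_keywords(keywords):
            index[key].append(path)
    return dict(index)


def apply_anchor(path: str, concept_id: str, related_ids: list[str]) -> None:
    """Write the ConceptId and related IDs into an existing DOCX file."""
    from docx import Document  # third-party, imported lazily

    doc = Document(path)
    doc.core_properties.subject = concept_id
    doc.core_properties.keywords = ", ".join(related_ids)
    doc.save(path)
```

For example, `reverse_index({"a.docx": "ADR-021, SDS-012"})` maps both `ADR-021` and `SDS-012` back to `a.docx`.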
Job: Use Templates for Consistent Document Generation
User Story: As a document author, I want to use templates for consistent formatting, so that generated documents follow organizational standards.
EARS Requirement:
- While generating documents, when a template is specified, the document operations context shall:
- Load template from templates/ directory (e.g., adr-template.docx)
- Preserve template styles, headers, footers, and layout
- Replace placeholder variables with actual content
- Maintain template structure and formatting
- Write output to specified path
- Support named template variables as placeholders for substituted content
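Placeholder replacement could be sketched as below. The handbook does not spell out the placeholder syntax, so the `{{name}}` form used here is purely an assumption, as are the names `fill` and `render_template`.

```python
import re

# Assumed placeholder syntax: {{name}}, with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill(text: str, values: dict) -> str:
    """Replace known placeholders; leave unknown ones untouched."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


def render_template(template_path: str, output_path: str, values: dict) -> None:
    from docx import Document  # third-party, imported lazily

    doc = Document(template_path)
    for paragraph in doc.paragraphs:
        # Replacing per run keeps each run's formatting; a placeholder that
        # Word has split across several runs will not be matched.
        for run in paragraph.runs:
            run.text = fill(run.text, values)
    doc.save(output_path)
```

Only body paragraphs are visited in this sketch; tables, headers, and footers would need the same loop applied to their own paragraphs.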
Domain Entities Summary
Root Aggregates
- DOCXDocument: Word document with content, styles, tables, headers/footers, and properties
- XLSXSpreadsheet: Excel workbook with worksheets, formulas, charts, and pivot tables
- PDFDocument: PDF reader with pages, metadata, text content, and hyperlinks
Value Objects
- DOCXStyle: Formatting definition with font, size, color, bold, italic, alignment
- XLSXChart: Chart definition with type, data ranges, categories, and position
- XLSXTable: Table definition with display name, reference range, and style
- SemanticAnchor: ConceptId reference linking document to Knowledge Graph
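The value objects above could be modelled as immutable dataclasses. The field names and defaults below are inferred from the one-line descriptions and should be read as assumptions, not as the handbook's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DOCXStyle:
    font: str
    size_pt: float
    color_hex: str = "000000"
    bold: bool = False
    italic: bool = False
    alignment: str = "left"


@dataclass(frozen=True)
class XLSXChart:
    chart_type: str              # e.g. "bar", "line", "pie", "scatter"
    data_range: str              # e.g. "B1:D10"
    categories: Optional[str]    # e.g. "A2:A10"
    position: str                # anchor cell, e.g. "F5"


@dataclass(frozen=True)
class XLSXTable:
    display_name: str
    ref: str
    style: str = "TableStyleMedium2"


@dataclass(frozen=True)
class SemanticAnchor:
    concept_id: str
```

Freezing the classes gives value semantics: two anchors with the same `concept_id` compare equal, and neither can be mutated after creation.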
Policy Rules
- SemanticStructurePreserved: Document structure maintained through read/write operations
- TraceabilityRequired: All generated documents include semantic anchors to Knowledge Graph
- TemplateConsistency: Template-based generation ensures consistent formatting
Integration Points
- python-docx: DOCX creation, reading, and manipulation
- openpyxl: XLSX creation, reading, formulas, and charts
- pypdf/pdfplumber: PDF text extraction and metadata reading
- Knowledge Graph: Semantic anchoring and concept linking
- docx Skill: Claude Code skill for DOCX operations
- xlsx Skill: Claude Code skill for XLSX operations
Success Metrics
- Document Generation Accuracy: 100% of templates render correctly
- Extraction Completeness: >95% of PDF text extracted accurately
- Formula Accuracy: 100% of XLSX formulas calculate correctly
- Semantic Linking: All spec documents have ConceptId anchors
Non-Functional Requirements
- NFR-001: DOCX generation completes in <5 seconds for typical 20-page document
- NFR-002: XLSX generation completes in <3 seconds for typical 100-row spreadsheet
- NFR-003: PDF extraction handles 100-page documents in <10 seconds
- NFR-004: All document operations preserve semantic structure and formatting