
Implementation Plan: Data Models and Documentation Generation Foundation

Branch: 001-data-models-docs-foundation | Date: 2026-02-14 | Spec: spec.md
Input: Feature specification from /specs/001-data-models-docs-foundation/spec.md

Note: This template is filled in by the /speckit.plan command. See .specify/templates/commands/plan.md for the execution workflow.

Summary

Build the foundational data models and documentation generation system for a CUI compliance Ansible framework. This feature establishes the single source of truth for all compliance data through YAML data models (control mappings, glossary, HPC tailoring, ODP values) and provides Python-based tooling for generating audience-specific documentation and validating glossary coverage. No Ansible roles are implemented; the feature delivers only the structured data and the generation/validation scripts that all subsequent compliance implementation specs depend on.

Technical Context

Language/Version: Python 3.9+ (per constitution tech stack)
Primary Dependencies: PyYAML (YAML parsing), Jinja2 (templating for doc generation), pytest (testing), NEEDS CLARIFICATION (YAML schema validation library)
Storage: File-based YAML (control_mapping.yml, terms.yml, hpc_tailoring.yml, odp_values.yml) + generated Markdown/CSV outputs
Testing: pytest for unit tests, YAML validation tests, doc generation integration tests
Target Platform: RHEL 9 / Rocky Linux 9 (per constitution), command-line tooling
Project Type: Data models + CLI scripts (Ansible project skeleton with Python tooling)
Performance Goals: Documentation generator completes all 7 outputs in <30 seconds (SC-004), NEEDS CLARIFICATION (YAML load time for 110+ controls)
Constraints: Deterministic output (same YAML → same docs), CI-friendly exit codes, Excel-compatible CSV, GitHub-flavored Markdown, NEEDS CLARIFICATION (YAML schema enforcement approach)
Scale/Scope: 110 NIST 800-171 Rev 2 controls + 97 Rev 3 requirements, 60+ glossary terms, 49 ODPs, 10+ HPC tailoring entries, 7 doc output types
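The "Excel-compatible CSV" and "deterministic output" constraints can be illustrated with a minimal, stdlib-only sketch. The column names in `FIELDS`, the helper `control_sort_key`, and the function `render_crosswalk_csv` are hypothetical and are not taken from the spec. The sketch also assumes control IDs are dotted integers such as `3.1.10`.

```python
import csv
import io

# Hypothetical column names; the real crosswalk schema is defined by the data model.
FIELDS = ["control_id", "rev3_id", "cmmc_l2", "sp800_53_r5"]


def control_sort_key(control_id):
    """Sort dotted IDs numerically so "3.1.10" comes after "3.1.2" (assumes integer parts)."""
    return tuple(int(part) for part in control_id.split("."))


def render_crosswalk_csv(rows):
    """Return CSV bytes that do not depend on input order and carry a UTF-8 BOM."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\r\n")
    writer.writeheader()
    # Sorting removes any dependence on the order rows were loaded in.
    for row in sorted(rows, key=lambda r: control_sort_key(r["control_id"])):
        # Missing cells become an explicit "N/A" rather than an empty string.
        writer.writerow({name: row.get(name, "N/A") for name in FIELDS})
    # "utf-8-sig" prepends a byte-order mark, which Excel uses to detect UTF-8.
    return buf.getvalue().encode("utf-8-sig")
```

Because the rows are sorted before writing and no timestamps are embedded, two runs over the same data should produce byte-identical files, which makes the output easy to diff in CI.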

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Principle I: Plain Language First

✅ PASS - Feature directly implements glossary with plain-language explanations for all 5 audiences (PI, researcher, sysadmin, CISO, leadership). Glossary validator enforces no undefined jargon.
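As a rough illustration of what "no undefined jargon" could mean in practice, the sketch below flags acronyms that have no glossary entry and returns a CI-friendly exit status. It is an assumption-laden simplification: "jargon" is approximated as all-caps acronyms of two or more letters, and the names `undefined_terms` and `main` are hypothetical rather than the interface of validate_glossary.py.

```python
import re
import sys

# Assumption: jargon is approximated as all-caps acronyms of two or more letters.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def undefined_terms(text, glossary_terms):
    """Return the acronyms used in text that have no glossary entry, sorted."""
    known = {term.upper() for term in glossary_terms}
    found = set(ACRONYM.findall(text))
    return sorted(found - known)


def main(text, glossary_terms):
    """Report undefined terms on stderr and return a process exit status."""
    missing = undefined_terms(text, glossary_terms)
    for term in missing:
        print(f"undefined term: {term}", file=sys.stderr)
    # A non-zero status lets a CI job fail when jargon is undefined.
    return 1 if missing else 0
```

A real validator would likely also need to handle multi-word terms and mixed-case product names, which this pattern does not attempt.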

Principle II: Data Model as Source of Truth

✅ PASS - YAML files (control_mapping.yml, terms.yml, hpc_tailoring.yml, odp_values.yml) are single source; all documentation is generated, never duplicated.

Principle III: Compliance as Code

✅ PASS - Control mapping includes Ansible role assignment placeholders and control tagging structure for future implementation. This feature establishes the data foundation for compliance-as-code.

Principle IV: HPC-Aware

✅ PASS - hpc_tailoring.yml explicitly documents 10+ HPC/security conflicts with compensating controls, risk acceptance, and NIST 800-223 references.

Principle V: Multi-Framework

✅ PASS - Control mapping covers all 4 frameworks simultaneously (NIST 800-171 Rev 2/3, CMMC L2, 800-53 R5) with explicit "N/A" + rationale for missing mappings.
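The "explicit N/A + rationale" rule can be sketched as a small check over one control's mappings. This is only an illustration: the framework keys in `FRAMEWORKS`, the `FrameworkMapping` class, and `mapping_errors` are hypothetical names, and stdlib dataclasses stand in for the Pydantic models listed in the project structure so that the sketch has no dependencies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical framework keys; the real field names belong to the data model.
FRAMEWORKS = ("nist_800_171_r2", "nist_800_171_r3", "cmmc_l2", "nist_800_53_r5")


@dataclass(frozen=True)
class FrameworkMapping:
    value: str                       # a control identifier, or the literal "N/A"
    rationale: Optional[str] = None  # expected whenever value is "N/A"


def mapping_errors(control_id, mappings):
    """List problems with one control's mappings; an empty list means it is valid."""
    errors = []
    for framework in FRAMEWORKS:
        mapping = mappings.get(framework)
        if mapping is None:
            errors.append(f"{control_id}: {framework} is missing")
        elif mapping.value == "N/A" and not (mapping.rationale or "").strip():
            errors.append(f"{control_id}: {framework} is N/A without a rationale")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a validation run report every incomplete mapping in one pass.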

Principle VI: Audience-Aware Documentation

✅ PASS - Documentation generator produces 7 distinct audience-specific outputs from single YAML source (PI guide, researcher quickstart, sysadmin reference, CISO map, leadership briefing, glossary, crosswalk).

Principle VII: Idempotent and Auditable

✅ PASS - Control mapping includes placeholders for verify.yml/evidence.yml task files. Doc generator is deterministic (same input → same output).
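One way to make "same input → same output" testable is to fingerprint the rendered text and compare digests across runs. The sketch below is illustrative only: `string.Template` stands in for Jinja2 to stay stdlib-only, and `render_control_table` and `fingerprint` are hypothetical names.

```python
import hashlib
from string import Template

# string.Template stands in for Jinja2 here to keep the sketch dependency-free.
ROW = Template("| $control_id | $title |")


def render_control_table(controls):
    """Render a Markdown table from a {control_id: title} mapping."""
    lines = ["| Control | Title |", "| --- | --- |"]
    # Iterate in sorted order so dict insertion order cannot leak into the output.
    for control_id in sorted(controls):
        lines.append(ROW.substitute(control_id=control_id, title=controls[control_id]))
    return "\n".join(lines) + "\n"


def fingerprint(text):
    """Return the SHA-256 hex digest of a rendered document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

A test can then render the same data twice, or in a different insertion order, and assert that the fingerprints match. Anything that varies between runs, such as an embedded generation timestamp, would break that equality.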

Principle VIII: Prefer Established Tools

✅ PASS - Uses PyYAML (standard YAML library), Jinja2 (established templating), pytest (standard Python testing). No custom parsers/generators where established tools exist.

Gate Status: ✅ ALL PRINCIPLES SATISFIED - Proceed to Phase 0 Research

Project Structure

Documentation (this feature)

specs/001-data-models-docs-foundation/
├── plan.md              # This file (implementation plan)
├── research.md          # Phase 0 output (technology decisions)
├── data-model.md        # Phase 1 output (YAML schemas)
├── quickstart.md        # Phase 1 output (usage guide)
├── contracts/           # Phase 1 output (script interfaces, no APIs)
│   └── README.md
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)

Source Code (repository root)

This is an Ansible project with Python tooling. Structure follows Ansible best practices with compliance data models:

rcd-cui/
├── ansible.cfg                    # Ansible configuration
├── inventory/                     # Ansible inventory
│   ├── hosts.yml
│   └── group_vars/
│       ├── all.yml
│       ├── management.yml
│       ├── internal.yml
│       └── restricted.yml
├── roles/                         # Ansible roles (empty initially, populated in future specs)
│   └── common/
│       └── vars/
│           └── control_mapping.yml  # CANONICAL DATA MODEL (110+ controls)
├── docs/                          # Documentation source and generated output
│   ├── glossary/
│   │   └── terms.yml              # CANONICAL GLOSSARY (60+ terms)
│   ├── hpc_tailoring.yml          # HPC-specific control tailoring (10+ entries)
│   ├── odp_values.yml             # Organization-Defined Parameters (49 ODPs)
│   └── generated/                 # Generated documentation (ephemeral)
│       ├── pi_guide.md
│       ├── researcher_quickstart.md
│       ├── sysadmin_reference.md
│       ├── ciso_compliance_map.md
│       ├── leadership_briefing.md
│       ├── glossary_full.md
│       └── crosswalk.csv
├── scripts/                       # Python automation scripts
│   ├── generate_docs.py           # Documentation generator
│   ├── validate_glossary.py       # Glossary coverage validator
│   └── models/                    # Pydantic data models
│       ├── __init__.py
│       ├── control_mapping.py
│       ├── glossary.py
│       ├── hpc_tailoring.py
│       └── odp_values.py
├── templates/                     # Jinja2 templates for doc generation
│   ├── pi_guide.md.j2
│   ├── researcher_quickstart.md.j2
│   ├── sysadmin_reference.md.j2
│   ├── ciso_compliance_map.md.j2
│   ├── leadership_briefing.md.j2
│   ├── glossary_full.md.j2
│   ├── crosswalk.csv.j2
│   └── _partials/
│       ├── glossary_link.j2
│       ├── control_table.j2
│       └── header.j2
├── tests/                         # Pytest tests
│   ├── test_yaml_schemas.py       # Validate all YAML data models
│   ├── test_generate_docs.py      # Doc generator integration tests
│   └── test_glossary_validator.py # Glossary validator unit tests
├── Makefile                       # Build targets (docs, validate, crosswalk, clean)
├── requirements.txt               # Python dependencies (PyYAML, Pydantic, Jinja2, pytest)
├── README.md                      # Project overview and usage
└── .specify/                      # Specify framework artifacts
    └── memory/
        └── constitution.md        # Project constitution

Structure Decision: Ansible project structure with Python tooling. This feature establishes the data foundation (4 YAML files) and documentation generation pipeline (Python scripts + Jinja2 templates). No Ansible roles are implemented yet; those come in future specs. The structure separates:

  1. Canonical Data (roles/common/vars/, docs/glossary/, docs/*.yml) - Single source of truth, version-controlled
  2. Generated Artifacts (docs/generated/) - Ephemeral, regenerated from YAML sources
  3. Tooling (scripts/, templates/) - Python generators and validators
  4. Tests (tests/) - Schema validation and integration tests

This aligns with Constitution Principle II (Data Model as Source of Truth) and Principle VI (Audience-Aware Documentation).
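The separation between templates and generated artifacts suggests a simple manifest that pairs each output file with its template. The sketch below is one possible shape, built from the file names in the project structure above; `OUTPUTS` and `build_manifest` are hypothetical names, and the real generator may organize this differently.

```python
from pathlib import PurePosixPath

# The seven generated files named in the project structure.
OUTPUTS = [
    "pi_guide.md",
    "researcher_quickstart.md",
    "sysadmin_reference.md",
    "ciso_compliance_map.md",
    "leadership_briefing.md",
    "glossary_full.md",
    "crosswalk.csv",
]


def build_manifest(template_dir="templates", output_dir="docs/generated"):
    """Map each generated file path to the template path it would be rendered from."""
    return {
        PurePosixPath(output_dir) / name: PurePosixPath(template_dir) / f"{name}.j2"
        for name in OUTPUTS
    }
```

Keeping the list in one place means a test can assert that exactly seven outputs exist and that every template in the manifest is present on disk.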

Complexity Tracking

No constitution violations. All principles satisfied:

- ✅ Established tools (Pydantic, PyYAML, Jinja2, pytest)
- ✅ Data model as source of truth (YAML canonical, docs generated)
- ✅ Plain language first (glossary with 5-audience context)
- ✅ HPC-aware (explicit tailoring document)
- ✅ Multi-framework (4 frameworks in single data model)

No complexity justification required.