πŸ•ΈοΈ Ada Research Browser

spec.md
← Back

Feature Specification: Pre-Baked Vagrant Box Workflow

Feature Branch: 009-vagrant-prebaked-boxes
Created: 2026-03-02
Status: Draft
Input: User description: "Add pre-baked Vagrant box support to the local demo lab (spec 006) for fast cluster boot from reusable packaged boxes."

Clarifications

Session 2026-03-02

User Scenarios & Testing (mandatory)

User Story 1 - Package a Provisioned Cluster as Reusable Boxes (Priority: P1)

A demo operator has just completed a full demo-setup.sh run, which took 20-30 minutes. They want to capture the fully provisioned state of all four VMs (mgmt01, login01, compute01, compute02) as reusable Vagrant boxes so that future demo sessions can skip the lengthy provisioning step entirely.

The operator runs demo-bake.sh, which packages each VM into a named Vagrant box, stores them locally, and records metadata (creation date, source commit, provider) in a manifest file. The operator can later list available baked box sets or delete stale ones.
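The packaging step could look roughly like the sketch below, written in Python for illustration; the real demo-bake.sh is a shell script. `vagrant package <name> --output <file>` and `vagrant box add --name <name> <file>` are real Vagrant commands, but the `<set>-<vm>` naming scheme and the `box_dir` parameter are hypothetical. Provider support for `vagrant package` varies, so it should be confirmed for the lab's provider.

```python
from typing import List

# VM names come from the spec; the box naming scheme is a hypothetical choice.
VMS = ["mgmt01", "login01", "compute01", "compute02"]


def bake_commands(set_name: str, box_dir: str) -> List[List[str]]:
    """Return the vagrant commands that package and register each VM.

    Assumes box names of the form "<set_name>-<vm>" and box files stored
    as "<box_dir>/<set_name>-<vm>.box" (both hypothetical).
    """
    commands = []
    for vm in VMS:
        box_name = f"{set_name}-{vm}"
        box_file = f"{box_dir}/{box_name}.box"
        # Export the machine into a reusable .box file.
        commands.append(["vagrant", "package", vm, "--output", box_file])
        # Register the file locally so a Vagrantfile can reference it by name.
        commands.append(["vagrant", "box", "add", "--name", box_name, box_file])
    return commands
```

Each command list can be run with `subprocess.run(cmd, check=True)`. Building the commands as data keeps the naming scheme testable without touching any VMs.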

Why this priority: This is the foundational capability: without the ability to create baked boxes, nothing else in this feature works. It delivers immediate value by capturing a known-good cluster state for reuse.

Independent Test: Can be fully tested by running demo-setup.sh followed by demo-bake.sh and verifying that four box files are created with a valid manifest.
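One way to make "a valid manifest" checkable is sketched below. The JSON schema (a `set` name plus one `boxes` entry per VM carrying `box`, `created`, `commit`, and `provider`) is a hypothetical layout, not a format the spec defines.

```python
import json
from typing import List

EXPECTED_VMS = {"mgmt01", "login01", "compute01", "compute02"}
REQUIRED_FIELDS = ("box", "created", "commit", "provider")


def make_manifest(set_name: str, created: str, commit: str, provider: str) -> str:
    """Build a manifest for one box set using a hypothetical JSON schema."""
    boxes = [
        {"vm": vm, "box": f"{set_name}-{vm}", "created": created,
         "commit": commit, "provider": provider}
        for vm in sorted(EXPECTED_VMS)
    ]
    return json.dumps({"set": set_name, "boxes": boxes}, indent=2)


def validate_manifest(text: str) -> List[str]:
    """Return a list of problems; an empty list means the manifest looks complete."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict) or not isinstance(data.get("boxes"), list):
        return ["manifest must be an object with a 'boxes' list"]
    problems = []
    seen = set()
    for entry in data["boxes"]:
        if not isinstance(entry, dict):
            problems.append("malformed box entry")
            continue
        vm = entry.get("vm", "?")
        seen.add(vm)
        # Every box must record creation date, source commit, and provider.
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"{vm}: missing {field}")
    missing = EXPECTED_VMS - seen
    if missing:
        problems.append("missing VMs: " + ", ".join(sorted(missing)))
    return problems
```

Returning a list of problems rather than a boolean lets the test report exactly which box or field is wrong.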

Acceptance Scenarios:

  1. Given a fully provisioned and running demo cluster (all 4 VMs up with baseline services), When the operator runs demo-bake.sh, Then four named box files are created in the local box storage directory and a manifest file records creation date, source commit hash, and provider for each box.
  2. Given one or more baked box sets exist, When the operator runs demo-bake.sh --list, Then a table of available box sets is displayed showing set name, creation date, provider, age, and source commit.
  3. Given an older baked box set exists, When the operator runs demo-bake.sh --delete <set-name>, Then the specified box files and manifest entry are removed and disk space is reclaimed.
  4. Given that not all VMs are running, or provisioning is incomplete, When the operator runs demo-bake.sh, Then the script exits with a clear error message explaining that a fully provisioned cluster is required.
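The "all VMs running" half of the precondition in scenario 4 could be checked from `vagrant status --machine-readable`, whose lines have the form `timestamp,target,type,data`. The sketch below assumes that format and only inspects `state` lines. A running VM does not prove provisioning finished, so a separate signal (for example, a hypothetical marker file written at the end of provisioning) would still be needed for the other half.

```python
from typing import Dict, List

REQUIRED_VMS = ["mgmt01", "login01", "compute01", "compute02"]


def parse_states(machine_readable: str) -> Dict[str, str]:
    """Map each target VM to its state from `vagrant status --machine-readable` output."""
    states = {}
    for line in machine_readable.splitlines():
        fields = line.split(",")
        # Expected layout: timestamp,target,type,data...
        if len(fields) >= 4 and fields[2] == "state":
            states[fields[1]] = fields[3]
    return states


def bake_preconditions(machine_readable: str) -> List[str]:
    """Return one error per VM that is not running; empty means baking may proceed."""
    states = parse_states(machine_readable)
    errors = []
    for vm in REQUIRED_VMS:
        state = states.get(vm, "not_created")
        if state != "running":
            errors.append(f"{vm} is {state}; a fully provisioned, running cluster is required")
    return errors
```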

User Story 2 - Boot a Demo Cluster from Pre-Baked Boxes (Priority: P1)

A demo presenter needs a working CUI demo cluster ready in under 5 minutes. They have previously baked a box set. Instead of running the full 20-30 minute provisioning cycle, the existing demo-setup.sh detects the available baked boxes and offers to use them, skipping all Ansible provisioning.

Why this priority: This is the primary value proposition: reducing demo startup from 20-30 minutes to under 5 minutes. It is co-equal with User Story 1, since baking without booting delivers no time savings.

Independent Test: Can be tested by booting from pre-baked boxes and verifying that all critical services (FreeIPA, Slurm, Wazuh, NFS, Munge) are operational and demo scenarios run successfully.
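The per-node service placement from acceptance scenario 2 below can be written as an expected-units table and compared against what is actually active. The systemd unit names here (`ipa`, `sssd`, `wazuh-manager`, `wazuh-agent`, `nfs-server`, and so on) are assumptions based on common packaging, not names taken from the lab's roles. NFS client mounts are not services and would need a separate mount check.

```python
from typing import Dict, List, Set

# Unit names are assumptions; adjust them to match the lab's Ansible roles.
COMMON = {"munge", "chronyd"}
EXPECTED_UNITS: Dict[str, Set[str]] = {
    "mgmt01": COMMON | {"ipa", "slurmctld", "wazuh-manager", "nfs-server"},
    "login01": COMMON | {"sssd", "wazuh-agent"},
    "compute01": COMMON | {"sssd", "wazuh-agent", "slurmd"},
    "compute02": COMMON | {"sssd", "wazuh-agent", "slurmd"},
}


def missing_units(active: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per node, the expected units that are not in that node's active set."""
    report = {}
    for node, expected in EXPECTED_UNITS.items():
        gone = sorted(expected - active.get(node, set()))
        if gone:
            report[node] = gone
    return report
```

The `active` sets could be gathered by running `systemctl is-active <unit>` on each node over `vagrant ssh`. An empty report means every expected unit was seen.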

Acceptance Scenarios:

  1. Given a recent baked box set exists (within the staleness threshold), When the operator runs demo-setup.sh, Then the script detects available boxes, prompts the user to choose between baked boxes or fresh provisioning, and when baked is selected, boots from the boxes without running Ansible provisioning.
  2. Given baked boxes are used for boot, When all 4 VMs are running, Then FreeIPA (server on mgmt01, clients on others), Slurm (slurmctld on mgmt01, slurmd on compute nodes), Wazuh (manager on mgmt01, agents on others), NFS (exports on mgmt01, mounts on others), Munge (all nodes), and Chronyd (all nodes) are all operational.
  3. Given baked boxes are used for boot, When the operator runs any demo scenario (A through D), Then the scenario completes successfully with the same results as a fresh-provisioned cluster.
  4. Given baked boxes exist but are older than the staleness threshold (default: 7 days), When the operator runs demo-setup.sh, Then the script warns that boxes are stale and recommends refreshing, but still allows the user to proceed with stale boxes if desired.
  5. Given no baked boxes exist, When the operator runs demo-setup.sh, Then the script proceeds with normal from-scratch provisioning (current behavior unchanged).
  6. Given a fresh provisioning run completes successfully, When DEMO_USE_BAKED is not set to 0, Then the script prompts "Bake this cluster for future fast starts?" and bakes the cluster if the operator confirms.
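The staleness check and interactive choice in scenarios 1 and 4 can be sketched as below. It assumes ISO-8601 timestamps with an explicit UTC offset in the manifest. The prompt wording and the choice to make "baked" the default answer are hypothetical; the 7-day threshold is the spec's default.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=7)  # default staleness threshold from the spec


def box_age(created_iso: str, now: datetime) -> timedelta:
    # Both values must be timezone-aware, e.g. "2026-03-02T12:00:00+00:00".
    return now - datetime.fromisoformat(created_iso)


def is_stale(created_iso: str, now: datetime, threshold: timedelta = STALE_AFTER) -> bool:
    return box_age(created_iso, now) > threshold


def prompt_text(created_iso: str, now: datetime) -> str:
    days = box_age(created_iso, now).days
    text = f"Baked boxes found ({days} days old). Boot from baked boxes instead of fresh provisioning? [Y/n] "
    if is_stale(created_iso, now):
        # Stale boxes are still offered; the operator only gets a warning.
        text = (f"WARNING: baked boxes are older than {STALE_AFTER.days} days; "
                "consider running demo-refresh.sh.\n") + text
    return text


def interpret_answer(answer: str) -> str:
    # Hypothetical default: empty input accepts baked boxes; "n"/"no" selects fresh provisioning.
    return "fresh" if answer.strip().lower().startswith("n") else "baked"
```

The same `box_age` value could fill the age column of `demo-bake.sh --list`.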

User Story 3 - Rebuild Baked Boxes from Current Codebase (Priority: P2)

A developer has made changes to Ansible roles or provisioning playbooks and needs to create a fresh set of baked boxes that reflects the updated codebase. They run demo-refresh.sh, which destroys existing VMs, provisions from scratch, and automatically bakes the result into new boxes. This is the "prove reproducibility" workflow, run periodically rather than before every demo.

Why this priority: Important for keeping baked boxes current with code changes, but not needed for initial demo acceleration. Can be run on a schedule or after significant changes rather than every session.

Independent Test: Can be tested by modifying a provisioning playbook, running demo-refresh.sh, and verifying the resulting boxes reflect the code changes.

Acceptance Scenarios:

  1. Given any cluster state (running, stopped, or no VMs), When the operator runs demo-refresh.sh, Then existing VMs are destroyed, a fresh cluster is provisioned from scratch using current playbooks, and the result is baked into a new box set.
  2. Given a previous baked box set exists, When demo-refresh.sh completes, Then the old box set is replaced by the new one and the manifest is updated with the current commit hash and date.
  3. Given provisioning fails during demo-refresh.sh, When the error occurs, Then the script reports the failure, preserves any previous baked box set (does not delete old boxes until new ones are confirmed), and exits with a non-zero status.
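Scenario 3 implies an ordering: bake the new set under a new name, verify it, switch over, and only then delete the old set. The sketch below shows the switch-over step, assuming a hypothetical "current set" JSON file that names the active box set. The file is written to a temporary path and renamed into place, so a crash never leaves a half-written pointer.

```python
import json
import os
import tempfile
from typing import Optional


def promote_box_set(current_path: str, new_set: dict) -> Optional[str]:
    """Point the hypothetical 'current set' file at new_set; return the old set name.

    Intended refresh order: destroy VMs, provision, bake under a new set name,
    verify, call this function, and only then delete the returned old set.
    If any earlier step fails, this is never called and the old set survives.
    """
    old_name = None
    if os.path.exists(current_path):
        with open(current_path) as fh:
            old_name = json.load(fh).get("set")
    directory = os.path.dirname(os.path.abspath(current_path))
    # Write to a temp file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(new_set, fh, indent=2)
    # os.replace is atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp_path, current_path)
    return old_name
```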

User Story 4 - Override Baked Box Behavior via Environment Variable (Priority: P3)

An advanced user or CI system needs deterministic control over whether baked boxes are used, without interactive prompts. They set DEMO_USE_BAKED=1 to force baked-box boot (failing if no boxes exist) or DEMO_USE_BAKED=0 to force fresh provisioning regardless of box availability.

Why this priority: Supports automation and CI integration but is not required for interactive demo workflows.

Independent Test: Can be tested by running demo-setup.sh with DEMO_USE_BAKED=1 and DEMO_USE_BAKED=0 and verifying the expected behavior in each case.

Acceptance Scenarios:

  1. Given baked boxes exist, When DEMO_USE_BAKED=1 is set and demo-setup.sh is run, Then baked boxes are used without prompting.
  2. Given no baked boxes exist, When DEMO_USE_BAKED=1 is set and demo-setup.sh is run, Then the script exits with an error directing the user to run demo-bake.sh first.
  3. Given baked boxes exist, When DEMO_USE_BAKED=0 is set and demo-setup.sh is run, Then fresh provisioning runs without prompting, ignoring available boxes.
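The three override scenarios reduce to a small decision function, sketched below. The return values and the exception type are hypothetical names. Treating any value other than "0" or "1" like an unset variable is an assumption, since the spec does not define other values.

```python
from typing import Mapping


class BakedBoxesMissing(Exception):
    """Raised when DEMO_USE_BAKED=1 is set but no baked boxes exist."""


def resolve_boot_mode(env: Mapping[str, str], boxes_exist: bool) -> str:
    """Return "baked", "fresh", or "prompt" according to DEMO_USE_BAKED."""
    value = env.get("DEMO_USE_BAKED", "").strip()
    if value == "1":
        if not boxes_exist:
            raise BakedBoxesMissing(
                "DEMO_USE_BAKED=1 but no baked boxes exist; run demo-bake.sh first")
        return "baked"
    if value == "0":
        return "fresh"
    # Unset (or an unrecognized value, by assumption): keep the interactive behavior.
    return "prompt" if boxes_exist else "fresh"
```

Passing `os.environ` as `env` gives the real behavior, while tests can pass a plain dict and never need to modify the process environment.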

Edge Cases

Requirements (mandatory)

Functional Requirements

Key Entities

Assumptions

Success Criteria (mandatory)

Measurable Outcomes