πŸ•ΈοΈ Ada Research Browser

spec.md
← Back

Feature Specification: Cloud Demo Infrastructure

Feature Branch: 007-cloud-demo-infra
Created: 2026-02-15
Status: Draft
Input: On-demand cloud demo environment using Hetzner Cloud VMs, replacing local Vagrant for reliable demo experience

User Scenarios & Testing (mandatory)

User Story 1 - Spin Up Demo Cluster (Priority: P1)

As a presenter preparing for a conference talk or customer demo, I need to provision a complete 4-node demo cluster in the cloud so that I have a reliable, consistent environment without consuming local resources.

Why this priority: This is the core value proposition: without cluster spin-up, no other functionality matters. Presenters need a working environment before they can demonstrate anything.

Independent Test: Can be fully tested by running the spin-up command and verifying all 4 VMs are accessible via SSH, delivering a ready-to-use demo environment.
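The independent test above only requires that each VM answers on SSH. A minimal pre-check, assuming the node addresses are already known after spin-up, is a plain TCP connect to port 22 with a timeout. The helper names `ssh_port_open` and `unreachable_nodes` are hypothetical, and a successful connect only shows that something is listening on the port; it does not prove that key authentication will succeed.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that a listener exists on the port; it does not perform
    an SSH handshake or test key authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and host names that fail to resolve.
        return False


def unreachable_nodes(nodes: dict[str, str], port: int = 22) -> list[str]:
    """Return the names of nodes whose SSH port did not accept a connection."""
    return [name for name, addr in nodes.items() if not ssh_port_open(addr, port)]
```

For example, `unreachable_nodes({"mgmt01": addr1, "login01": addr2})` returns an empty list when both nodes accept connections on port 22; a full test would follow up with a real `ssh` login using the injected key.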

Acceptance Scenarios:

  1. Given valid cloud credentials configured, When I run the spin-up command, Then 4 VMs are created with correct sizes (mgmt01: 4GB, login01: 2GB, compute01: 2GB, compute02: 2GB)
  2. Given VMs are provisioned, When provisioning completes, Then I can SSH to mgmt01 and login01 using the injected SSH key
  3. Given spin-up is initiated, When the process runs, Then I see progress output including a cost estimate for the session
  4. Given VMs are created, When provisioning completes, Then all nodes can communicate on the private network (10.0.0.0/24)
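Scenarios 1 and 4 pin down the cluster shape: four named nodes with fixed sizes on the 10.0.0.0/24 private network. The sketch below encodes that layout and checks it before any VM is requested. The addressing scheme (.10 to .13) and the names `NodeSpec`, `build_layout` and `validate_layout` are hypothetical; the spec only fixes the network range, the node names and the sizes in GB.

```python
import ipaddress
from dataclasses import dataclass

# Private network required by acceptance scenario 4.
PRIVATE_NET = ipaddress.ip_network("10.0.0.0/24")

# Node sizes in GB as listed in acceptance scenario 1.
NODE_SIZES_GB = {"mgmt01": 4, "login01": 2, "compute01": 2, "compute02": 2}


@dataclass(frozen=True)
class NodeSpec:
    name: str
    size_gb: int
    private_ip: ipaddress.IPv4Address


def build_layout() -> list[NodeSpec]:
    """Return the four-node layout using a hypothetical .10-.13 address scheme."""
    hosts = list(PRIVATE_NET.hosts())  # 10.0.0.1 .. 10.0.0.254
    return [
        NodeSpec(name, gb, hosts[9 + i])  # hosts[9] is 10.0.0.10
        for i, (name, gb) in enumerate(NODE_SIZES_GB.items())
    ]


def validate_layout(nodes: list[NodeSpec]) -> list[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    if len(nodes) != 4:
        problems.append(f"expected 4 nodes, got {len(nodes)}")
    seen = set()
    for node in nodes:
        if node.private_ip not in PRIVATE_NET:
            problems.append(f"{node.name}: {node.private_ip} is outside {PRIVATE_NET}")
        if node.private_ip in seen:
            problems.append(f"{node.name}: duplicate address {node.private_ip}")
        seen.add(node.private_ip)
    return problems
```

Validating the layout locally keeps mistakes (a wrong subnet, a duplicated address) from turning into billed VMs that then have to be torn down.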

User Story 2 - Tear Down Demo Cluster (Priority: P1)

As a presenter who has finished a demo, I need to destroy all cloud resources immediately so that billing stops and no orphaned resources remain.

Why this priority: Equal in priority to spin-up: without reliable teardown, users risk unexpected charges. This is a critical cost-safety feature.

Independent Test: Can be fully tested by running teardown after spin-up and verifying zero resources remain in the cloud account.

Acceptance Scenarios:

  1. Given a running demo cluster, When I run the teardown command, Then all VMs, networks, and associated resources are destroyed
  2. Given teardown is initiated, When I confirm the action, Then I see a count of resources being destroyed
  3. Given teardown completes, When I check the cloud console, Then no demo-related resources exist and billing has stopped
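Scenarios 2 and 3 reduce to two checks: count the demo resources before destroying them, and confirm that none remain afterwards. One way to make both checks reliable is to select resources by a label attached at spin-up rather than by name prefix; Hetzner Cloud resources can carry labels. The sketch below works on a generic inventory of dicts, not on any provider SDK. The label key/value `demo-cluster=007` and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical label that spin-up would attach to every resource it creates.
DEMO_LABEL = ("demo-cluster", "007")


def demo_resources(inventory: list[dict]) -> list[dict]:
    """Select resources carrying the demo label from a generic inventory."""
    key, value = DEMO_LABEL
    return [r for r in inventory if r.get("labels", {}).get(key) == value]


def teardown_summary(inventory: list[dict]) -> Counter:
    """Count demo resources by type, for display before teardown is confirmed."""
    return Counter(r["type"] for r in demo_resources(inventory))


def verify_clean(inventory: list[dict]) -> None:
    """Raise if any demo-labelled resource is still present after teardown."""
    leftovers = demo_resources(inventory)
    if leftovers:
        names = ", ".join(f"{r['type']}/{r['name']}" for r in leftovers)
        raise RuntimeError(f"orphaned demo resources remain: {names}")
```

Running `teardown_summary` on the pre-teardown inventory gives the count for scenario 2; running `verify_clean` on a fresh inventory fetched after teardown gives the zero-resources check for scenario 3.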

User Story 3 - Run Demo Scenarios (Priority: P1)

As a presenter with a running cluster, I need to execute the same demo scenarios from spec 006 so that I can demonstrate compliance workflows without modification.

Why this priority: The cluster is only useful if existing demo playbooks work unchanged. This validates the integration with spec 006.

Independent Test: Can be tested by running scenario-a-onboard.yml on a cloud cluster and verifying that it produces the same outputs as in the Vagrant environment.

Acceptance Scenarios:

  1. Given a provisioned cloud cluster, When I run provision.yml, Then FreeIPA, Slurm, Wazuh, and NFS are configured as in the Vagrant environment
  2. Given core services are running, When I run any scenario playbook (a, b, c, or d), Then the scenario executes successfully with expected outputs
  3. Given the cluster is provisioned, When I use the existing demo/narratives/*.md guides, Then all commands and expected outputs match
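For the spec 006 playbooks to run unchanged, the cloud cluster mainly needs an inventory that exposes the same host names. The sketch below renders an Ansible INI inventory from node addresses and builds the `ansible-playbook -i <inventory> <playbook>` command line. The group names are hypothetical and may not match the real spec 006 inventory. Which addresses to use (public IPs, or private ones reached through a jump host) is also left open here.

```python
# Hypothetical group layout; the real spec 006 inventory may group hosts differently.
GROUPS = {
    "mgmt": ["mgmt01"],
    "login": ["login01"],
    "compute": ["compute01", "compute02"],
}


def render_inventory(addresses: dict[str, str]) -> str:
    """Render an Ansible INI inventory mapping node names to reachable addresses."""
    lines = []
    for group, hosts in GROUPS.items():
        lines.append(f"[{group}]")
        for host in hosts:
            # ansible_host lets the inventory keep the Vagrant-era host names.
            lines.append(f"{host} ansible_host={addresses[host]}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)


def playbook_command(inventory_path: str, playbook: str) -> list[str]:
    """Build the argv for running one playbook against the generated inventory."""
    return ["ansible-playbook", "-i", inventory_path, playbook]
```

A driver could write the rendered text to a file and pass `playbook_command(path, "provision.yml")`, then `playbook_command(path, "scenario-a-onboard.yml")`, to `subprocess.run(..., check=True)` so that a failing playbook stops the sequence.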

User Story 4 - Share Access with Workshop Attendees (Priority: P2)

As a workshop instructor, I need to provide SSH access to attendees so they can interact directly with the demo cluster during hands-on exercises.

Why this priority: Extends the value beyond single-presenter demos to interactive workshops. Depends on basic cluster functionality.

Independent Test: Can be tested by adding an attendee SSH key and verifying they can connect to login01.

Acceptance Scenarios:

  1. Given a running cluster, When I add attendee SSH keys, Then attendees can SSH to login01 using their keys
  2. Given attendees are connected, When they run permitted commands, Then they can interact with Slurm and shared storage
  3. Given the workshop is complete, When I tear down the cluster, Then all attendee access is revoked along with it
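Adding attendees amounts to appending their public keys to the `authorized_keys` file on login01. Because that file lives on the VM, scenario 3 holds automatically when the VM is destroyed. The sketch below validates each OpenSSH public key line before accepting it. It checks the key type prefix and confirms that the base64 blob begins with the same type name, encoded as a 4-byte big-endian length followed by the name. It also drops duplicate submissions. The allowed key types and the function names are assumptions, and how the resulting file reaches login01 is not shown.

```python
import base64
import struct

# Key types accepted in this sketch; extend as needed.
ALLOWED_KEY_TYPES = {
    "ssh-ed25519",
    "ssh-rsa",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
}


def parse_public_key(line: str) -> tuple[str, str, str]:
    """Split an OpenSSH public key line into (type, base64 blob, comment)."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2 or parts[0] not in ALLOWED_KEY_TYPES:
        raise ValueError(f"not a supported public key: {line[:40]!r}")
    # binascii.Error (raised on malformed base64) is a ValueError subclass.
    raw = base64.b64decode(parts[1], validate=True)
    if len(raw) < 4:
        raise ValueError("key blob too short")
    (n,) = struct.unpack(">I", raw[:4])
    if raw[4:4 + n].decode("ascii", "replace") != parts[0]:
        raise ValueError("key type does not match the encoded blob")
    comment = parts[2] if len(parts) == 3 else ""
    return parts[0], parts[1], comment


def build_authorized_keys(presenter_key: str, attendee_keys: list[str]) -> str:
    """Return authorized_keys content: presenter first, duplicate keys dropped."""
    seen = set()
    out = []
    for line in [presenter_key, *attendee_keys]:
        key_type, blob, comment = parse_public_key(line)
        if blob in seen:
            continue  # the same key was submitted more than once
        seen.add(blob)
        out.append(" ".join(p for p in (key_type, blob, comment) if p))
    return "\n".join(out) + "\n"
```

Rejecting malformed lines up front matters in a workshop: a single pasted fragment in `authorized_keys` is ignored by sshd, so the affected attendee would otherwise fail to log in without any obvious error.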

User Story 5 - Cost Awareness and Safety (Priority: P2)

As a user of cloud resources, I need visibility into costs and protection against forgotten clusters so that I don't incur unexpected charges.

Why this priority: Critical for user trust and adoption, but secondary to core spin-up/teardown functionality.

Independent Test: Can be tested by verifying TTL warnings appear after the configured threshold and cost estimates display on spin-up.

Acceptance Scenarios:

  1. Given spin-up is initiated, When VMs are being created, Then I see an estimated hourly cost for the cluster
  2. Given a cluster has been running longer than the TTL threshold, When the threshold is exceeded, Then a warning is displayed
  3. Given teardown is requested, When I confirm, Then I see the total resources to be destroyed before proceeding
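Scenarios 1 and 2 need two small calculations: an hourly cost from the node sizes, and a check of cluster age against a TTL. The sketch below shows both. The rates are placeholders, not real Hetzner prices; a real implementation would load current prices from the provider. The TTL value is left to the caller, because the spec does not fix it. Keying rates by node size and the function names are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder EUR/hour rates keyed by node size in GB. These are NOT real
# Hetzner prices; they only make the arithmetic concrete.
PLACEHOLDER_RATES = {4: 0.010, 2: 0.005}

# Node sizes in GB from User Story 1.
NODE_SIZES_GB = {"mgmt01": 4, "login01": 2, "compute01": 2, "compute02": 2}


def estimate_hourly_cost(sizes: dict[str, int], rates: dict[int, float]) -> float:
    """Sum the hourly rate of every node in the cluster."""
    return sum(rates[gb] for gb in sizes.values())


def session_estimate(hourly: float, hours: float) -> float:
    """Estimated cost of a session of the given length, rounded for display."""
    return round(hourly * hours, 4)


def ttl_warning(created_at: datetime, ttl: timedelta,
                now: Optional[datetime] = None) -> Optional[str]:
    """Return a warning message once the cluster is older than ttl, else None."""
    now = now or datetime.now(timezone.utc)
    age = now - created_at
    if age <= ttl:
        return None
    return (f"cluster has been running for {age} (TTL {ttl}, exceeded by "
            f"{age - ttl}); consider tearing it down")
```

Passing `now` explicitly keeps the TTL check testable. Storing `created_at` in UTC (for example as a label or a local state file written at spin-up) avoids time-zone errors when the check runs on a different machine.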

Edge Cases

Requirements (mandatory)

Functional Requirements

Cluster Provisioning

Teardown

Cost and Safety

Integration

Documentation

Key Entities

Success Criteria (mandatory)

Measurable Outcomes

Scope

In Scope

Out of Scope

Assumptions

Dependencies

Clarifications

Session 2026-02-15