πŸ•ΈοΈ Ada Research Browser

architecture.md
← Back

Claude Code Prompt: Secure Runtime Environment (SRE)

Paste this into Claude Code as your project prompt


You are building a product called Secure Runtime Environment (SRE) β€” a Kubernetes-based platform that provides a hardened, compliant runtime for deploying applications. It must satisfy government compliance requirements (ATO processes, CMMC, FedRAMP, NIST 800-53, DISA STIGs) while also being viable for commercial regulated industries (finance, healthcare). It must provide a simple, GitOps-driven developer experience for deploying applications to the platform.

Architecture Overview

SRE is an Infrastructure-as-Code (IaC) platform composed of these layers:

  1. Cluster Foundation β€” RKE2 Kubernetes distribution on hardened OS
  2. Platform Services β€” Security, observability, networking, and policy tooling deployed via Flux CD
  3. Developer Experience β€” GitOps-based app deployment with self-service templates
  4. Supply Chain Security β€” Image scanning, signing, SBOM generation, and admission control

The platform is modeled after the DoD Platform One / Big Bang architecture but is independently built, 100% open-source, vendor-neutral, and opinionated toward simplicity. Every component is free to use with no special access required. When pursuing government contracts, specific components can be swapped for government-approved equivalents (e.g., Iron Bank images, Vault Enterprise, RHEL).


Layer 1: Cluster Foundation

Kubernetes Distribution: RKE2
  - Only DISA STIG-certified Kubernetes distribution
  - FIPS 140-2 compliant out of the box (BoringCrypto module)
  - Passes the CIS Kubernetes Benchmark with its default configuration
  - SELinux support, built-in etcd, no Docker dependency
  - Supports air-gapped and edge deployments
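A hardening-oriented RKE2 server config could look like the sketch below. Values are illustrative: the CIS profile name varies by RKE2 version, the TLS SAN is a hypothetical endpoint, and FIPS comes from running the FIPS-enabled RKE2 build rather than from any config flag.

```yaml
# /etc/rancher/rke2/config.yaml -- minimal hardened server config (sketch).
profile: cis                  # apply the built-in CIS hardening profile
selinux: true                 # requires the rke2-selinux policy package
write-kubeconfig-mode: "0640"
kube-apiserver-arg:
  - audit-log-path=/var/lib/rancher/rke2/server/logs/audit.log
  - audit-log-maxage=30       # retention to support AU-family controls
tls-san:
  - rke2.sre.example.com      # hypothetical API server endpoint
```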

Base OS: Rocky Linux 9 (preferred) or Ubuntu 22.04 LTS (STIG-hardened)
  - Rocky Linux 9 is a free, binary-compatible RHEL rebuild β€” the same DISA STIG and CIS benchmarks for RHEL 9 apply directly
  - AlmaLinux 9 is an equally valid alternative
  - Apply the DISA STIG or CIS Level 2 benchmark via Ansible (use ansible-lockdown roles)
  - Enable FIPS mode at the OS level
  - Enable SELinux in enforcing mode (the Rocky/Alma default; configure AppArmor on Ubuntu)
  - Configure auditd for the NIST AU-family controls

Provisioning: OpenTofu + Ansible
  - OpenTofu (open-source Terraform fork, fully compatible) for infrastructure (cloud VMs, networking, load balancers), with modules for AWS, Azure, on-prem vSphere, and Proxmox VE
  - Ansible for OS hardening and RKE2 bootstrap
  - Packer for immutable, pre-hardened AMI/VM image builds (AWS, vSphere, Proxmox VE)
  - Proxmox VE support enables on-premises homelabs and air-gapped environments with cloud-init based provisioning
  - Note: when pursuing government contracts, swap cloud targets to AWS GovCloud / Azure Government as needed
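As a sketch of how the hardening and bootstrap pieces connect, an Ansible playbook might run an ansible-lockdown STIG role before staging RKE2. The role name, host group, and variables here are assumptions; check the ansible-lockdown documentation for the exact role matching your OS version.

```yaml
# ansible/playbooks/harden-os.yml -- illustrative playbook (names are assumed).
- name: Harden Rocky Linux 9 nodes ahead of RKE2 install
  hosts: rke2_nodes
  become: true
  roles:
    # ansible-lockdown's RHEL 9 STIG role; applies to Rocky 9 as a RHEL rebuild
    - role: rhel9-stig
  tasks:
    - name: Ensure SELinux is enforcing
      ansible.posix.selinux:
        policy: targeted
        state: enforcing
```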

Create the following directory structure:

sre/
β”œβ”€β”€ tofu/
β”‚   β”œβ”€β”€ modules/
β”‚   β”‚   β”œβ”€β”€ aws/
β”‚   β”‚   β”œβ”€β”€ azure/
β”‚   β”‚   β”œβ”€β”€ vsphere/
β”‚   β”‚   └── proxmox/
β”‚   β”œβ”€β”€ environments/
β”‚   β”‚   β”œβ”€β”€ dev/
β”‚   β”‚   β”œβ”€β”€ staging/
β”‚   β”‚   β”œβ”€β”€ production/
β”‚   β”‚   └── proxmox-lab/
β”‚   └── main.tf
β”œβ”€β”€ ansible/
β”‚   β”œβ”€β”€ playbooks/
β”‚   β”‚   β”œβ”€β”€ harden-os.yml
β”‚   β”‚   └── install-rke2.yml
β”‚   β”œβ”€β”€ roles/
β”‚   └── inventory/
β”œβ”€β”€ packer/
β”‚   β”œβ”€β”€ rocky9-hardened.pkr.hcl
β”‚   β”œβ”€β”€ ubuntu2204-hardened.pkr.hcl
β”‚   └── rocky-linux-9-proxmox/   # Proxmox VE template with RKE2 pre-staged
β”œβ”€β”€ platform/                    # Layer 2 - Flux GitOps
β”‚   β”œβ”€β”€ flux-system/
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ istio/
β”‚   β”‚   β”œβ”€β”€ kyverno/
β”‚   β”‚   β”œβ”€β”€ monitoring/
β”‚   β”‚   β”œβ”€β”€ logging/
β”‚   β”‚   β”œβ”€β”€ runtime-security/
β”‚   β”‚   β”œβ”€β”€ cert-manager/
β”‚   β”‚   β”œβ”€β”€ openbao/
β”‚   β”‚   └── backup/
β”‚   └── addons/
β”‚       β”œβ”€β”€ argocd/              # Optional: for app teams who prefer Argo
β”‚       β”œβ”€β”€ backstage/
β”‚       β”œβ”€β”€ harbor/
β”‚       └── keycloak/
β”œβ”€β”€ apps/                        # Layer 3 - App deployment templates
β”‚   β”œβ”€β”€ templates/
β”‚   β”‚   β”œβ”€β”€ web-app/
β”‚   β”‚   β”œβ”€β”€ api-service/
β”‚   β”‚   └── worker/
β”‚   └── tenants/
β”œβ”€β”€ policies/                    # Layer 2 - Kyverno policies
β”‚   β”œβ”€β”€ baseline/
β”‚   β”œβ”€β”€ restricted/
β”‚   └── custom/
β”œβ”€β”€ compliance/
β”‚   β”œβ”€β”€ oscal/
β”‚   β”œβ”€β”€ stig-checklists/
β”‚   └── nist-800-53-mappings/
β”œβ”€β”€ docs/
β”‚   β”œβ”€β”€ developer-guide.md
β”‚   β”œβ”€β”€ operator-guide.md
β”‚   β”œβ”€β”€ compliance-guide.md
β”‚   └── architecture.md
└── scripts/
    β”œβ”€β”€ bootstrap.sh
    └── validate-compliance.sh

Layer 2: Platform Services

Deploy all platform services via Flux CD (the GitOps engine Big Bang uses). Every component is defined as a HelmRelease or Kustomization in the platform/ directory. Use a Kustomization hierarchy:

platform/flux-system/gotk-sync.yaml β†’ platform/core/ β†’ each service
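One link in that chain could be expressed as a Flux Kustomization like the following sketch. Paths, intervals, and the dependsOn target are illustrative; they should match whatever layout you adopt under platform/core/.

```yaml
# Sketch: Flux Kustomization reconciling one core service from Git.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: istio
  namespace: flux-system
spec:
  interval: 10m
  path: ./platform/core/istio
  prune: true                  # delete resources removed from Git (drift control)
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: cert-manager       # ordering between platform services
```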

Service Mesh: Istio

Policy Enforcement: Kyverno

Monitoring: Prometheus + Grafana

Logging: Grafana Loki + Alloy

Distributed Tracing: Tempo

Runtime Security: NeuVector (open source)

Secrets Management: OpenBao + External Secrets Operator

Certificate Management: cert-manager

Identity & Access: Keycloak

Container Registry: Harbor

Backup: Velero


Layer 3: Developer Experience

The goal: a developer should be able to go from "I have a container image" to "my app is running securely in production" with minimal platform knowledge.

GitOps App Deployment via Flux CD

App Templates (Helm Charts)

Provide standardized Helm chart templates in apps/templates/ that bake in all compliance requirements:

sre-web-app chart β€” for HTTP services:
  - Deployment with security context (non-root, read-only rootfs, drop all capabilities)
  - HPA for autoscaling
  - PodDisruptionBudget
  - Service + Istio VirtualService
  - NetworkPolicy (ingress from istio-gateway only, egress to specific services)
  - ServiceMonitor for Prometheus
  - Liveness/readiness probes (configurable)

sre-api-service chart β€” for internal APIs:
  - Same as web-app, but with an Istio AuthorizationPolicy for caller restrictions
  - mTLS peer authentication

sre-worker chart β€” for background processors:
  - Same security context
  - No ingress; egress only to required services
  - Optional CronJob support
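The security context the three charts share might render roughly as the sketch below; the UID is arbitrary (any non-zero value satisfies runAsNonRoot).

```yaml
# Container-level security context the chart templates would emit (sketch).
securityContext:
  runAsNonRoot: true
  runAsUser: 10001             # arbitrary non-root UID
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```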

Each template accepts a simple values.yaml:

app:
  name: my-service
  team: alpha
  image: harbor.sre.internal/alpha/my-service:v1.2.3
  port: 8080
  replicas: 2
  resources:
    requests: { cpu: 100m, memory: 128Mi }
    limits: { cpu: 500m, memory: 512Mi }
  env:
    - name: DATABASE_URL
      secretRef: my-service-db  # Pulled from OpenBao via ESO
  ingress:
    enabled: true
    host: my-service.apps.sre.example.com
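The secretRef above implies an ExternalSecret behind the scenes. A sketch of what the chart could generate, assuming a ClusterSecretStore named openbao and a hypothetical KV path:

```yaml
# ExternalSecret the chart might render for secretRef: my-service-db (sketch).
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-service-db
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: openbao              # assumed store name
  target:
    name: my-service-db        # ordinary K8s Secret, consumed via env
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: alpha/my-service  # hypothetical KV path in OpenBao
        property: database_url
```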

Developer Portal: Backstage (optional addon)

CI/CD Pipeline Templates

Provide reference CI pipeline definitions (GitLab CI, GitHub Actions) that:

  1. Build the container image from the Dockerfile
  2. Scan with Trivy (fail on CRITICAL/HIGH)
  3. Generate an SBOM with Syft (SPDX + CycloneDX)
  4. Sign the image with Cosign
  5. Push to Harbor
  6. Update Helm values in the GitOps repo (image tag bump)
  7. Flux auto-deploys from there
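An abbreviated GitHub Actions sketch of the build/scan/sign steps; action versions, the registry host, and keyless Cosign signing are assumptions, and the Harbor login step is omitted for brevity.

```yaml
# .github/workflows/build.yaml -- abbreviated reference pipeline (sketch).
name: build-and-sign
on: { push: { branches: [main] } }
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      id-token: write          # OIDC token for keyless Cosign signing
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t harbor.sre.internal/alpha/my-service:${{ github.sha }} .
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: harbor.sre.internal/alpha/my-service:${{ github.sha }}
          severity: CRITICAL,HIGH
          exit-code: "1"       # fail the job on findings
      - uses: anchore/sbom-action@v0
        with:
          image: harbor.sre.internal/alpha/my-service:${{ github.sha }}
          format: spdx-json
      - run: docker push harbor.sre.internal/alpha/my-service:${{ github.sha }}
      - run: cosign sign --yes harbor.sre.internal/alpha/my-service:${{ github.sha }}
```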


Layer 4: Supply Chain Security

Image Pipeline

Admission Control Chain

Request flow: API Server β†’ Kyverno (mutate/validate) β†’ NeuVector admission β†’ Pod created β†’ Istio sidecar injected
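The Kyverno link in that chain could include an image-verification rule such as this sketch; the registry glob is an assumption and the key material is a placeholder.

```yaml
# Kyverno policy sketch: block pods whose images lack a valid Cosign signature.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "harbor.sre.internal/*"   # assumed internal registry
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      (placeholder -- your Cosign public key)
                      -----END PUBLIC KEY-----
```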

Software Bill of Materials


Compliance Mapping

NIST 800-53 Rev 5 Control Families Addressed

Control Family            | Implementation
AC (Access Control)       | Keycloak SSO + RBAC + Istio AuthorizationPolicy + NetworkPolicy
AU (Audit)                | Loki + auditd + OpenBao audit log + K8s audit log
CA (Assessment)           | Kyverno policy reports + NeuVector CIS benchmarks + OSCAL
CM (Configuration Mgmt)   | GitOps (Flux) + Kyverno policies + immutable infrastructure
IA (Identification/Auth)  | Keycloak MFA + OpenBao auth + Istio mTLS + cert-manager
IR (Incident Response)    | AlertManager + NeuVector alerts + Grafana dashboards
RA (Risk Assessment)      | Trivy scanning + NeuVector runtime + Kyverno violation reports
SA (System Acquisition)   | SBOM + Cosign + Harbor scan gates + Chainguard/distroless base images
SC (System Comms)         | Istio mTLS STRICT + TLS everywhere + FIPS crypto
SI (System Integrity)     | Image signing + admission control + drift detection via Flux

CMMC 2.0 Level 2 (NIST 800-171)

The platform directly addresses the 110 NIST 800-171 controls through the same mechanisms above. The compliance/nist-800-53-mappings/ directory should contain a crosswalk from 800-53 β†’ 800-171 β†’ platform implementation.

DISA STIGs Covered

OSCAL / cATO Support


Key Design Decisions

  1. Flux CD over ArgoCD as the primary GitOps engine β€” Flux is what Big Bang uses, is more Kubernetes-native, and better suited for platform-level orchestration. ArgoCD is available as an optional addon for app teams who prefer its UI.

  2. Kyverno over OPA Gatekeeper β€” Kyverno policies are written in YAML (not Rego), making them auditable by non-developers and compliance officers. Kyverno also natively supports image verification.
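To illustrate the auditable YAML style, a baseline policy requiring non-root pods might look like the simplified sketch below; the Kyverno policy library's published version of this rule also checks container-level overrides.

```yaml
# Sketch: baseline Kyverno policy a compliance officer can read without Rego.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-nonroot
spec:
  validationFailureAction: Enforce
  rules:
    - name: run-as-non-root
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Pods must set securityContext.runAsNonRoot: true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```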

  3. RKE2 over vanilla K8s or OpenShift β€” RKE2 is the only distribution with a published DISA STIG, FIPS compliance, and CIS benchmark alignment out of the box. OpenShift is an alternative but introduces vendor lock-in and significant cost.

  4. Grafana stack (Loki/Tempo/Prometheus) over EFK β€” Unified observability UI, lower resource footprint than Elasticsearch, and better integration with Istio.

  5. NeuVector over Falco β€” NeuVector provides both runtime security AND network segmentation AND CIS scanning AND admission control in one tool. Falco only covers runtime detection.

  6. External Secrets Operator over direct Vault/OpenBao injection β€” ESO produces standard Kubernetes Secrets, which works with any application without SDK changes. Apps remain portable. ESO supports both OpenBao and Vault backends interchangeably.

  7. OpenTofu over Terraform β€” Fully open-source (MPL 2.0) fork of Terraform with identical HCL syntax and provider compatibility. Avoids HashiCorp BSL licensing concerns.

  8. OpenBao over HashiCorp Vault β€” Linux Foundation open-source fork with compatible APIs. For government contracts where Vault is explicitly required, swap in Vault community edition (the ESO and Kubernetes auth configs are identical).

  9. Harbor as your own hardened registry β€” Instead of depending on Iron Bank access (which requires Platform One registration), run Harbor with Trivy scanning and Cosign verification. This gives you the same security posture. Add Iron Bank as an upstream replication source later when pursuing ATO.


Implementation Order

Build the platform in this order:

  1. OpenTofu + Ansible: Provision VMs, harden OS, install RKE2
  2. Flux CD bootstrap: Install Flux, set up Git repo sync
  3. Istio: Service mesh with mTLS
  4. cert-manager: TLS certificates
  5. Kyverno + policies: Policy enforcement baseline
  6. Monitoring stack: Prometheus + Grafana
  7. Logging stack: Loki + Alloy
  8. OpenBao + ESO: Secrets management
  9. Harbor: Container registry
  10. NeuVector: Runtime security
  11. Keycloak: SSO for all UIs
  12. Tempo: Distributed tracing
  13. Velero: Backup
  14. App templates: Helm charts for developers
  15. Backstage: Developer portal (optional)
  16. Compliance artifacts: OSCAL, STIG checklists, documentation

What to Build Now

Start with steps 1-6. For each component:
  - Write the OpenTofu modules and Ansible playbooks/roles
  - Write the Flux HelmRelease or Kustomization manifests
  - Write the Kyverno policies
  - Write a comprehensive README in docs/
  - Include a Makefile or Taskfile.yml with commands like:
    - make infra-plan ENV=dev
    - make infra-apply ENV=dev
    - make bootstrap-flux
    - make validate-compliance
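If you choose the Taskfile route, a sketch of the equivalent interface (directory and script paths follow the tree above; task bodies are illustrative):

```yaml
# Taskfile.yml -- task-runner equivalent of the make targets above (sketch).
version: "3"
tasks:
  infra-plan:
    desc: Plan infrastructure for an environment (ENV=dev|staging|production)
    cmds:
      - tofu -chdir=tofu/environments/{{.ENV}} plan
  infra-apply:
    desc: Apply infrastructure for an environment
    cmds:
      - tofu -chdir=tofu/environments/{{.ENV}} apply
  bootstrap-flux:
    desc: Install Flux and configure Git repo sync
    cmds:
      - ./scripts/bootstrap.sh
  validate-compliance:
    desc: Run compliance validation checks
    cmds:
      - ./scripts/validate-compliance.sh
```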

Focus on making each component production-ready, well-documented, and testable in isolation. Use semantic versioning for the platform Helm charts. Pin all image versions explicitly β€” no :latest tags.