πŸ•ΈοΈ Ada Research Browser


Secure Runtime Environment (SRE)

A hardened, compliance-ready Kubernetes platform for deploying applications in regulated environments. One-click deploy, zero-trust security, full observability, all open source.



What You Get

A complete Kubernetes platform with 16 integrated components, all deployed and managed through GitOps:

SRE Dashboard showing all healthy components

| Category | Components | What It Does |
| --- | --- | --- |
| Service Mesh | Istio | Encrypts all pod-to-pod traffic (mTLS), controls who can talk to whom |
| Policy Engine | Kyverno | Blocks insecure containers, enforces image signing, requires labels |
| Monitoring | Prometheus + Grafana + Alertmanager | Metrics, dashboards, and alerting for the entire cluster |
| Logging | Loki + Alloy | Centralized log collection and search from every pod |
| Tracing | Tempo | Distributed request tracing across services |
| Runtime Security | NeuVector | Detects and blocks anomalous container behavior in real time |
| Secrets | OpenBao + External Secrets Operator | Centralized secrets vault with automatic Kubernetes sync |
| Certificates | cert-manager | Automated TLS certificate issuance and rotation |
| Identity | Keycloak | Single sign-on (SSO) with OIDC/SAML for all platform UIs |
| Registry | Harbor + Trivy | Container image storage with vulnerability scanning on push |
| Backup | Velero | Scheduled cluster backup and disaster recovery |
| Load Balancer | MetalLB | Provides LoadBalancer IPs on bare metal (cloud uses native LB) |
| GitOps | Flux CD | Continuously reconciles cluster state from this Git repo |

Accessing the Platform

All platform UIs are exposed through a single Istio ingress gateway on standard HTTPS (port 443). No custom ports needed.

Step 1: Add DNS entries

Get the gateway IP and add DNS entries:

# Get the gateway's external IP (assigned by MetalLB on bare metal, or cloud LB on AWS/Azure)
GATEWAY_IP=$(kubectl get svc istio-gateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Add to /etc/hosts (or configure real DNS in production)
echo "$GATEWAY_IP  portal.apps.sre.example.com dashboard.apps.sre.example.com grafana.apps.sre.example.com prometheus.apps.sre.example.com alertmanager.apps.sre.example.com harbor.apps.sre.example.com keycloak.apps.sre.example.com neuvector.apps.sre.example.com openbao.apps.sre.example.com oauth2.apps.sre.example.com" | sudo tee -a /etc/hosts

How it works: The Istio ingress gateway gets a dedicated IP via LoadBalancer (MetalLB on bare metal, cloud LB on AWS/Azure). When a request arrives on port 443, Istio reads the Host header and routes it to the correct backend service via VirtualService rules. All traffic is TLS-encrypted with a wildcard certificate for *.apps.sre.example.com.
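The /etc/hosts line from Step 1 can also be generated from a list of service names, which keeps it in step with the `<service>.apps.sre.example.com` pattern. This is a minimal sketch: the helper name `hosts_line` and the documentation address `192.0.2.10` are illustrative assumptions, not part of the repository's scripts.

```shell
# Hypothetical helper: build one /etc/hosts line for the platform UIs.
# Usage: hosts_line <gateway-ip> <service>...
hosts_line() {
  local ip="$1"
  shift
  local domain="apps.sre.example.com"   # wildcard domain used in this README
  local line="$ip"
  local svc
  for svc in "$@"; do
    line="$line ${svc}.${domain}"
  done
  printf '%s\n' "$line"
}
```

For example, `hosts_line "$GATEWAY_IP" portal dashboard grafana | sudo tee -a /etc/hosts` would append an entry for three of the UIs.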

Step 2: Open any service

All URLs follow the pattern: https://<service>.apps.sre.example.com

| Service | URL | Default Credentials |
| --- | --- | --- |
| Portal | https://portal.apps.sre.example.com | SSO via Keycloak |
| Dashboard | https://dashboard.apps.sre.example.com | SSO via Keycloak |
| Grafana | https://grafana.apps.sre.example.com | SSO via Keycloak (or admin / prom-operator) |
| Prometheus | https://prometheus.apps.sre.example.com | SSO via Keycloak |
| Alertmanager | https://alertmanager.apps.sre.example.com | SSO via Keycloak |
| Harbor | https://harbor.apps.sre.example.com | SSO via Keycloak (or admin / Harbor12345) |
| Keycloak | https://keycloak.apps.sre.example.com | admin / (auto-generated, see below) |
| NeuVector | https://neuvector.apps.sre.example.com | SSO via Keycloak (or admin / admin) |
| OpenBao | https://openbao.apps.sre.example.com | SSO via Keycloak |

SSO: All services (except Keycloak itself) are behind Single Sign-On. Clicking any link redirects you to Keycloak to log in once, then you're authenticated across all services.

Your browser will warn about the self-signed certificate; click through it or use curl -k.

Step 3: Get credentials

# Show all service URLs and credentials
./scripts/sre-access.sh

# Just credentials
./scripts/sre-access.sh creds

# Health check
./scripts/sre-access.sh status

How the Networking Works

                    Internet / LAN
                         │
                  ┌──────▼───────┐
                  │ LoadBalancer │  Dedicated IP (MetalLB / cloud LB)
                  │  :443 :80    │  Standard HTTPS/HTTP ports
                  └──────┬───────┘
                         │
                ┌────────▼────────┐
                │  Istio Gateway  │  TLS termination
                │  (istio-system) │  Host-based routing
                └────────┬────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
    ┌────▼────┐     ┌────▼────┐     ┌────▼────┐
    │ Grafana │     │ Harbor  │     │ Your App│
    │ :3000   │     │ :8080   │     │ :8080   │
    └─────────┘     └─────────┘     └─────────┘

Traffic flow for https://grafana.apps.sre.example.com:

  1. DNS resolves to the gateway's LoadBalancer IP
  2. HTTPS hits port 443 on that IP
  3. Istio Gateway terminates TLS using the wildcard certificate
  4. Istio reads the Host: grafana.apps.sre.example.com header
  5. VirtualService rule matches and routes to kube-prometheus-stack-grafana.monitoring.svc:80
  6. Grafana serves the response back through the same path
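The first five steps can be exercised without touching DNS by pinning the hostname to the gateway IP with curl's `--resolve` option. This is a minimal sketch, assuming the self-signed wildcard certificate (hence `-k`); the helper name `check_route` is hypothetical.

```shell
# Hypothetical helper: request https://<service>.apps.sre.example.com through
# the gateway IP and print the HTTP status code.
# Usage: check_route <gateway-ip> <service>
check_route() {
  local ip="$1"
  local host="$2.apps.sre.example.com"
  # --resolve maps host:443 to the gateway IP, so curl still sends the
  # hostname in SNI and in the Host header that Istio routes on.
  # -k skips verification of the self-signed certificate.
  curl -sk -o /dev/null -w '%{http_code}\n' \
    --resolve "${host}:443:${ip}" "https://${host}/"
}
```

Running `check_route "$GATEWAY_IP" grafana` prints a single status code. With the SSO gate in front of every service, an unauthenticated request is likely to return a redirect rather than 200.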


User Walkthrough

Here's exactly what it looks like when you use the platform, from first login to deploying an app.

1. SSO Gate: Every Service is Protected

Visit any URL and you're redirected to sign in. One login, access everywhere.

SSO Sign-in Gate

2. Keycloak Login

Enter your credentials (default: sre-admin / SreAdmin123!). Once signed in, you're authenticated across all services.

Keycloak Login

3. Portal: Your Starting Point

The portal is your home page. It shows all platform services with health status, quick actions, and direct links.

SRE Portal

4. Dashboard: Platform Overview

Click "Dashboard" from the portal. See all 16 components, 3 nodes, and problem pods at a glance.

Dashboard Overview

Browse all services with health indicators, descriptions, and one-click access.

Services

6. Deploy Tab: One-Click App Deployment

Quick-start templates for instant demos, or use the custom form to deploy your own image.

Deploy App

7. Status Page: Shareable Health View

Operational status of every platform service. Share this URL with your team.

Status Page

8. Audit Log: Cluster Events

Tabular view with type filters, namespace filter, pagination, and color-coded badges.

Audit Log

9. Credentials: Quick Access to Passwords

View service credentials without needing kubectl.

Credentials

10. Command Palette (Ctrl+K)

Quick-search to jump to any page or external service.

Command Palette

11. Grafana: 30+ Dashboards

Cluster health, namespace resources, Istio traffic, Kyverno violations, and more.

Grafana

12. Harbor: Container Registry

Image storage with Trivy vulnerability scanning on push.

Harbor

13. Keycloak Admin: Identity Management

Manage users, groups, OIDC clients, and SSO configuration.

Keycloak Admin

14. Mobile Responsive

Portal and dashboard adapt to mobile screens for on-the-go health checks.

Mobile Portal

Full user stories with walkthroughs: See docs/user-stories.md for detailed personas and step-by-step workflows for Platform Admins, Developers, Security Officers, Team Leads, New Hires, and Incident Responders.


Quick Start

Deploy to Any Existing Kubernetes Cluster

If you already have a Kubernetes cluster with kubectl access:

git clone https://github.com/morbidsteve/sre-platform.git
cd sre-platform
./scripts/sre-deploy.sh

The script handles everything: storage provisioning, kernel modules, Flux CD bootstrap, and secret generation. It then waits until all components are healthy (about 10 minutes).
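To see which components are still converging, the Ready condition of each Flux HelmRelease can be listed directly. This is a minimal sketch, assuming `kubectl` access to the cluster; the helper names are hypothetical, and this is not a description of how `sre-deploy.sh` itself waits.

```shell
# Print "namespace name ready-status" for every HelmRelease in the cluster.
list_helmreleases() {
  kubectl get helmreleases.helm.toolkit.fluxcd.io -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status'
}

# Read that listing on stdin and print only the releases whose Ready
# status is anything other than "True" (False, Unknown, or not yet set).
not_ready() {
  awk '$3 != "True" { print $1 "/" $2 }'
}
```

`list_helmreleases | not_ready` prints one `namespace/name` per release that is not ready; empty output means every release reports Ready.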

When it finishes:

./scripts/sre-access.sh          # Show all URLs and credentials

Deploy from Scratch on Proxmox VE

Build a full cluster from bare metal:

git clone https://github.com/morbidsteve/sre-platform.git
cd sre-platform
./scripts/quickstart-proxmox.sh

See the Proxmox Getting Started Guide for details.

Deploy on Cloud (AWS, Azure, vSphere)

git clone https://github.com/morbidsteve/sre-platform.git
cd sre-platform

# 1. Provision infrastructure
task infra-plan ENV=dev
task infra-apply ENV=dev

# 2. Harden OS + install RKE2
cd infrastructure/ansible
ansible-playbook playbooks/site.yml -i inventory/dev/hosts.yml

# 3. Deploy the platform
cd ../..
./scripts/sre-deploy.sh

Deploy Your App

Option A: Web Dashboard (30 seconds)

  1. Open https://dashboard.apps.sre.example.com
  2. Click Deploy App
  3. Fill in: name, team, image, tag, port
  4. Click Deploy

The platform automatically adds security contexts, network policies, Istio mTLS, health probes, and Prometheus monitoring.

Option B: CLI

# Create a team namespace (one-time)
./scripts/sre-new-tenant.sh my-team

# Deploy your app (interactive)
./scripts/sre-deploy-app.sh

# Push to Git; Flux handles the rest
git push

Option C: GitOps (manual YAML)

Create apps/tenants/my-team/my-app.yaml:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-app
  namespace: team-my-team
spec:
  interval: 10m
  chart:
    spec:
      chart: ./apps/templates/sre-web-app
      reconcileStrategy: Revision
      sourceRef:
        kind: GitRepository
        name: flux-system
        namespace: flux-system
  values:
    app:
      name: my-app
      team: my-team
      image:
        repository: nginxinc/nginx-unprivileged
        tag: "1.27-alpine"
      port: 8080
    ingress:
      enabled: true
      host: my-app.apps.sre.example.com

Commit and push. Flux deploys it automatically.
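Because the manifest has a fixed shape, it can be generated from a handful of parameters. This is a minimal sketch that only prints YAML mirroring the example in Option C; `render_helmrelease` is a hypothetical helper and is not the repository's `sre-deploy-app.sh`.

```shell
# Usage: render_helmrelease <app> <team> <image-repo> <tag> <port>
render_helmrelease() {
  local app="$1" team="$2" repo="$3" tag="$4" port="$5"
  cat <<EOF
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ${app}
  namespace: team-${team}
spec:
  interval: 10m
  chart:
    spec:
      chart: ./apps/templates/sre-web-app
      reconcileStrategy: Revision
      sourceRef:
        kind: GitRepository
        name: flux-system
        namespace: flux-system
  values:
    app:
      name: ${app}
      team: ${team}
      image:
        repository: ${repo}
        tag: "${tag}"
      port: ${port}
    ingress:
      enabled: true
      host: ${app}.apps.sre.example.com
EOF
}
```

For example, `render_helmrelease my-app my-team nginxinc/nginx-unprivileged 1.27-alpine 8080 > apps/tenants/my-team/my-app.yaml` writes the file that is then committed and pushed.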

Container Requirements

Your container must:

  - Run as non-root (UID 1000+)
  - Listen on port 8080+ (not 80 or 443)
  - Use a pinned version tag (not :latest)

Can't run as non-root? Use nginxinc/nginx-unprivileged instead of nginx, or add USER 1000 to your Dockerfile.
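These three requirements can be checked locally before anything is pushed. This is a rough sketch: the helper names are hypothetical, "8080+" is read here as "8080 or higher", and the real enforcement happens in the cluster's admission policies, not in this script.

```shell
# Return 0 if the image reference carries a tag other than "latest",
# or is pinned by digest.
is_pinned_image() {
  local ref="$1" last tag
  case "$ref" in *@sha256:*) return 0 ;; esac
  last="${ref##*/}"              # drop registry/path so a registry port is not mistaken for a tag
  case "$last" in
    *:*) tag="${last##*:}" ;;
    *)   return 1 ;;             # no tag at all means an implicit :latest
  esac
  [ -n "$tag" ] && [ "$tag" != "latest" ]
}

# Usage: check_container <image> <port> <uid>
# Prints one line per violated requirement; returns non-zero if any failed.
check_container() {
  local image="$1" port="$2" uid="$3" rc=0
  is_pinned_image "$image" || { echo "image tag must be pinned: $image"; rc=1; }
  [ "$port" -ge 8080 ] || { echo "port must be 8080 or higher: $port"; rc=1; }
  [ "$uid" -ge 1000 ] || { echo "UID must be 1000 or higher: $uid"; rc=1; }
  return "$rc"
}
```

A call such as `check_container harbor.apps.sre.example.com/my-team/my-app:1.4.2 8080 1000` (an invented image name) passes silently, while `check_container nginx:latest 80 0` reports all three problems.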


Architecture

SRE is composed of four layers:

┌──────────────────────────────────────────────────┐
│  Layer 4: Supply Chain Security                  │
│  Harbor + Trivy scanning + Cosign signing        │
│  + Kyverno image verification                    │
├──────────────────────────────────────────────────┤
│  Layer 3: Developer Experience                   │
│  Helm templates + Tenant namespaces              │
│  + SRE Dashboard + GitOps app deployment         │
├──────────────────────────────────────────────────┤
│  Layer 2: Platform Services (Flux CD)            │
│  Istio + Kyverno + Prometheus + Grafana + Loki   │
│  + NeuVector + OpenBao + cert-manager + Keycloak │
│  + Tempo + Velero + External Secrets             │
├──────────────────────────────────────────────────┤
│  Layer 1: Cluster Foundation                     │
│  RKE2 (FIPS + CIS + STIG) on Rocky Linux 9       │
│  Provisioned by OpenTofu + Ansible + Packer      │
└──────────────────────────────────────────────────┘

Layer 1 (Cluster Foundation): Infrastructure provisioned with OpenTofu (AWS, Azure, vSphere, Proxmox VE), OS hardened to DISA STIG via Ansible, RKE2 installed with FIPS 140-2 and CIS benchmark.

Layer 2 (Platform Services): All security, observability, and networking tools deployed via Flux CD. Every component is a HelmRelease in Git, continuously reconciled to the cluster.

Layer 3 (Developer Experience): Standardized Helm chart templates and self-service tenant namespaces. Developers deploy apps by committing a values file; the platform handles security contexts, network policies, monitoring, and mesh integration.

Layer 4 (Supply Chain Security): Images scanned by Trivy, signed with Cosign, verified by Kyverno at admission, monitored at runtime by NeuVector.

Security Controls

Every request passes through multiple security layers:

Request → TLS Termination → JWT Validation → Authorization Policy → Network Policy → Istio mTLS → Application
                                                                                         ↓
                                                                                 NeuVector Runtime Monitor

GitOps Flow

All changes flow through Git:

Developer → git push → GitHub → Flux CD detects change → Kyverno validates → Helm deploys → Pod running

No kubectl apply needed. No manual cluster access. Git is the single source of truth.

Zero-Trust Security Model

Every layer enforces security independently, so compromising one layer doesn't bypass the others:

| Layer | Control | What It Prevents |
| --- | --- | --- |
| Gateway | Istio ext-authz + OAuth2 Proxy | Unauthenticated access to any service |
| Mesh | Istio mTLS STRICT | Unencrypted pod-to-pod communication |
| Network | NetworkPolicy default-deny | Lateral movement between namespaces |
| Admission | Kyverno 7 policies | Privileged containers, unsigned images, :latest tags |
| Runtime | NeuVector | Anomalous process execution, network exfiltration |
| Secrets | OpenBao + ESO | Hardcoded credentials, secret sprawl |
| Audit | Prometheus + Loki + Tempo | Unmonitored activity, missing forensic data |

Platform Components

Component Versions (as deployed)

| Component | Chart Version | Namespace |
| --- | --- | --- |
| Istio (base + istiod + gateway) | 1.25.2 | istio-system |
| cert-manager | v1.14.4 | cert-manager |
| Kyverno | 3.3.7 | kyverno |
| kube-prometheus-stack | 72.6.2 | monitoring |
| Loki | 6.29.0 | logging |
| Alloy | 0.12.2 | logging |
| Tempo | 1.18.2 | tempo |
| OpenBao | 0.9.0 | openbao |
| External Secrets | 0.9.13 | external-secrets |
| NeuVector | 2.8.6 | neuvector |
| Velero | 11.3.2 | velero |
| Harbor | 1.16.3 | harbor |
| Keycloak | 24.8.1 | keycloak |
| MetalLB | 0.14.9 | metallb-system |

Kyverno Policies (7 active)

| Policy | Mode | What It Enforces |
| --- | --- | --- |
| disallow-latest-tag | Enforce | Blocks :latest image tags |
| require-labels | Enforce | Requires app.kubernetes.io/name and sre.io/team labels |
| require-network-policies | Enforce | Ensures every namespace has a default-deny NetworkPolicy |
| require-security-context | Enforce | Requires non-root, drop ALL capabilities |
| restrict-image-registries | Enforce | Restricts images to approved registries |
| require-istio-sidecar | Audit | Requires Istio sidecar injection labels |
| verify-image-signatures | Audit | Verifies Cosign signatures on images |
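The require-labels rule can be approximated locally before pushing. This is a rough sketch that searches a manifest for the two label keys named in the table; it is a plain text search, not a substitute for Kyverno's admission check, and `missing_labels` is a hypothetical helper.

```shell
# Print each required label key that does not appear in the given manifest.
# Returns non-zero if any key is missing.
missing_labels() {
  local file="$1" key rc=0
  for key in app.kubernetes.io/name sre.io/team; do
    # -F treats the key as a fixed string; "<key>:" may appear on any line.
    if ! grep -qF "${key}:" "$file"; then
      echo "missing label: ${key}"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `missing_labels apps/tenants/my-team/my-app.yaml` prints nothing and returns 0 when both keys are present somewhere in the file.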

Secrets Management

| Feature | Implementation |
| --- | --- |
| Secrets Vault | OpenBao (auto-initialized, auto-unsealed) |
| K8s Integration | External Secrets Operator syncs from OpenBao to K8s Secrets |
| Auth Method | Kubernetes ServiceAccount-based authentication |
| Engines | KV v2 (app secrets), PKI (certificates) |
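The sync described in the table is driven by ExternalSecret resources. This is a minimal sketch of one such resource, printed by a hypothetical helper `render_externalsecret`. The ClusterSecretStore name `openbao` and the secret path in the usage example are assumptions for illustration; the repository's external-secrets manifests define the real store name.

```shell
# Usage: render_externalsecret <name> <namespace> <remote-key> <property>
render_externalsecret() {
  local name="$1" ns="$2" key="$3" prop="$4"
  cat <<EOF
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: openbao            # assumed store name
  target:
    name: ${name}            # Kubernetes Secret created by the operator
  data:
    - secretKey: ${prop}
      remoteRef:
        key: ${key}          # path in the KV v2 engine (hypothetical)
        property: ${prop}
EOF
}
```

For example, `render_externalsecret my-app-db team-my-team my-team/my-app/db password` prints a manifest that would sync the `password` field into a Secret named `my-app-db`.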

SSO / Identity (Keycloak + OAuth2 Proxy)

All platform services are protected by SSO via Keycloak + OAuth2 Proxy + Istio ext-authz. A single login grants access to every service.

| Feature | Detail |
| --- | --- |
| Realm | sre with OIDC discovery |
| Groups | sre-admins, developers, sre-viewers |
| OIDC Clients | Grafana, Harbor, NeuVector, Dashboard, OAuth2 Proxy |
| SSO Gate | OAuth2 Proxy + Istio ext-authz on the gateway |
| Test User | sre-admin / SreAdmin123! (in sre-admins group) |

How SSO works:

  1. User visits any service URL (e.g., https://grafana.apps.sre.example.com)
  2. Istio gateway sends the request to OAuth2 Proxy for authentication check
  3. If no valid session, OAuth2 Proxy redirects to Keycloak login page
  4. User logs in once with Keycloak credentials
  5. OAuth2 Proxy sets a session cookie valid across all *.apps.sre.example.com services
  6. All subsequent requests pass through automatically; no more logins needed

Observability

| Feature | Detail |
| --- | --- |
| Grafana Dashboards | 5 custom SRE dashboards (cluster, namespace, istio, kyverno, flux) + 31 built-in |
| PrometheusRules | 22 alerts across 8 groups (certs, flux, kyverno, nodes, storage, pods, security, istio) |
| Alertmanager | Severity-based routing (critical/warning/info) with inhibition rules |

CI/CD Pipeline

Reusable GitHub Actions workflows in ci/github-actions/:

  1. Build container image with Docker Buildx
  2. Scan with Trivy (fail on CRITICAL)
  3. Generate SBOM with Syft (SPDX + CycloneDX)
  4. Sign with Cosign
  5. Push to Harbor
  6. Update GitOps repo (Flux auto-deploys)

Compliance Artifacts

| Artifact | Path |
| --- | --- |
| OSCAL System Security Plan | compliance/oscal/ssp.json |
| NIST 800-53 Control Mapping | compliance/nist-800-53-mappings/control-mapping.json |
| CMMC 2.0 Level 2 Assessment | compliance/cmmc/level2-assessment.json |
| RKE2 DISA STIG Checklist | compliance/stig-checklists/rke2-stig.json |

Project Structure

sre-platform/
├── platform/                     # Flux CD GitOps manifests
│   ├── flux-system/              # Flux bootstrap
│   ├── core/                     # Core platform components
│   │   ├── istio/                # Service mesh (mTLS, gateway, auth)
│   │   ├── cert-manager/         # TLS certificates
│   │   ├── kyverno/              # Policy engine
│   │   ├── monitoring/           # Prometheus + Grafana + Alertmanager
│   │   ├── logging/              # Loki + Alloy
│   │   ├── tracing/              # Tempo
│   │   ├── openbao/              # Secrets vault
│   │   ├── external-secrets/     # Secrets sync to K8s
│   │   ├── runtime-security/     # NeuVector
│   │   └── backup/               # Velero
│   └── addons/                   # Optional components
│       ├── harbor/               # Container registry
│       └── keycloak/             # Identity / SSO
├── apps/
│   ├── portal/                   # SRE Portal: tiled landing page for all services
│   ├── dashboard/                # SRE Dashboard web app (v2.0.2)
│   ├── demo-app/                 # Go demo app with Prometheus metrics
│   ├── templates/                # Helm chart templates (web-app, worker, cronjob, api)
│   └── tenants/                  # Per-team app configs (team-alpha, team-beta)
├── ci/
│   └── github-actions/           # Reusable CI/CD workflows (build, scan, sign, deploy)
├── policies/                     # Kyverno policies + test suites
├── infrastructure/
│   ├── tofu/                     # OpenTofu modules (AWS, Azure, vSphere, Proxmox)
│   ├── ansible/                  # OS hardening + RKE2 install
│   └── packer/                   # Immutable VM image builds
├── compliance/                   # OSCAL, STIG checklists, NIST mappings
├── scripts/                      # Deploy, access, and management scripts
└── docs/                         # Full documentation

Compliance

SRE targets these government and industry compliance frameworks:

| Framework | Coverage |
| --- | --- |
| NIST 800-53 Rev 5 | AC, AU, CA, CM, IA, IR, MP, RA, SA, SC, SI control families |
| CMMC 2.0 Level 2 | All 110 NIST 800-171 controls |
| DISA STIGs | RKE2 Kubernetes, RHEL 9 / Rocky Linux 9, Istio |
| FedRAMP | NIST 800-53 control inheritance + OSCAL artifacts |
| CIS Benchmarks | Kubernetes (via RKE2), Rocky Linux 9 Level 2 |

Every Kyverno policy, Helm chart, and Flux manifest includes sre.io/nist-controls annotations mapping to specific NIST 800-53 controls.


Scripts Reference

| Script | Description |
| --- | --- |
| scripts/sre-deploy.sh | One-button platform install on any K8s cluster |
| scripts/sre-access.sh | Show all service URLs, credentials, and health status |
| scripts/sre-access.sh status | Quick health check (all HelmReleases + problem pods) |
| scripts/sre-access.sh creds | Show credentials for all platform services |
| scripts/sre-new-tenant.sh <team> | Create a team namespace with RBAC, quotas, network policies |
| scripts/sre-deploy-app.sh | Interactive app deployment (generates HelmRelease) |
| apps/dashboard/build-and-deploy.sh | Build and deploy the SRE Dashboard to the cluster |

Documentation

| Guide | Description |
| --- | --- |
| Architecture | Full platform spec and design rationale |
| User Stories | Personas, walkthroughs, and screenshots for every user type |
| Decision Records | ADRs for all major technology choices |
| Developer Guide | Deploy your app, secrets management, SSO, CI/CD |
| Proxmox Guide | Build a cluster from scratch on Proxmox VE |
| Session Playbook | Step-by-step build plan |
| CI/CD Pipelines | Reusable GitHub Actions for build/scan/sign/deploy |
| Istio AuthZ Policies | Zero-trust network policies |

Contributing

Branch naming: feat/, fix/, docs/, refactor/ prefixes

Commit format: Conventional Commits, for example feat(istio): add strict mTLS peer authentication

Requirements:

  - task lint and task validate must pass
  - Every component needs a README.md
  - All Kyverno policies need test suites
  - All Helm charts need values.schema.json
  - Never use :latest tags; pin specific versions
  - Never commit secrets or credentials
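The branch and commit conventions are easy to check in a local Git hook. This is a minimal sketch; the helper names are hypothetical, and the commit type list simply mirrors the four branch prefixes, although the Conventional Commits specification allows other types as well.

```shell
# Return 0 if the branch name uses one of the documented prefixes.
valid_branch() {
  case "$1" in
    feat/?*|fix/?*|docs/?*|refactor/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if the commit subject looks like "type(scope): summary".
# The scope is optional, as is the "!" breaking-change marker.
valid_commit_subject() {
  printf '%s\n' "$1" |
    grep -Eq '^(feat|fix|docs|refactor)(\([a-z0-9-]+\))?!?: .+'
}
```

A `commit-msg` hook could call `valid_commit_subject "$(head -n 1 "$1")"` and exit non-zero on failure; `valid_branch "$(git rev-parse --abbrev-ref HEAD)"` does the same for the current branch.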


License

Apache License, Version 2.0. See LICENSE.