Runbook: Pod Security Violation (Kyverno)
Alert
- Prometheus Alert: KyvernoPolicyViolation
- Grafana Dashboard: Kyverno Policy dashboard
- Firing condition: Kyverno reports a policy violation on resource creation or update, or a background scan detects non-compliant existing resources
Severity
Warning -- Policy violations in Enforce mode block resource creation. Violations in Audit mode allow the resource but generate a compliance report. Both require investigation to determine whether the violation is a legitimate security concern or a misconfiguration.
Impact
- Enforce mode: Pod or resource creation is blocked -- the developer's deployment fails
- Audit mode: The resource is created but flagged as non-compliant in PolicyReports
- Compliance posture degradation if violations are not addressed
- Potential security risk if a violation indicates a genuine attempt to bypass controls
Investigation Steps
- Check for recent policy violations:
kubectl get policyreport -A
kubectl get clusterpolicyreport
- Get details on violations in a specific namespace:
kubectl get policyreport -n <namespace> -o yaml
- List all ClusterPolicies and their enforcement mode:
kubectl get clusterpolicies -o custom-columns='NAME:.metadata.name,ACTION:.spec.validationFailureAction,READY:.status.conditions[0].status'
- Check Kyverno admission controller logs for the specific denial:
kubectl logs -n kyverno deployment/kyverno-admission-controller --tail=200 | grep -i "denied\|violation\|blocked"
- Check the events in the namespace where the violation occurred:
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep -i "kyverno\|policy"
- If the violation was on a specific pod/deployment, check what triggered it:
kubectl describe deployment <name> -n <namespace>
kubectl get replicaset -n <namespace> -o yaml | grep -A 20 "securityContext"
- Check the specific policy that was violated:
kubectl get clusterpolicy <policy-name> -o yaml
- Check the Kyverno background controller for existing resource violations:
kubectl logs -n kyverno deployment/kyverno-background-controller --tail=100
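When several reports show failures, it can help to tally them per policy before digging into one namespace. The sketch below assumes each report lists its entries under .results with .policy, .rule and .result fields; count_failures is a hypothetical helper name, and the kubectl pipeline in the comment is only one possible way to feed it.

```shell
# Hypothetical helper: tally failing results per policy.
# Reads tab-separated "policy<TAB>rule<TAB>result" lines on stdin
# and prints "policy<TAB>count" for results equal to "fail".
count_failures() {
  awk -F '\t' '$3 == "fail" { n[$1]++ }
               END { for (p in n) printf "%s\t%d\n", p, n[p] }' | sort
}

# Possible usage against a live cluster (assumed field layout, not executed here):
#   kubectl get policyreport -A -o jsonpath='{range .items[*]}{range .results[*]}{.policy}{"\t"}{.rule}{"\t"}{.result}{"\n"}{end}{end}' | count_failures
```

The policy with the highest count is usually the best place to start with the per-policy steps above.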
Resolution
Violation: disallow-privileged-containers
The pod spec requests privileged mode. Fix the deployment:
# Correct security context
spec:
  containers:
    - name: app
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        capabilities:
          drop:
            - ALL
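To locate the offending setting before redeploying, a rendered manifest can be scanned for privileged: true. This is a minimal text-based sketch, not a YAML parser; find_privileged is a hypothetical helper name, and it only catches the block-style form shown above.

```shell
# Hypothetical helper: print "line-number:line" for every line of the
# manifest on stdin that sets privileged: true.
# allowPrivilegeEscalation is not matched because the key must start the line
# (after optional indentation or a list dash).
find_privileged() {
  grep -nE '^[[:space:]-]*privileged:[[:space:]]*true([[:space:]]|$)'
}

# Possible usage (not executed here):
#   helm template <release> <chart> | find_privileged
```

No output means no container in the rendered manifest requests privileged mode in this form.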
Violation: require-run-as-nonroot
The container is running as root. Fix by setting the security context:
spec:
  securityContext:
    runAsNonRoot: true
  containers:
    - name: app
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
Violation: restrict-image-registries
The image is not from the approved Harbor registry. Fix by pulling from Harbor:
spec:
  containers:
    - name: app
      image: harbor.sre.internal/<project>/<image>:<tag>
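A quick local check of an image reference can catch this violation before the deployment is pushed. The sketch assumes the policy accepts exactly the harbor.sre.internal/ prefix used in the example above; the real policy pattern may differ, and registry_status and APPROVED_REGISTRY are hypothetical names.

```shell
# Assumption: the approved registry host is the one shown in this runbook.
APPROVED_REGISTRY=${APPROVED_REGISTRY:-harbor.sre.internal}

# Hypothetical helper: print "approved" if the image reference starts with
# "<registry>/", otherwise "rejected".
registry_status() {
  case "$1" in
    "$APPROVED_REGISTRY"/*) echo approved ;;
    *) echo rejected ;;
  esac
}
```

Matching on the host followed by a slash avoids accepting look-alike hosts that merely begin with the same string.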
Violation: disallow-latest-tag
The image uses the :latest tag or has no tag. Fix by pinning an explicit version:
spec:
  containers:
    - name: app
      image: harbor.sre.internal/team-alpha/my-app:v1.2.3
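The two failure cases (a :latest tag and a missing tag) can be told apart with plain shell pattern matching. The following is a sketch with a hypothetical helper name, classify_image_tag; digest references are reported separately because whether the policy accepts them depends on how its pattern is written.

```shell
# Hypothetical helper: classify an image reference as
# digest, latest, pinned, or untagged.
classify_image_tag() {
  image=$1
  case "$image" in
    *@sha256:*) echo digest; return ;;
  esac
  # Keep only the last path component so a registry port
  # (registry:5000/app) is not mistaken for a tag.
  name=${image##*/}
  case "$name" in
    *:latest) echo latest ;;
    *:?*)     echo pinned ;;
    *)        echo untagged ;;
  esac
}
```

Only references classified as pinned match the fix shown above; latest and untagged are the two cases this violation describes.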
Violation: require-resource-limits
The pod is missing CPU or memory limits. Fix by adding resource constraints:
spec:
  containers:
    - name: app
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 512Mi
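To find running pods that would fail this policy, the limits can be listed with custom columns and filtered. This sketch assumes kubectl prints <none> for a missing field; pods_missing_limits is a hypothetical helper name. It only flags pods where no container sets the limit, so multi-container pods with a partial configuration still need a manual look.

```shell
# Hypothetical helper: read "pod cpu-limit memory-limit" lines on stdin and
# print the pods where either limit is reported as <none>.
pods_missing_limits() {
  awk '$2 == "<none>" || $3 == "<none>" { print $1 }'
}

# Possible usage (not executed here):
#   kubectl get pods -n <namespace> --no-headers \
#     -o custom-columns='POD:.metadata.name,CPU:.spec.containers[*].resources.limits.cpu,MEM:.spec.containers[*].resources.limits.memory' \
#     | pods_missing_limits
```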
Violation: require-labels
Required labels are missing. Ensure all resources have standard labels:
metadata:
  labels:
    app.kubernetes.io/name: my-app
    app.kubernetes.io/instance: my-app
    app.kubernetes.io/version: v1.2.3
    app.kubernetes.io/managed-by: Helm
    sre.io/team: team-alpha
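A small check can report which of these labels a resource lacks. The sketch assumes the five keys listed above are the full required set (the policy itself is the authority) and takes labels in the key=value,key=value form that kubectl get --show-labels prints; missing_labels is a hypothetical helper name.

```shell
# Hypothetical helper: print each required label key that is absent from a
# comma-separated "key=value,key=value" label string.
missing_labels() {
  labels=",$1,"
  for key in \
    app.kubernetes.io/name \
    app.kubernetes.io/instance \
    app.kubernetes.io/version \
    app.kubernetes.io/managed-by \
    sre.io/team
  do
    case "$labels" in
      *",$key="*) ;;                 # key present
      *) printf '%s\n' "$key" ;;     # key missing
    esac
  done
}
```

For example, passing the LABELS column of kubectl get deployment <name> -n <namespace> --show-labels to missing_labels prints one missing key per line, or nothing when the set is complete.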
Legitimate exception needed (platform components)
If a platform component genuinely needs an exception (e.g., NeuVector enforcer requires privileged access):
- Create a PolicyException:
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: <component>-exception
  namespace: <namespace>
spec:
  exceptions:
    - policyName: <policy-name>
      ruleNames:
        - <rule-name>
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - <namespace>
- Document the exception and its justification in the component's README
- Add the exception to Git and let Flux reconcile it
Prevention
- Use the SRE app template Helm charts (sre-web-app, sre-worker, sre-cronjob), which include compliant security contexts by default
- Run kyverno test locally before pushing policy changes
- Review PolicyReports weekly to catch audit-mode violations before switching policies to Enforce
- Include Kyverno policy validation in CI/CD pipelines
- Educate development teams on pod security requirements via the developer guide
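The local and CI checks listed above could be wrapped in one step along these lines. This is a sketch under assumptions: the kyverno CLI is on PATH, the policy directory contains test definitions for kyverno test, and the paths are placeholders. validate_policies and KYVERNO_BIN are hypothetical names; check the CLI documentation for how your version reports failures through its exit code.

```shell
# Binary to invoke; overridable so the wrapper can be dry-run in CI.
KYVERNO_BIN=${KYVERNO_BIN:-kyverno}

# Hypothetical helper.
# $1: directory (or file) containing policies and their tests
# $2: manifest to validate against those policies
validate_policies() {
  # Run the policy unit tests first; stop if they fail.
  "$KYVERNO_BIN" test "$1" || return 1
  # Then evaluate the manifest against the policies without a cluster.
  "$KYVERNO_BIN" apply "$1" --resource "$2"
}

# Possible usage in a pipeline step (not executed here):
#   validate_policies policies/ rendered/app.yaml
```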
Escalation
- If a legitimate workload is being blocked and no workaround exists: discuss a PolicyException with the security team
- If violations appear to be intentional bypass attempts: escalate to the security team for investigation
- If Kyverno itself is failing (admission controller not responding): this is a P1 -- depending on the webhook failurePolicy, admission requests are either all blocked (Fail) or admitted without any policy check (Ignore)
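When the admission controller is unresponsive, whether requests are blocked or pass through unchecked depends on each webhook's failurePolicy. The sketch below maps that field to the expected impact; webhook_impact is a hypothetical helper name, and the kubectl pipeline in the comment assumes the Kyverno webhook names contain "kyverno".

```shell
# Hypothetical helper: read "webhook-name<TAB>failurePolicy" lines on stdin
# and print "webhook-name<TAB>impact" while the backing service is down.
webhook_impact() {
  awk -F '\t' '{
    impact = "unknown"
    if ($2 == "Fail")   impact = "blocked"       # requests are rejected
    if ($2 == "Ignore") impact = "unvalidated"   # requests bypass policy checks
    printf "%s\t%s\n", $1, impact
  }'
}

# Possible usage (not executed here):
#   kubectl get validatingwebhookconfigurations -o jsonpath='{range .items[*]}{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\n"}{end}{end}' | grep -i kyverno | webhook_impact
```

Including this output in the P1 escalation tells responders whether they are dealing with an outage (blocked) or a control gap (unvalidated).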