K8

Kubernetes Error Handling

Pod failure policies, probes, resource limits, and recovery patterns in Kubernetes

Details

Language / Topic
kubernetesKubernetes
Category
Error Handling

Rules

balanced
- Add `livenessProbe` and `readinessProbe` to every container — readiness gates traffic routing, liveness triggers automatic restarts: `readinessProbe: { httpGet: { path: /health, port: 8080 }, initialDelaySeconds: 5 }`.
- Set `resources.requests` and `resources.limits` on every container — without limits, a runaway process starves other pods on the node and triggers OOMKill without bound.
- Use `restartPolicy: OnFailure` for Jobs and `BackoffLimit` to cap retry loops: `spec: { backoffLimit: 4, template: { spec: { restartPolicy: OnFailure } } }`.
- Use `PodDisruptionBudget` to ensure at least one replica is available during rolling updates and voluntary disruptions.
- Configure both `livenessProbe` and `readinessProbe`: readiness controls when traffic is sent to the pod; liveness triggers kubelet restart when the app deadlocks — use `httpGet` for HTTP apps and `exec` for command-based health checks.
- Set resource `requests` (for scheduling) and `limits` (for OOMKill and CPU throttle) on every container: `resources: { requests: { cpu: '100m', memory: '128Mi' }, limits: { cpu: '500m', memory: '512Mi' } }` — missing limits cause noisy-neighbor problems.
- Add a `startupProbe` for slow-starting containers to prevent liveness from killing them during initialization — `startupProbe.failureThreshold * periodSeconds` defines the startup budget.
- Set `terminationGracePeriodSeconds` to allow in-flight requests to complete on pod shutdown — your app must handle SIGTERM and stop accepting new connections within this window.
- Define `PodDisruptionBudget` for every Deployment: `spec: { minAvailable: 1, selector: { matchLabels: { app: api } } }` — prevents node drain from removing all pods simultaneously.
- Use `CrashLoopBackOff` as a diagnostic signal — inspect logs with `kubectl logs <pod> --previous` before the backoff interval resets.