Container Restart Check

The Container Restart Check probe validates the restart count of a container.

Infrastructure type

  • Kubernetes

Use cases

The Container Restart Check probe helps you:

  • Verify containers don't restart excessively during chaos experiments
  • Monitor container stability during resource stress
  • Validate application resilience to failures
  • Ensure pods maintain healthy restart counts

Overview

This probe validates that container restart counts remain within acceptable thresholds during chaos experiments. It supports filtering by pod names, labels, or resource kinds (Deployment, StatefulSet, DaemonSet, etc.).

Probe type

Command Probe

Prerequisites

  • Kubernetes cluster with chaos infrastructure installed
  • Access to target namespace and pods
  • Sufficient RBAC permissions to query pod status

Probe properties

Command

healthchecks -name validate-container-restart

Comparator

Type   | Criteria | Value
string | contains | [Pass]

The probe passes when the command output contains [Pass], indicating container restart counts are within the acceptable threshold.
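The comparator rule above amounts to a substring test. The following is a minimal sketch of that rule, not the probe's actual implementation; the function name `comparator_passes` and the sample output strings are hypothetical, and the real text printed by the `healthchecks` command may differ.

```python
def comparator_passes(output: str, criteria: str, value: str) -> bool:
    """Evaluate a string comparator against captured command output.

    Only the "contains" criteria used by this probe is sketched here.
    """
    if criteria == "contains":
        return value in output
    raise ValueError(f"unsupported criteria in this sketch: {criteria}")


# Hypothetical command outputs, used for illustration only.
passing_output = "[Pass] container restart count is within the threshold"
failing_output = "[Fail] container restart count exceeded the threshold"

print(comparator_passes(passing_output, "contains", "[Pass]"))
print(comparator_passes(failing_output, "contains", "[Pass]"))
```

Because the match is a plain substring test, any output line that includes the literal text `[Pass]` satisfies the comparator.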

Environment variables

Variable | Description | Required | Default
TARGET_LABELS | Comma-separated list of target labels to filter pods (e.g., app=nginx,env=prod). | No | -
TARGET_NAMES | Comma-separated list of target pod names. | No | -
TARGET_NAMESPACE | Namespace of the target pods. | No | -
TARGET_KIND | Kind of the target resource (e.g., deployment, statefulset, daemonset). | No | deployment
TARGET_CONTAINER | Name of the container whose restart count is checked. | No | -
CONTAINER_RESTART | Maximum allowed restart count. The restart count must be less than or equal to this value. | No | 1
STATUS_CHECK_TIMEOUT | Maximum time in seconds to wait for the status check. | No | 180
STATUS_CHECK_DELAY | Delay in seconds between status checks. | No | 2
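To illustrate how the variables above could be interpreted, here is a rough sketch that applies the documented defaults, splits the comma-separated values, and evaluates the "less than or equal" rule for `CONTAINER_RESTART`. It is an assumption-laden illustration, not the probe's source code: the helper names `split_csv`, `load_settings`, and `within_threshold`, the settings keys, and the sample restart counts are all hypothetical.

```python
from typing import Dict, List, Mapping

# Defaults taken from the environment variable table; variables with no
# documented default are treated as empty in this sketch.
DEFAULTS: Dict[str, str] = {
    "TARGET_KIND": "deployment",
    "CONTAINER_RESTART": "1",
    "STATUS_CHECK_TIMEOUT": "180",
    "STATUS_CHECK_DELAY": "2",
}


def split_csv(raw: str) -> List[str]:
    """Split a comma-separated value, dropping blanks and extra spaces."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def load_settings(env: Mapping[str, str]) -> dict:
    """Merge user-supplied variables over the documented defaults."""
    merged = {**DEFAULTS, **env}
    return {
        "labels": split_csv(merged.get("TARGET_LABELS", "")),
        "names": split_csv(merged.get("TARGET_NAMES", "")),
        "namespace": merged.get("TARGET_NAMESPACE", ""),
        "kind": merged["TARGET_KIND"],
        "container": merged.get("TARGET_CONTAINER", ""),
        "max_restarts": int(merged["CONTAINER_RESTART"]),
        "timeout_s": int(merged["STATUS_CHECK_TIMEOUT"]),
        "delay_s": int(merged["STATUS_CHECK_DELAY"]),
    }


def within_threshold(restart_counts: Mapping[str, int], max_restarts: int) -> bool:
    """Return True when every observed restart count is <= the limit."""
    return all(count <= max_restarts for count in restart_counts.values())


settings = load_settings({"TARGET_LABELS": "app=nginx,env=prod", "CONTAINER_RESTART": "3"})
observed = {"my-app-pod-1": 3, "my-app-pod-2": 0}  # hypothetical restart counts
print(settings["labels"], settings["kind"], settings["max_restarts"])
print(within_threshold(observed, settings["max_restarts"]))
```

Note that a restart count equal to `CONTAINER_RESTART` still passes; only a count strictly greater than the limit fails the check.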

Run properties

Property | Description | Type | Default
timeout | Maximum time to wait for the probe to complete (e.g., 30s, 1m, 5m) | String | 180s
interval | Time between probe executions (e.g., 1s, 5s, 10s) | String | 1s
attempt | Number of retry attempts before marking the probe as failed | Integer | 1
pollingInterval | Time between retry attempts (e.g., 1s, 5s, 10s) | String | -
initialDelay | Initial delay before starting the probe (e.g., 0s, 10s, 30s) | String | -
stopOnFailure | Stop the experiment if the probe fails | Boolean | false
verbosity | Log verbosity level (info, debug, trace) | String | -
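One way to read `attempt` and `pollingInterval` together is as a bounded retry loop: the check runs up to `attempt` times, pausing for `pollingInterval` between tries, and the probe is marked failed only when every try fails. The sketch below illustrates that reading under stated assumptions; the function `run_with_retries` is hypothetical, and the actual probe runner may schedule retries differently.

```python
import time
from typing import Callable


def run_with_retries(
    check: Callable[[], bool],
    attempt: int,
    polling_interval_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `check` up to `attempt` times, pausing between failed tries.

    Returns True on the first successful try and False once all tries fail.
    """
    for try_index in range(attempt):
        if check():
            return True
        # Pause only when another try remains.
        if try_index < attempt - 1:
            sleep(polling_interval_s)
    return False


# Hypothetical check that fails twice and then succeeds.
results = iter([False, False, True])
print(run_with_retries(lambda: next(results), attempt=3, polling_interval_s=0.0))
```

With `attempt: 1`, the documented default, this loop performs a single try and never sleeps.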

Probe definition

You can define this probe in your chaos experiment as follows:

Using pod labels

probe:
  - name: "container-restart-validation"
    type: "cmdProbe"
    mode: "Continuous"
    cmdProbe/inputs:
      command: "healthchecks -name validate-container-restart"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: TARGET_LABELS
          value: "app=nginx,tier=frontend"
        - name: TARGET_NAMESPACE
          value: "production"
        - name: TARGET_CONTAINER
          value: "nginx"
        - name: CONTAINER_RESTART
          value: "3"
    runProperties:
      timeout: 180s
      interval: 1s
      attempt: 1
      stopOnFailure: false

Using pod names

probe:
  - name: "specific-pod-restart-check"
    type: "cmdProbe"
    mode: "Edge"
    cmdProbe/inputs:
      command: "healthchecks -name validate-container-restart"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: TARGET_NAMES
          value: "my-app-pod-1,my-app-pod-2"
        - name: TARGET_NAMESPACE
          value: "default"
        - name: CONTAINER_RESTART
          value: "5"
        - name: STATUS_CHECK_TIMEOUT
          value: "120"
    runProperties:
      timeout: 150s
      interval: 2s
      attempt: 3