Node Status Check

The Node Status Check probe validates the current state of Kubernetes nodes.

Infrastructure type

  • Kubernetes

Use cases

The Node Status Check probe helps you:

  • Verify nodes remain healthy during chaos experiments
  • Validate node recovery after failures
  • Monitor cluster health during node-level chaos
  • Ensure node availability during infrastructure disruptions

Overview

This probe validates that Kubernetes nodes are in a healthy state during chaos experiments. It supports filtering by node names or labels, making it suitable for checking specific nodes or groups of nodes in your cluster.
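As a rough illustration of what "healthy" can mean for a node, the sketch below inspects the `Ready` condition in a Node object's `status.conditions`, which is where the Kubernetes API reports node readiness. Whether the `healthchecks` command applies exactly this test is an assumption. `is_node_ready` and `filter_by_label` are hypothetical helper names, and the sample dictionaries only stand in for the JSON that `kubectl get nodes -o json` returns.

```python
from typing import Any


def is_node_ready(node: dict[str, Any]) -> bool:
    """Return True if the node reports a Ready condition with status "True"."""
    conditions = node.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def filter_by_label(nodes: list[dict[str, Any]], selector: str) -> list[dict[str, Any]]:
    """Select nodes matching a single key=value selector (hypothetical helper)."""
    key, _, value = selector.partition("=")
    return [
        n for n in nodes
        if n.get("metadata", {}).get("labels", {}).get(key) == value
    ]


# Sample data shaped like Kubernetes Node objects (illustrative only).
nodes = [
    {
        "metadata": {"name": "node-1", "labels": {"node-role.kubernetes.io/worker": ""}},
        "status": {"conditions": [{"type": "Ready", "status": "True"}]},
    },
    {
        "metadata": {"name": "node-2", "labels": {}},
        "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]},
    },
]

workers = filter_by_label(nodes, "node-role.kubernetes.io/worker=")
all_ready = all(is_node_ready(n) for n in workers)
```

A selector with an empty value, such as `node-role.kubernetes.io/worker=`, matches only nodes that carry the label with an empty string as its value, which is why `node-2` is excluded here.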

Probe type

Command Probe

Prerequisites

  • Kubernetes cluster with chaos infrastructure installed
  • Access to cluster nodes
  • Sufficient RBAC permissions to query node status

Probe properties

Command

healthchecks -name node-level

Comparator

| Type | Criteria | Value |
|------|----------|-------|
| string | contains | [Pass] |

The probe passes when the command output contains [Pass], indicating all specified nodes are in a healthy state.
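The comparator rule above amounts to a plain substring test, which the following minimal sketch mimics. `string_contains` is a hypothetical name, and the sample output strings are invented for illustration; the real output format of `healthchecks` is not shown in this page.

```python
def string_contains(output: str, expected: str = "[Pass]") -> bool:
    """Mimic a comparator of type "string" with criteria "contains"."""
    # A substring test: the square brackets are matched literally,
    # not interpreted as a regular-expression character class.
    return expected in output


# Hypothetical command outputs, for illustration only.
passing = string_contains("[Pass] all target nodes are healthy")
failing = string_contains("[Fail] node-2 is not ready")
```

Because the match is literal, an output that merely contains the word `Pass` without the brackets would not satisfy the comparator in this sketch.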

Environment variables

| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| TARGET_NODE | Comma-separated list of nodes to be checked. One of TARGET_NODE, TARGET_NODES, or NODE_LABEL must be specified. | No* | - |
| TARGET_NODES | Comma-separated list of nodes to be checked. One of TARGET_NODE, TARGET_NODES, or NODE_LABEL must be specified. | No* | - |
| NODE_LABEL | Node label to filter nodes to be checked (e.g., node-role.kubernetes.io/worker=). One of TARGET_NODE, TARGET_NODES, or NODE_LABEL must be specified. | No* | - |
| STATUS_CHECK_TIMEOUT | Maximum time in seconds to wait for status check. | No | 180 |
| STATUS_CHECK_DELAY | Delay in seconds between status checks. | No | 2 |

Note: Variables marked No* are individually optional, but at least one of TARGET_NODE, TARGET_NODES, or NODE_LABEL must be provided.
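The sketch below shows one way these variables could be interpreted: it enforces the at-least-one rule and polls a check until it passes or the timeout expires. It is not the probe's actual implementation. `resolve_targets` and `wait_until_healthy` are hypothetical names, and merging TARGET_NODE and TARGET_NODES into a single list is an assumption.

```python
import time
from typing import Callable, Mapping


def resolve_targets(env: Mapping[str, str]) -> dict[str, object]:
    """Read the target variables and enforce the at-least-one rule."""
    # Assumption: names from TARGET_NODE and TARGET_NODES are merged.
    names = [
        n.strip()
        for var in ("TARGET_NODE", "TARGET_NODES")
        for n in env.get(var, "").split(",")
        if n.strip()
    ]
    label = env.get("NODE_LABEL", "").strip()
    if not names and not label:
        raise ValueError("set one of TARGET_NODE, TARGET_NODES, or NODE_LABEL")
    return {"names": names, "label": label}


def wait_until_healthy(
    check: Callable[[], bool],
    env: Mapping[str, str],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() every STATUS_CHECK_DELAY seconds until STATUS_CHECK_TIMEOUT."""
    timeout = float(env.get("STATUS_CHECK_TIMEOUT", "180"))  # documented default
    delay = float(env.get("STATUS_CHECK_DELAY", "2"))  # documented default
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        # Give up if the next poll would start after the deadline.
        if time.monotonic() + delay > deadline:
            return False
        sleep(delay)
```

For example, `resolve_targets({"TARGET_NODES": "node-1, node-2"})` yields the two trimmed node names and an empty label, while an empty mapping raises `ValueError`.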


Run properties

| Property | Description | Type | Default |
|----------|-------------|------|---------|
| timeout | Maximum time to wait for the probe to complete (e.g., 30s, 1m, 5m) | String | 180s |
| interval | Time between probe executions (e.g., 1s, 5s, 10s) | String | 1s |
| attempt | Number of retry attempts before marking the probe as failed | Integer | 1 |
| pollingInterval | Time between retry attempts (e.g., 1s, 5s, 10s) | String | - |
| initialDelay | Initial delay before starting the probe (e.g., 0s, 10s, 30s) | String | - |
| stopOnFailure | Stop the experiment if the probe fails | Boolean | false |
| verbosity | Log verbosity level (info, debug, trace) | String | - |
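To make the interplay of `attempt` and `pollingInterval` concrete, here is a minimal sketch of a retry loop driven by duration strings such as `30s` or `5m`. It rests on two assumptions: `attempt` counts the total number of runs, and durations use a single integer with one unit suffix. `parse_duration` and `run_with_attempts` are hypothetical names, not part of the chaos tooling.

```python
import re
import time
from typing import Callable

_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert a simple duration such as "30s" or "5m" to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def run_with_attempts(
    probe: Callable[[], bool],
    attempt: int = 1,
    polling_interval: str = "0s",
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run probe() up to `attempt` times, pausing between failed tries."""
    pause = parse_duration(polling_interval)
    for i in range(attempt):
        if probe():
            return True
        if i < attempt - 1:  # no pause after the final attempt
            sleep(pause)
    return False
```

With `attempt=3` and `polling_interval="5s"`, a probe that fails twice and then succeeds passes after two 5-second pauses. With the default `attempt=1`, a single failure marks the probe as failed.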

Probe definition

You can define this probe in your chaos experiment as follows:

Using node names

probe:
  - name: "node-health-check"
    type: "cmdProbe"
    mode: "Continuous"
    cmdProbe/inputs:
      command: "healthchecks -name node-level"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: TARGET_NODES
          value: "node-1,node-2,node-3"
        - name: STATUS_CHECK_TIMEOUT
          value: "180"
        - name: STATUS_CHECK_DELAY
          value: "2"
    runProperties:
      timeout: 180s
      interval: 1s
      attempt: 1
      stopOnFailure: false

Using node labels

probe:
  - name: "worker-nodes-check"
    type: "cmdProbe"
    mode: "Edge"
    cmdProbe/inputs:
      command: "healthchecks -name node-level"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: NODE_LABEL
          value: "node-role.kubernetes.io/worker="
        - name: STATUS_CHECK_TIMEOUT
          value: "120"
    runProperties:
      timeout: 150s
      interval: 2s
      attempt: 3
      pollingInterval: 5s