Skip to main content

Pod Resource Utilisation Check

Pod resource utilisation check validates the current resource utilisation metrics of Kubernetes pods.

Infrastructure type

  • Kubernetes

Use cases

Pod Resource Utilisation Check probe helps you:

  • Monitor resource usage during stress chaos experiments
  • Verify resource limits are respected
  • Validate application performance under load
  • Ensure pods don't exceed resource thresholds

Overview

This probe validates that pod resource utilisation (CPU and memory) remains within acceptable limits during chaos experiments. It requires metrics-server to be installed in the cluster.

Probe type

Command Probe

Prerequisites

  • Kubernetes cluster with chaos infrastructure installed
  • Metrics server installed and running in the cluster
  • Access to target namespace and pods
  • Sufficient RBAC permissions to query pod metrics

Probe properties

Command

healthchecks -name pod-resource-metrics-check

Comparator

TypeCriteriaValue
stringcontains[Pass]

The probe passes when the command output contains [Pass], indicating pod resource utilisation is within acceptable limits.

Environment variables

VariableDescriptionRequiredDefault
TARGET_LABELSComma-separated list of target labels to filter pods.No-
TARGET_NAMESComma-separated list of target pod names.No-
TARGET_NAMESPACENamespace of the target pods.No-
TARGET_KINDKind of the target resource (e.g., deployment, statefulset, daemonset).Nodeployment
TARGET_CONTAINERName of the container to check resource metrics.No-
METRIC_TYPEMetric type to check: cpu or memory.Nocpu
CPU_LIMITPods should have CPU usage (in millicores) less than or equal to this value.No1000
MEMORY_LIMITPods should have memory usage (in MB) less than or equal to this value.No1024
STATUS_CHECK_TIMEOUTMaximum time in seconds to wait for status check.No180
STATUS_CHECK_DELAYDelay in seconds between status checks.No2

Run properties

PropertyDescriptionTypeDefault
timeoutMaximum time to wait for the probe to complete (e.g., 30s, 1m, 5m)String180s
intervalTime between probe executions (e.g., 1s, 5s, 10s)String1s
attemptNumber of retry attempts before marking the probe as failedInteger1
pollingIntervalTime between retry attempts (e.g., 1s, 5s, 10s)String-
initialDelayInitial delay before starting the probe (e.g., 0s, 10s, 30s)String-
stopOnFailureStop the experiment if the probe failsBooleanfalse
verbosityLog verbosity level (info, debug, trace)String-

Probe definition

You can define this probe in your chaos experiment as follows:

CPU utilisation check

probe:
- name: "cpu-utilisation-check"
type: "cmdProbe"
mode: "Continuous"
cmdProbe/inputs:
command: "healthchecks -name pod-resource-metrics-check"
comparator:
type: "string"
criteria: "contains"
value: "[Pass]"
env:
- name: TARGET_LABELS
value: "app=nginx"
- name: TARGET_NAMESPACE
value: "production"
- name: METRIC_TYPE
value: "cpu"
- name: CPU_LIMIT
value: "800"
runProperties:
timeout: 180s
interval: 1s
attempt: 1
stopOnFailure: false

Memory utilisation check

probe:
- name: "memory-utilisation-check"
type: "cmdProbe"
mode: "Continuous"
cmdProbe/inputs:
command: "healthchecks -name pod-resource-metrics-check"
comparator:
type: "string"
criteria: "contains"
value: "[Pass]"
env:
- name: TARGET_NAMES
value: "my-app-pod"
- name: TARGET_NAMESPACE
value: "default"
- name: TARGET_CONTAINER
value: "app-container"
- name: METRIC_TYPE
value: "memory"
- name: MEMORY_LIMIT
value: "2048"
runProperties:
timeout: 180s
interval: 2s
attempt: 3