Pod Resource Utilisation Check

The pod resource utilisation check validates the current CPU or memory utilisation of Kubernetes pods against a configured limit.

Infrastructure type

Kubernetes

Use cases

The pod resource utilisation check probe helps you:

Monitor resource usage during stress chaos experiments
Verify resource limits are respected
Validate application performance under load
Ensure pods don't exceed resource thresholds

Overview

This probe validates that pod resource utilisation (CPU or memory, selected with `METRIC_TYPE`) remains within the configured limit during chaos experiments. It requires metrics-server to be installed and running in the cluster.

Probe type

Command Probe

Prerequisites

Kubernetes cluster with chaos infrastructure installed
Metrics server installed and running in the cluster
Access to target namespace and pods
Sufficient RBAC permissions to query pod metrics
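metrics-server serves pod CPU and memory usage through the Kubernetes resource metrics API (API group `metrics.k8s.io`, version `v1beta1`); reading pod usage from it requires `get` and `list` permissions on the `pods` resource in that group. The Python sketch below shows the shape of a PodMetrics object and how per-container usage could be picked out of it, optionally filtered by container name in the way `TARGET_CONTAINER` narrows the check. It is not the probe's implementation; the helper names and the sample object are hypothetical.

```python
# Sketch: read per-container usage from a PodMetrics object (metrics.k8s.io/v1beta1).
# Helper names and the sample object are hypothetical, not part of the probe.
from typing import Optional


def pod_metrics_path(namespace: str) -> str:
    """Return the API path that lists PodMetrics for every pod in a namespace."""
    return f"/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods"


def container_usage(pod_metrics: dict, container: Optional[str] = None) -> list:
    """Return (container name, raw CPU quantity, raw memory quantity) tuples.

    When `container` is given, only that container is returned.
    """
    rows = []
    for c in pod_metrics.get("containers", []):
        if container is not None and c["name"] != container:
            continue
        usage = c.get("usage", {})
        rows.append((c["name"], usage.get("cpu", "0"), usage.get("memory", "0")))
    return rows


# Hypothetical PodMetrics object, shaped like the output of
#   kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/my-app-pod
sample = {
    "kind": "PodMetrics",
    "apiVersion": "metrics.k8s.io/v1beta1",
    "metadata": {"name": "my-app-pod", "namespace": "default"},
    "containers": [
        {"name": "app-container", "usage": {"cpu": "250000000n", "memory": "524288Ki"}},
        {"name": "sidecar", "usage": {"cpu": "5m", "memory": "16Mi"}},
    ],
}

if __name__ == "__main__":
    print(pod_metrics_path("default"))
    print(container_usage(sample, container="app-container"))
```

metrics-server commonly reports CPU in nanocores (`n` suffix) and memory in kibibytes (`Ki` suffix), so the raw strings still need unit conversion before they can be compared with `CPU_LIMIT` or `MEMORY_LIMIT`.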

Probe properties

Command

healthchecks -name pod-resource-metrics-check

Comparator

Type	Criteria	Value
string	contains	[Pass]

The probe passes when the command output contains [Pass], indicating that pod resource utilisation is within the configured limit.
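The comparator amounts to a substring test on the command output. The sketch below assumes a case-sensitive match, which is what Python's `in` operator performs; the sample output lines are hypothetical, since the exact text printed by `healthchecks` is not documented here.

```python
# Sketch of the string comparator: type "string", criteria "contains", value "[Pass]".
# Assumes a case-sensitive substring match; the sample outputs are hypothetical.


def string_contains(output: str, value: str = "[Pass]") -> bool:
    """Return True when value occurs anywhere in the command output."""
    return value in output


# Hypothetical outputs: the match may appear on any line of multi-line output.
passing = "checking pods in namespace production\n[Pass] cpu usage within limit"
failing = "[Fail] cpu usage above limit"

if __name__ == "__main__":
    print(string_contains(passing), string_contains(failing))
```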

Environment variables

Variable	Description	Required	Default
`TARGET_LABELS`	Comma-separated list of target labels to filter pods.	No	-
`TARGET_NAMES`	Comma-separated list of target pod names.	No	-
`TARGET_NAMESPACE`	Namespace of the target pods.	No	-
`TARGET_KIND`	Kind of the target resource (e.g., `deployment`, `statefulset`, `daemonset`).	No	deployment
`TARGET_CONTAINER`	Name of the container whose resource metrics are checked.	No	-
`METRIC_TYPE`	Metric type to check: `cpu` or `memory`.	No	cpu
`CPU_LIMIT`	Maximum allowed CPU usage per pod, in millicores. The check passes when usage is less than or equal to this value.	No	1000
`MEMORY_LIMIT`	Maximum allowed memory usage per pod, in MB. The check passes when usage is less than or equal to this value.	No	1024
`STATUS_CHECK_TIMEOUT`	Maximum time, in seconds, to wait for the status check.	No	180
`STATUS_CHECK_DELAY`	Delay in seconds between status checks.	No	2
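To make the units in `CPU_LIMIT` and `MEMORY_LIMIT` concrete, the sketch below converts Kubernetes quantity strings as metrics-server reports them (for example `250000000n` for CPU or `524288Ki` for memory) into millicores and megabytes, then compares them with the limits, using the defaults from the table. It assumes that "MB" means mebibytes (2^20 bytes) and that one usage value is compared per check; both are assumptions, and the `[Pass]`/`[Fail]` line format is illustrative only. The function names are hypothetical.

```python
# Sketch of the METRIC_TYPE / CPU_LIMIT / MEMORY_LIMIT threshold logic.
# Assumptions: usage arrives as Kubernetes quantity strings; "MB" is treated as
# mebibytes (2**20 bytes); the "[Pass]"/"[Fail]" output format is hypothetical.
from decimal import Decimal

# Only the suffixes listed here are recognised.
CPU_SUFFIXES = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3")}
MEM_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_millicores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '250m', '250000000n' or '2' (cores) to millicores."""
    for suffix, factor in CPU_SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor * 1000
    return Decimal(quantity) * 1000  # a bare number means whole cores


def memory_mb(quantity: str) -> Decimal:
    """Convert a memory quantity such as '524288Ki' or '1Gi' to MiB."""
    # Check two-letter binary suffixes (Ki, Mi, ...) before one-letter decimal ones.
    for suffix in sorted(MEM_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * MEM_SUFFIXES[suffix] / 2**20
    return Decimal(quantity) / 2**20  # a bare number means bytes


def check(metric_type: str, usage: str, cpu_limit: int = 1000, memory_limit: int = 1024) -> str:
    """Return a '[Pass]' or '[Fail]' line; the default limits mirror the table above."""
    if metric_type == "cpu":
        value, limit, unit = cpu_millicores(usage), cpu_limit, "m"
    elif metric_type == "memory":
        value, limit, unit = memory_mb(usage), memory_limit, "MB"
    else:
        raise ValueError(f"METRIC_TYPE must be 'cpu' or 'memory', got {metric_type!r}")
    verdict = "[Pass]" if value <= limit else "[Fail]"
    return f"{verdict} {metric_type} usage {float(value):g}{unit} (limit {limit}{unit})"


if __name__ == "__main__":
    print(check("cpu", "250m", cpu_limit=800))
    print(check("memory", "524288Ki", memory_limit=2048))
```

Decimal arithmetic is used so that nanocore values convert exactly; with floats, a usage that sits right at the limit could land a rounding error on the wrong side of the `<=` comparison.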

Run properties

Property	Description	Type	Default
`timeout`	Maximum time to wait for the probe to complete (e.g., `30s`, `1m`, `5m`)	String	180s
`interval`	Time between probe executions (e.g., `1s`, `5s`, `10s`)	String	1s
`attempt`	Number of retry attempts before marking the probe as failed	Integer	1
`pollingInterval`	Time between retry attempts (e.g., `1s`, `5s`, `10s`)	String	-
`initialDelay`	Initial delay before starting the probe (e.g., `0s`, `10s`, `30s`)	String	-
`stopOnFailure`	Stop the experiment if the probe fails	Boolean	false
`verbosity`	Log verbosity level (`info`, `debug`, `trace`)	String	-
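Duration-valued run properties are written as a number followed by a unit, as in the examples in the table. The sketch below converts such values to seconds so they can be sanity-checked before an experiment runs. It handles only a single number with an `s`, `m` or `h` suffix and treats an `attempt` below 1 as invalid; both are assumptions of this sketch rather than documented behaviour of the probe runtime, and the helper names are hypothetical.

```python
# Sketch: convert run-property durations such as "30s", "1m" or "180s" to seconds.
# Assumption: a single number followed by s, m or h; compound values are not handled.
# Helper names are hypothetical and not part of any Harness API.
import re

_DURATION = re.compile(r"^(\d+(?:\.\d+)?)([smh])$")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
DURATION_FIELDS = ("timeout", "interval", "pollingInterval", "initialDelay")


def to_seconds(duration: str) -> float:
    """Return the duration in seconds, for example '5m' -> 300.0."""
    match = _DURATION.match(duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    amount, unit = match.groups()
    return float(amount) * _UNIT_SECONDS[unit]


def normalise_run_properties(props: dict) -> dict:
    """Return a copy of props with duration fields in seconds; reject attempt < 1."""
    out = dict(props)
    for field in DURATION_FIELDS:
        if field in out:
            out[field] = to_seconds(out[field])
    if out.get("attempt", 1) < 1:
        raise ValueError("attempt must be at least 1")
    return out


if __name__ == "__main__":
    # runProperties from the memory utilisation example later on this page
    print(normalise_run_properties({"timeout": "180s", "interval": "2s", "attempt": 3}))
```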

Probe definition

You can define this probe in your chaos experiment as follows:

CPU utilisation check

probe:
  - name: "cpu-utilisation-check"
    type: "cmdProbe"
    mode: "Continuous"
    cmdProbe/inputs:
      command: "healthchecks -name pod-resource-metrics-check"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: TARGET_LABELS
          value: "app=nginx"
        - name: TARGET_NAMESPACE
          value: "production"
        - name: METRIC_TYPE
          value: "cpu"
        - name: CPU_LIMIT
          value: "800"
    runProperties:
      timeout: 180s
      interval: 1s
      attempt: 1
      stopOnFailure: false

Memory utilisation check

probe:
  - name: "memory-utilisation-check"
    type: "cmdProbe"
    mode: "Continuous"
    cmdProbe/inputs:
      command: "healthchecks -name pod-resource-metrics-check"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: TARGET_NAMES
          value: "my-app-pod"
        - name: TARGET_NAMESPACE
          value: "default"
        - name: TARGET_CONTAINER
          value: "app-container"
        - name: METRIC_TYPE
          value: "memory"
        - name: MEMORY_LIMIT
          value: "2048"
    runProperties:
      timeout: 180s
      interval: 2s
      attempt: 3
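The examples set only some environment variables; the rest fall back to the defaults in the environment variable table. The sketch below resolves the effective settings for the memory example by overlaying its `env` list on those defaults. It assumes the entries follow the Kubernetes EnvVar convention, where `value` is a string, which is consistent with the examples quoting numbers such as "800" and "2048". It mirrors the documented defaults only and does not show how the `healthchecks` binary actually reads its environment; the names are hypothetical.

```python
# Sketch: overlay a probe's env list on the documented defaults.
# Assumption: env values are strings (Kubernetes EnvVar convention). Names are hypothetical.

DEFAULTS = {
    "TARGET_KIND": "deployment",
    "METRIC_TYPE": "cpu",
    "CPU_LIMIT": "1000",
    "MEMORY_LIMIT": "1024",
    "STATUS_CHECK_TIMEOUT": "180",
    "STATUS_CHECK_DELAY": "2",
}


def resolve_env(env: list) -> dict:
    """Return the effective settings: documented defaults, overridden by env entries."""
    settings = dict(DEFAULTS)
    for entry in env:
        value = entry["value"]
        if not isinstance(value, str):
            raise TypeError(f"{entry['name']}: env values must be strings, got {value!r}")
        settings[entry["name"]] = value
    if settings["METRIC_TYPE"] not in ("cpu", "memory"):
        raise ValueError("METRIC_TYPE must be 'cpu' or 'memory'")
    return settings


# The env list from the memory utilisation example above.
memory_example_env = [
    {"name": "TARGET_NAMES", "value": "my-app-pod"},
    {"name": "TARGET_NAMESPACE", "value": "default"},
    {"name": "TARGET_CONTAINER", "value": "app-container"},
    {"name": "METRIC_TYPE", "value": "memory"},
    {"name": "MEMORY_LIMIT", "value": "2048"},
]

if __name__ == "__main__":
    for key, value in sorted(resolve_env(memory_example_env).items()):
        print(f"{key}={value}")
```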
