Node CPU hog

Node CPU hog exhausts the CPU resources on a Kubernetes node.

  • The CPU chaos is injected using a helper pod running the Linux stress tool (a workload generator).
  • The chaos affects the application for a period defined by the TOTAL_CHAOS_DURATION environment variable.

Usage

The fault verifies the resiliency of applications whose replicas may be evicted when nodes become unschedulable (Not Ready), or whose new replicas cannot be scheduled due to a lack of CPU resources. It causes CPU stress on the target node(s), simulating a CPU shortage for the application's processes, which degrades their performance. It also helps verify metrics-based horizontal pod autoscaling as well as vertical autoscaling, that is, demand-based CPU addition. It helps verify node scaling when growth exceeds the budgeted number of pods, and it verifies the autopilot functionality of (cloud) managed clusters. Its benefits include verifying multi-tenant load isolation: when the load increases on one container, it should not cause downtime in other containers.

Prerequisites

  • Kubernetes > 1.16.

Default validations

The target nodes should be in the ready state before and after injecting chaos.

Fault tunables

Mandatory Fields

Variables Description Notes
TARGET_NODES Comma-separated list of nodes subject to node CPU hog.
NODE_LABEL Node label used to filter the target nodes. It is mutually exclusive with the TARGET_NODES environment variable. If both are provided, TARGET_NODES takes precedence.
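
As a minimal sketch of how the mandatory fields might be set, the fragment below selects two nodes by name through TARGET_NODES. The node names node-1 and node-2 are hypothetical placeholders, and the surrounding ChaosEngine fields mirror the examples later in this document rather than a verified configuration.

```yaml
# target specific nodes by name (node names are hypothetical)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-cpu-hog
    spec:
      components:
        env:
        # comma-separated list of target nodes
        - name: TARGET_NODES
          value: 'node-1,node-2'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```

Because TARGET_NODES takes precedence over NODE_LABEL, setting only one of the two keeps the selection unambiguous.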

Optional fields

Variables Description Notes
TOTAL_CHAOS_DURATION Duration for which chaos is injected into the target resource (in seconds). Defaults to 60s.
LIB_IMAGE Image used to inject stress. Defaults to litmuschaos/go-runner:latest.
RAMP_TIME Period to wait before and after injecting chaos (in seconds). For example, 30s.
NODE_CPU_CORE Number of CPU cores to be consumed. Defaults to 2.
NODES_AFFECTED_PERC Percentage of total nodes to target (numeric values only). Defaults to 0 (corresponds to 1 node).
SEQUENCE Sequence of chaos execution for multiple targets. Defaults to parallel. Supports serial sequence as well.
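
The optional fields above could be combined as in the following sketch, which selects nodes by label and stresses half of the matching nodes one after another. The label role=chaos-target and its key=value form are assumptions for illustration. Durations are written as plain numbers of seconds, following the TOTAL_CHAOS_DURATION examples later in this document.

```yaml
# select nodes by label and tune optional fields (label is hypothetical)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-cpu-hog
    spec:
      components:
        env:
        # node label used to filter target nodes (assumed key=value form)
        - name: NODE_LABEL
          value: 'role=chaos-target'
        # percentage of the matching nodes to target
        - name: NODES_AFFECTED_PERC
          value: '50'
        # inject chaos into the targets one at a time
        - name: SEQUENCE
          value: 'serial'
        # wait before and after injecting chaos (in seconds)
        - name: RAMP_TIME
          value: '30'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```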

Fault examples

Common and node-specific tunables

Refer to the common attributes for tunables shared by all faults, and to the node-specific tunables for tunables that apply only to node faults.

Node CPU cores

It specifies the number of CPU cores to be consumed. You can tune it using the NODE_CPU_CORE environment variable.

Use the following example to tune it:

# stress the CPU of the targeted nodes
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-cpu-hog
    spec:
      components:
        env:
        # number of CPU cores to be stressed
        - name: NODE_CPU_CORE
          value: '2'
        # duration of the chaos (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'

Node CPU load

It specifies the percentage of CPU to be consumed. You can tune it using the CPU_LOAD environment variable.

Use the following example to tune it:

# stress the CPU of the targeted nodes by load percentage
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-cpu-hog
    spec:
      components:
        env:
        # percentage of CPU to be stressed
        - name: CPU_LOAD
          value: '100'
        # set NODE_CPU_CORE to 0 for CPU_LOAD to take effect;
        # otherwise NODE_CPU_CORE takes priority
        - name: NODE_CPU_CORE
          value: '0'
        # duration of the chaos (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'