
Pod CPU hog

Pod CPU hog is a Kubernetes pod-level chaos fault that consumes excessive CPU resources, which significantly increases the CPU usage of a pod. The fault stresses the target pods by simulating a CPU shortage for the processes running in the application, which degrades the application's performance.

Use cases

CPU hog:

  • Simulates a situation where the application's CPU resource usage unexpectedly increases.
  • Verifies metrics-based horizontal pod autoscaling as well as vertical pod autoscaling, that is, demand-based CPU addition.
  • Helps verify that nodes scale when the workload grows beyond the budgeted number of pods.
  • Verifies the autopilot functionality of cloud managed clusters.
  • Verifies multi-tenant load isolation, that is, an increase in load on one container does not cause downtime in other containers.
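
To make the autoscaling use case concrete, the following is a minimal sketch of a HorizontalPodAutoscaler that the fault could exercise. It is not part of the fault itself. The names nginx-hpa and nginx, the replica bounds, and the 70% threshold are illustrative assumptions. The autoscaling/v2 API requires Kubernetes 1.23 or later, and the utilization target is measured against the CPU requests of the containers, so the target pods need CPU requests set.

```yaml
# Hypothetical HPA for an nginx deployment; all names and thresholds are assumptions
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  # scale out when average CPU utilization across pods exceeds 70% of requests
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

With an autoscaler like this in place, running the fault should push average CPU utilization above the target, and the replica count would be expected to rise while chaos is active.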

Permissions required

Below is a sample Kubernetes role that defines the permissions required to execute the fault.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-cpu-hog
spec:
  definition:
    scope: Cluster # Supports "Namespaced" mode too
    permissions:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "get", "list", "patch", "update"]
      - apiGroups: [""]
        resources: ["pods/log"]
        verbs: ["get", "list", "watch"]
      # deployments, statefulsets, replicasets and daemonsets belong to the "apps" API group
      - apiGroups: ["apps"]
        resources: ["deployments", "statefulsets"]
        verbs: ["get", "list"]
      - apiGroups: ["apps"]
        resources: ["replicasets", "daemonsets"]
        verbs: ["get", "list"]
      # chaos custom resources belong to the "litmuschaos.io" API group
      - apiGroups: ["litmuschaos.io"]
        resources: ["chaosengines", "chaosexperiments", "chaosresults"]
        verbs: ["create", "delete", "get", "list", "patch", "update"]
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create", "delete", "get", "list", "deletecollection"]

Prerequisites

  • Kubernetes > 1.16
  • The application pods should be in the running state before and after injecting chaos.

Optional tunables

Tunable | Description | Notes
CPU_CORES | Number of CPU cores subject to CPU stress. | Default: 1. For more information, go to CPU cores.
NODE_LABEL | Node label used to filter the target node if TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label.
CPU_LOAD | Percentage of CPU to be consumed. | For more information, go to CPU load.
TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 60 s. For more information, go to duration of the chaos.
TARGET_PODS | Comma-separated list of application pod names subject to pod CPU hog. | If this value is not provided, the fault selects the target pods randomly based on provided appLabels. For more information, go to target specific pods.
TARGET_CONTAINER | Name of the target container under stress. | If this value is not provided, the fault selects the first container of the target pod. For more information, go to target specific container.
PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage.
CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supports docker, containerd and crio. For more information, go to container runtime.
SOCKET_PATH | Path of the containerd or crio or docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path.
RAMP_TIME | Period to wait before injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time.
LIB_IMAGE | Image used to inject chaos. | Default: harness/chaos-go-runner:main-latest. For more information, go to image used by the helper pod.
SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution.
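
The pod selection and scheduling tunables have no snippet of their own on this page, so the following is a minimal sketch of how several of them might be combined. It reuses the engine-nginx ChaosEngine layout from the other snippets on this page. The container name nginx and the values 50, serial, and 30 are illustrative assumptions, not recommended settings.

```yaml
# Hypothetical combination of pod selection and scheduling tunables
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-cpu-hog
    spec:
      components:
        env:
        # target half of the pods that match the app label (assumed value)
        - name: PODS_AFFECTED_PERC
          value: "50"
        # stress the selected pods one after another instead of in parallel
        - name: SEQUENCE
          value: "serial"
        # container to stress; "nginx" is an assumed container name
        - name: TARGET_CONTAINER
          value: "nginx"
        # wait 30 seconds before injecting chaos
        - name: RAMP_TIME
          value: "30"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

TARGET_PODS is left out of this sketch on purpose: per the table, pods are selected randomly from the app label when it is not set, which is the case where PODS_AFFECTED_PERC is meaningful.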

CPU cores

Number of CPU cores to target. Tune it by using the CPU_CORES environment variable.

The following YAML snippet illustrates the use of this environment variable:

# CPU cores for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-cpu-hog
    spec:
      components:
        env:
        # CPU cores for stress
        - name: CPU_CORES
          value: '1'
        - name: TOTAL_CHAOS_DURATION
          value: '60'

CPU load

Percentage of CPU to be consumed. Tune it by using the CPU_LOAD environment variable.

The following YAML snippet illustrates the use of this environment variable:

# CPU load for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-cpu-hog
    spec:
      components:
        env:
        # CPU load in percentage for the stress
        - name: CPU_LOAD
          value: "100"
        # CPU_CORES must be set to 0 for CPU_LOAD to take effect;
        # otherwise CPU_CORES takes priority
        - name: CPU_CORES
          value: "0"
        - name: TOTAL_CHAOS_DURATION
          value: "60"

Container runtime and socket path

Use the CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and the socket file path, respectively.

  • CONTAINER_RUNTIME: Supports the docker, containerd, and crio runtimes. The default value is containerd.
  • SOCKET_PATH: Path of the runtime socket file. The default value is the containerd socket path (/run/containerd/containerd.sock). For docker, specify the path as /var/run/docker.sock. For crio, specify the path as /var/run/crio/crio.sock.

The following YAML snippet illustrates the use of these environment variables:

## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-cpu-hog
    spec:
      components:
        env:
        # runtime for the container
        # supports docker, containerd, crio
        - name: CONTAINER_RUNTIME
          value: "containerd"
        # path of the socket file
        - name: SOCKET_PATH
          value: "/run/containerd/containerd.sock"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
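
For a cluster that does not use containerd, both variables need to change together. The following is a minimal sketch of the same engine for the docker runtime, using the socket path listed above. It assumes that the docker socket is at this path on the nodes, which may differ on some distributions.

```yaml
## variant for the docker runtime (assumes the socket path listed above exists on the nodes)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-cpu-hog
    spec:
      components:
        env:
        # runtime for the container
        - name: CONTAINER_RUNTIME
          value: "docker"
        # socket path that matches the docker runtime
        - name: SOCKET_PATH
          value: "/var/run/docker.sock"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

For crio, the same pattern would apply with CONTAINER_RUNTIME set to crio and SOCKET_PATH set to /var/run/crio/crio.sock.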