
Node IO stress

Node IO stress causes I/O stress on the Kubernetes node.

  • The amount of I/O stress is specified either as a percentage of the total free space available on the file system, using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable, or in gigabytes (GB), using the FILESYSTEM_UTILIZATION_BYTES environment variable.
  • When both values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes precedence.
  • It tests application resiliency against replica evictions that occur due to I/O stress on the available disk space.


Usage

The fault aims to verify the resilience of applications that share disk resources for ephemeral or persistent storage during periods of high disk I/O usage. It simulates slower disk operations for the application and noisy-neighbour problems by hogging the disk bandwidth. It also verifies disk performance with an increasing number of I/O threads and varying I/O block sizes. It checks whether the application functions correctly under high disk latency, when I/O traffic is very high and includes large I/O blocks, and when other services monopolize the disk I/O.

Prerequisites

  • Kubernetes > 1.16.

Default validations

The target nodes should be in the ready state before and after injecting chaos.

Fault tunables


Mandatory fields

| Variables | Description | Notes |
| --- | --- | --- |
| TARGET_NODES | Comma-separated list of nodes subject to node I/O stress. | For example, node-1,node-2. |
| NODE_LABEL | Node label used to filter the target nodes. | It is mutually exclusive with the TARGET_NODES environment variable. If both are provided, TARGET_NODES takes precedence. |
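As a sketch, the target nodes can be selected explicitly by name using TARGET_NODES (the node names below are illustrative placeholders for nodes in your cluster):

```yaml
# select the target nodes explicitly by name
# (node-1,node-2 are placeholder node names)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: node-io-stress
      spec:
        components:
          env:
            # comma-separated list of target nodes
            - name: TARGET_NODES
              value: 'node-1,node-2'
```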

Optional fields

| Variables | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 120. |
| FILESYSTEM_UTILIZATION_PERCENTAGE | Size as a percentage of the free space on the file system. | Defaults to 10%. |
| FILESYSTEM_UTILIZATION_BYTES | Size of the files used per worker (in GB). | FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes precedence. |
| CPU | Number of CPU cores that will be used. | Defaults to 1. |
| NUMBER_OF_WORKERS | Number of I/O workers involved in I/O stress. | Defaults to 4. |
| VM_WORKERS | Number of VM workers involved in I/O stress. | Defaults to 1. |
| LIB_IMAGE | Image used to run the stress command. | Defaults to litmuschaos/go-runner:latest. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30. |
| NODES_AFFECTED_PERC | Percentage of the total nodes to target. It takes numeric values only. | Defaults to 0 (corresponds to 1 node). |
| SEQUENCE | Sequence of chaos execution for multiple target nodes. | Defaults to parallel. Supports serial sequence as well. |

Fault examples

Common and node-specific tunables

Refer to the common attributes and node-specific tunables to tune the common tunables for all faults and the tunables specific to node faults.
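As a sketch of how the node-specific tunables from the optional fields above fit together, the snippet below selects nodes by label and targets half of them one at a time (the label value is an illustrative placeholder, not a required label):

```yaml
# target 50 percent of the nodes matching the label, one after another
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: node-io-stress
      spec:
        components:
          env:
            # label used to filter the target nodes
            # (the label key/value here is a placeholder)
            - name: NODE_LABEL
              value: 'node-role/storage=true'
            # percentage of the matching nodes to target
            - name: NODES_AFFECTED_PERC
              value: '50'
            # inject chaos into the targets one by one
            - name: SEQUENCE
              value: 'serial'
```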

File system utilization percentage

It specifies the amount of free space on the node to utilize, as a percentage. You can tune it using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable.

Use the following example to tune it:

```yaml
# stress the i/o of the targeted node with FILESYSTEM_UTILIZATION_PERCENTAGE of total free space
# it is mutually exclusive with FILESYSTEM_UTILIZATION_BYTES.
# if both are provided, FILESYSTEM_UTILIZATION_PERCENTAGE is used for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: node-io-stress
      spec:
        components:
          env:
            # percentage of total free space of file system
            - name: FILESYSTEM_UTILIZATION_PERCENTAGE
              value: '10' # in percentage
            - name: TOTAL_CHAOS_DURATION
              value: '60'
```

File system utilization bytes

It specifies the amount of free space on the node to utilize, in gigabytes. You can tune it using the FILESYSTEM_UTILIZATION_BYTES environment variable. It is mutually exclusive with the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable. When both values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes precedence.

Use the following example to tune it:

```yaml
# stress the i/o of the targeted node with the given FILESYSTEM_UTILIZATION_BYTES
# it is mutually exclusive with FILESYSTEM_UTILIZATION_PERCENTAGE.
# if both are provided, FILESYSTEM_UTILIZATION_PERCENTAGE is used for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: node-io-stress
      spec:
        components:
          env:
            # file system size to be stressed (in GB)
            - name: FILESYSTEM_UTILIZATION_BYTES
              value: '500' # in GB
            - name: TOTAL_CHAOS_DURATION
              value: '60'
```

Limit CPU utilization

It specifies the number of CPU cores used while the node undergoes I/O stress, which limits the CPU utilization of the stress process. You can tune it using the CPU environment variable.

Use the following example to tune it:

```yaml
# limit the CPU usage to the provided value while performing io stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: node-io-stress
      spec:
        components:
          env:
            # number of CPU cores to be stressed
            - name: CPU
              value: '1'
            - name: TOTAL_CHAOS_DURATION
              value: '60'
```

Workers for stress

It specifies the number of I/O and VM workers for the stress. You can tune it using the NUMBER_OF_WORKERS and VM_WORKERS environment variables, respectively.

Use the following example to tune it:

```yaml
# define the worker counts for the i/o and vm stressors
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: node-io-stress
      spec:
        components:
          env:
            # total number of io workers involved in the stress
            - name: NUMBER_OF_WORKERS
              value: '4'
            # total number of vm workers involved in the stress
            - name: VM_WORKERS
              value: '1'
            - name: TOTAL_CHAOS_DURATION
              value: '60'
```