Pod IO stress
Pod I/O stress is a Kubernetes pod-level chaos fault that causes I/O stress on the application pod by increasing the number of input and output requests. Applying stress on the disk with continuous and heavy I/O degrades the reads and writes with respect to the microservices. Scratch space consumed on a node may lead to lack of memory for new containers to be scheduled. All these aspects increase resilience to stress.
Use cases
Pod IO stress:
- Aims to verify the resilience of applications that share the disk resource for ephemeral (or persistent) storage.
- Simulates slower disk operations by the application.
- Simulates noisy neighbour problems by hogging the disk bandwidth.
- Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
- Checks how the application functions under high disk latency conditions and when I/O traffic is very high.
- Checks how the application functions under large I/O blocks, and when other services monopolize the I/O disks.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: hce
name: pod-io-stress
spec:
definition:
scope: Cluster # Supports "Namespaced" mode too
permissions:
- apiGroups: [""]
resources: ["pods"]
verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "get", "list", "patch", "update"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["deployments, statefulsets"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["replicasets, daemonsets"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["chaosEngines", "chaosExperiments", "chaosResults"]
verbs: ["create", "delete", "get", "list", "patch", "update"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["create", "delete", "get", "list", "deletecollection"]
Prerequisites
- Kubernetes > 1.16
- The application pods should be in the running state before and after injecting chaos.
Optional tunables
Tunable | Description | Notes |
---|---|---|
FILESYSTEM_UTILIZATION_PERCENTAGE | Specifies the size as a percentage of free space on the file system. | Default: 10 %. For more information, go to file system utilization percentage |
FILESYSTEM_UTILIZATION_BYTES | Specifies the size in gigabytes (GB). FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both the values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes priority. For more information, go to file system utilization bytes | |
NUMBER_OF_WORKERS | Number of IO workers involved in IO disk stress. | Default: 4. For more information, go to workers for stress |
TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 120 s. For more information, go to duration of the chaos |
NODE_LABEL | Node label used to filter the target node if TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE . For more information, go to node label. |
VOLUME_MOUNT_PATH | Fill the given volume mount path. | For more information, go to mount path |
TARGET_PODS | Comma-separated list of application pod names subject to pod IO stress. | If not provided, the fault selects target pods randomly based on provided appLabels. For more information, go to target specific pods |
PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage |
LIB_IMAGE | Image used to inject chaos. | Default: harness/chaos-go-runner:main-latest . For more information, go to image used by the helper pod. |
CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supports docker, containerd and crio. For more information, go to container runtime |
SOCKET_PATH | Path of the containerd or crio or docker socket file. | Default: /run/containerd/containerd.sock For more information, go to socket path |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time |
SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution |
File system utilization percentage
Amount (in percentage) of free space in the pod. Tune it by using the FILESYSTEM_UTILIZATION_PERCENTAGE
environment variable.
The following YAML snippet illustrates the use of this environment variable:
# stress the i/o of the targeted pod with FILESYSTEM_UTILIZATION_PERCENTAGE of total free space
# it is mutually exclusive with the FILESYSTEM_UTILIZATION_BYTES.
# if both are provided then it will use FILESYSTEM_UTILIZATION_PERCENTAGE for stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-io-stress
spec:
components:
env:
# percentage of free space of file system, need to be stressed
- name: FILESYSTEM_UTILIZATION_PERCENTAGE
value: "10" #in GB
- name: TOTAL_CHAOS_DURATION
VALUE: "60"
File system utilization bytes
Amount of free space available in the pod in gigabytes (GB). Tune it by using the FILESYSTEM_UTILIZATION_BYTES
environment variable.
FILESYSTEM_UTILIZATION_PERCENTAGE
and FILESYSTEM_UTILIZATION_BYTES
environment variables are mutually exclusive. If both the values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE
takes priority.
The following YAML snippet illustrates the use of this environment variable:
# stress the i/o of the targeted pod with given FILESYSTEM_UTILIZATION_BYTES
# it is mutually exclusive with the FILESYSTEM_UTILIZATION_PERCENTAGE.
# if both are provided then it will use FILESYSTEM_UTILIZATION_PERCENTAGE for stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-io-stress
spec:
components:
env:
# size of io to be stressed
- name: FILESYSTEM_UTILIZATION_BYTES
value: "1" #in GB
- name: TOTAL_CHAOS_DURATION
VALUE: "60"
Container runtime and socket path
The CONTAINER_RUNTIME
and SOCKET_PATH
environment variables to set the container runtime and socket file path, respectively.
CONTAINER_RUNTIME
: It supportsdocker
,containerd
, andcrio
runtimes. The default value iscontainerd
.SOCKET_PATH
: It contains path of containerd socket file by default(/run/containerd/containerd.sock
). Fordocker
, specify path as/var/run/docker.sock
. Forcrio
, specify path as/var/run/crio/crio.sock
.
The following YAML snippet illustrates the use of this environment variable:
## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-io-stress
spec:
components:
env:
# runtime for the container
# supports docker, containerd, crio
- name: CONTAINER_RUNTIME
value: "containerd"
# path of the socket file
- name: SOCKET_PATH
value: "/run/containerd/containerd.sock"
- name: TOTAL_CHAOS_DURATION
VALUE: "60"
Mount path
Volume mount path that is to be filled. Tune it by using the VOLUME_MOUNT_PATH
environment variable.
The following YAML snippet illustrates the use of this environment variable:
# provide the volume mount path, which needs to be filled
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-io-stress
spec:
components:
env:
# path need to be stressed/filled
- name: VOLUME_MOUNT_PATH
value: "/some-dir-in-container"
- name: TOTAL_CHAOS_DURATION
VALUE: "60"
Workers for stress
Number of workers for the stress. Tune it by using the NUMBER_OF_WORKERS
environment variable.
The following YAML snippet illustrates the use of this environment variable:
# number of workers for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-io-stress
spec:
components:
env:
# number of io workers
- name: NUMBER_OF_WORKERS
value: "4"
- name: TOTAL_CHAOS_DURATION
VALUE: "60"