IO Stress/Latency Fault Workflow

This topic describes the flow of control when you execute an IO stress or latency chaos experiment in Harness Chaos Engineering.

The diagram below describes the flow of control for an IO stress or latency experiment.

[Figure: stress/latency fault flow]

IO stress consumes disk resources of the target application by injecting high load on the disk IO.

IO latency delays the file operations of the target application by introducing latency into its read/write operations.

Step 1: Fetch Target Container Info

The chaos helper pod retrieves the pod specification and identifies the containerID of the target application pod.
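As a rough illustration of this step, the sketch below pulls the container ID out of a pod object that has already been fetched as a dictionary. It relies on the Kubernetes convention that `status.containerStatuses[].containerID` has the form `<runtime>://<id>`. The function name `find_container_id` and the sample pod are hypothetical and are not taken from the Harness helper pod.

```python
from typing import Optional, Tuple


def find_container_id(pod: dict, container_name: str) -> Optional[Tuple[str, str]]:
    """Return (runtime, container_id) for the named container, or None."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        if status.get("name") != container_name:
            continue
        raw = status.get("containerID", "")
        # Kubernetes reports the ID as "<runtime>://<id>".
        runtime, sep, container_id = raw.partition("://")
        if sep and container_id:
            return runtime, container_id
    return None


# Hypothetical pod object, trimmed to the fields the function reads.
sample_pod = {
    "status": {
        "containerStatuses": [
            {"name": "sidecar", "containerID": "containerd://1111"},
            {"name": "app", "containerID": "containerd://abc123"},
        ]
    }
}

target = find_container_id(sample_pod, "app")
```

Returning the runtime alongside the ID is a convenience for the next steps, since the inspect command and its output layout differ between runtimes.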

Step 2: Inspect Container Metadata

The helper pod inspects the container runtime to obtain metadata, including the cgroup details of the target container. This requires elevated permissions (sudo/root) and a host path mount for the container runtime socket.

Step 3: Derive PID of the Target App Container

The helper pod extracts the process ID (PID) of the main process running inside the application container.
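Steps 2 and 3 can be sketched as parsing the JSON that the runtime's inspect command prints. The example assumes two common layouts: `docker inspect` returns an array whose entries expose `State.Pid`, and `crictl inspect` returns a single object with the PID under `info.pid`. The function name `pid_from_inspect` is hypothetical, and the helper pod may obtain the PID differently.

```python
import json


def pid_from_inspect(runtime: str, inspect_output: str) -> int:
    """Extract the main process PID from container inspect JSON."""
    data = json.loads(inspect_output)
    if runtime == "docker":
        # `docker inspect` prints a JSON array with one object per container.
        pid = data[0]["State"]["Pid"]
    else:
        # Assumed `crictl inspect` layout: one object, PID under "info".
        pid = data["info"]["pid"]
    if pid <= 0:
        # A stopped container reports no usable PID.
        raise ValueError("container is not running (no PID reported)")
    return pid


# Hypothetical inspect outputs, trimmed to the fields that matter here.
docker_json = '[{"State": {"Pid": 4242, "Running": true}}]'
crictl_json = '{"status": {"state": "CONTAINER_RUNNING"}, "info": {"pid": 5151}}'

docker_pid = pid_from_inspect("docker", docker_json)
crictl_pid = pid_from_inspect("containerd", crictl_json)
```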

Step 4: Prepare Stress / Latency Process

| IO Stress | Latency |
| --- | --- |
| The PID derived earlier is used to inject a stress process into the target application. The stress process is loaded into memory but kept in a paused state. | The helper pod execs into the PID namespace (`pid_ns`) and mount namespace (`mnt_ns`) of the target container. |
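One way to picture entering the target's PID and mount namespaces is the `nsenter` utility from util-linux, which joins the namespaces of a given PID before running a command. The sketch below only builds the argument list; it does not run anything. Using `nsenter` here is an assumption made for illustration, since the text does not say which mechanism the helper pod uses, and `nsenter_argv` is a hypothetical name.

```python
from typing import List


def nsenter_argv(target_pid: int, command: List[str]) -> List[str]:
    """Build an nsenter command that runs `command` in the target's namespaces."""
    if target_pid <= 0:
        raise ValueError("target_pid must be a positive integer")
    return [
        "nsenter",
        "--target", str(target_pid),
        "--pid",    # join the target's PID namespace
        "--mount",  # join the target's mount namespace
        "--",       # everything after this is the command to run
        *command,
    ]


argv = nsenter_argv(4242, ["ls", "/"])
```

The resulting list can be passed to `subprocess.run` on a host where the caller has sufficient privileges to join another process's namespaces.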

Step 5: Transfer IO Stress / Inject Latency Process

| IO Stress | Latency |
| --- | --- |
| Transfer the IO stress process into the target container cgroup: using Linux namespaces (`pid_ns`, `mnt_ns`, and `cgroup`), the stress process is mapped into the target container’s namespaces. This ensures that the stress process runs inside the application container cgroup. | Inject latency using the following: FUSE (Filesystem in Userspace) adds delays to file system operations; `ptrace` (process tracing) attaches to and detaches from processes; files are mounted and backed up with delays to introduce latency. |
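To make the cgroup part of this step more concrete, the sketch below parses the contents of `/proc/<pid>/cgroup` for the target process and derives the `cgroup.procs` file of its cgroup. On Linux, writing a PID into that file moves the process into the cgroup. The example assumes cgroup v2 (a single `0::<path>` entry) mounted at `/sys/fs/cgroup`; both function names are hypothetical, and the helper pod may perform the transfer in a different way.

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller list ("" for cgroup v2) to its cgroup path."""
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        _hierarchy, controllers, path = line.split(":", 2)
        paths[controllers] = path
    return paths


def cgroup_procs_file(text: str, root: str = "/sys/fs/cgroup") -> str:
    """Path of the cgroup.procs file for the cgroup v2 (unified) entry."""
    path = parse_proc_cgroup(text)[""]
    return f"{root}{path.rstrip('/')}/cgroup.procs"


# Hypothetical /proc/<pid>/cgroup contents for a containerized process.
sample = "0::/kubepods.slice/pod1/abc\n"
procs_file = cgroup_procs_file(sample)
```

Only the path computation is shown. Actually writing the stress process PID to that file requires root privileges on the node.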

Step 6: Resume Stress Process / Apply File IO Latency

| IO Stress | Latency |
| --- | --- |
| Resume the IO stress process: the stressor starts intensive disk read/write operations, increasing IO utilization. This degrades the target application’s performance by making disk access slow or unresponsive. | Resume the latency process, which introduces delays in file IO operations. This slows down reads, writes, and other operations performed on files. |
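The paused-then-resumed lifecycle of the stress process (paused in Step 4, resumed here, stopped after the chaos duration) can be sketched with POSIX job-control signals. This is only an assumption about the mechanism: the text does not say how the helper pod pauses the stressor, and a short sleeping child process stands in for the real stressor. The names `start_paused`, `resume`, and `stop` are hypothetical.

```python
import os
import signal
import subprocess
import sys


def start_paused(argv):
    """Start a child process and stop it immediately with SIGSTOP."""
    child = subprocess.Popen(argv)
    os.kill(child.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid report the stop instead of waiting for exit.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child exited before it could be paused")
    return child


def resume(child):
    """Let the paused child continue running."""
    os.kill(child.pid, signal.SIGCONT)


def stop(child):
    """End the child (SIGKILL also works on a stopped process) and reap it."""
    child.kill()
    return child.wait()


# Stand-in for the stressor: a child that just sleeps.
stressor = start_paused([sys.executable, "-c", "import time; time.sleep(30)"])
resume(stressor)
exit_code = stop(stressor)
```

`stop` uses `SIGKILL` rather than `SIGTERM` because a stopped process does not act on `SIGTERM` until it is continued.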

In case of IO stress chaos, after the chaos duration is complete, the helper pod stops the stressor process and cleans up resources.

In case of IO latency chaos, after the chaos duration is complete, the helper pod removes the latency injection rules and restores normal file operations.
