Linux CPU stress
Linux CPU stress is a chaos fault that starts WORKERS CPU workers, each at LOAD percent utilization, on the target Linux machine for DURATION, then stops the workers and frees the CPU. The fault runs through the Linux Chaos Infrastructure (LCI) systemd service installed on the target VM, and the workers consume CPU cycles inside that service's process tree.
Use this fault to test how a workload behaves when compute headroom shrinks: whether application latency stays inside the SLA, whether autoscaling reacts on CPU pressure, whether noisy-neighbour effects appear on other processes on the same VM, and whether monitoring detects the load within the alerting SLA.
If you have not installed the Linux Chaos Infrastructure yet, go to Linux Chaos Infrastructure to install the agent and connect the VM to the control plane.
Use cases
Run this fault when you want to answer concrete questions like:
- CPU headroom: When WORKERS cores are pinned at LOAD%, does the application stay inside its latency SLA?
- Autoscaling fidelity: Do CPU-driven autoscaling rules (VM autoscaler, custom scripts) trigger within the alerting SLA?
- Noisy neighbour: Do other processes co-located on the same VM degrade when the chaos workers consume their share?
- Throttling and back-pressure: Do upstream callers honor back-pressure when the workload under test slows down?
Prerequisites
- Linux Chaos Infrastructure installed: The linux-chaos-infrastructure systemd service is active on the target VM and the infrastructure is in CONNECTED state. Go to Linux Chaos Infrastructure to install it.
- CPU headroom for chaos: The VM has at least WORKERS available cores. Pinning more workers than the host has cores still works but does not increase total load.
- stress-ng available: The fault uses stress-ng, which is installed by the LCI installer. No manual install is required.
Supported environments
The fault has been tested on the following Linux distributions. Go to Linux fault requirements to see the full compatibility matrix.
| Platform | Support status |
|---|---|
| Ubuntu 16+, Debian 10+ | Supported |
| CentOS 7+, RHEL 7+, Fedora 30+ | Supported |
| openSUSE LEAP 15.4+ / SUSE Linux Enterprise 15+ | Supported |
| Architectures | x86_64, arm64 (matches the LCI agent installer) |
Permissions required
This fault is classified as a Basic Linux fault. It runs with the privileges of the Linux Chaos Infrastructure systemd service (root user and root user group) on the target VM. No cloud credentials are needed.
Fault tunables
Configure the following fault parameters when you add Linux CPU stress to an experiment in Chaos Studio. Defaults are shown for reference.
Chaos parameters
| Tunable | Description | Default |
|---|---|---|
| DURATION | Total duration of the fault. Accepts [hours]h[minutes]m[seconds]s format (for example, 30s, 1m25s, 1h3m2s). | 30s |
| LOAD | CPU load percentage to apply per worker (0 to 100). A value of 0 is treated as full load (100%). | 0 |
| WORKERS | Number of CPU workers to stress. Each worker pins one core at LOAD percent utilization. | 1 |
| RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0 |
Tunables that apply to every fault are documented in common tunables for all faults.
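Scripts that wait for the chaos window to close often need DURATION as a plain number of seconds. The following is a minimal sketch of such a conversion; the function name duration_to_seconds is hypothetical and not part of the LCI tooling, and it assumes units appear in h, m, s order with plain integer values (no leading zeros).

```shell
# Hypothetical helper (not part of LCI): convert a DURATION string
# such as 30s, 1m25s, or 1h3m2s into seconds.
# Assumes units appear in h, m, s order and values are plain integers.
duration_to_seconds() {
  rest=$1
  total=0
  for unit in h m s; do
    case $rest in
      *"$unit"*)
        value=${rest%%"$unit"*}   # digits before the unit letter
        rest=${rest#*"$unit"}     # remainder after the unit letter
        case $unit in
          h) total=$((total + value * 3600)) ;;
          m) total=$((total + value * 60)) ;;
          s) total=$((total + value)) ;;
        esac
        ;;
    esac
  done
  echo "$total"
}

duration_to_seconds 1m25s
```

A wrapper script could, for example, call sleep "$(duration_to_seconds "$DURATION")" before checking that CPU has returned to baseline.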
Fault execution in brief
Spawns WORKERS stress-ng workers at LOAD percent utilization for DURATION, then stops the workers and releases CPU back to the system.
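For intuition, the tunables map naturally onto the CPU stressor options of stress-ng (--cpu, --cpu-load, --timeout). The sketch below only prints a command line. It is an assumption about how the tunables could translate, not the exact invocation the LCI agent uses, and build_stress_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: print a stress-ng command line that approximates
# the fault. The real command used by the LCI agent is not documented here.
# Arguments: WORKERS, LOAD (0 to 100), duration in seconds.
build_stress_cmd() {
  workers=$1
  load=$2
  seconds=$3
  # The fault treats LOAD=0 as full load (100%).
  if [ "$load" -eq 0 ]; then
    load=100
  fi
  echo "stress-ng --cpu $workers --cpu-load $load --timeout ${seconds}s"
}

build_stress_cmd 2 0 30
```

Running the printed command by hand on a test VM is one way to preview the load pattern before scheduling the fault in an experiment.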
Expected behavior during fault execution
- CPU utilization on the target VM rises by approximately WORKERS x LOAD percent for the duration of the fault.
- Application processes co-located on the VM slow down in proportion to their CPU share.
- CPU-driven monitoring (cpu_user, cpu_system, node_cpu_seconds_total{mode!="idle"}) reports the elevated utilization.
- After the duration ends, the workers exit and CPU returns to baseline.
The chaos workers exit when DURATION elapses. CPU returns to baseline immediately; no rollback is required.
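The WORKERS x LOAD figure is expressed in per-core percent, so dashboards that show a host-wide average will report a smaller number on multi-core VMs. The sketch below estimates that host-wide rise; expected_host_rise is a hypothetical helper, and the result is an upper bound that assumes the host is otherwise idle and that total load cannot exceed 100% per core.

```shell
# Hypothetical helper: estimate the host-wide CPU rise in percent.
# Arguments: WORKERS, LOAD (0 to 100), number of cores on the VM.
# Uses integer arithmetic, so the result is truncated.
expected_host_rise() {
  workers=$1
  load=$2
  cores=$3
  # The fault treats LOAD=0 as full load (100%).
  if [ "$load" -eq 0 ]; then
    load=100
  fi
  total=$((workers * load))   # per-core percent summed over workers
  cap=$((cores * 100))        # the host cannot exceed 100% per core
  if [ "$total" -gt "$cap" ]; then
    total=$cap
  fi
  echo $((total / cores))
}

expected_host_rise 2 50 4
```

For example, 2 workers at 50% add about 100% of one core, which a 4-core VM reports as roughly 25% host-wide.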
Signals to watch
Attach resilience probes to assert each layer:
- CPU utilization on the VM: Use a Prometheus probe on node_cpu_seconds_total (rate over the chaos window) and assert it rises by the expected amount.
- Application latency: Use an HTTP probe on the user-visible endpoint and assert p95/p99 stays inside the SLA.
- Autoscaling: Use a command probe to verify the autoscaler added the expected capacity.
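As an illustration of the CPU utilization probe, the sketch below builds a commonly used PromQL expression for busy CPU percent from node_cpu_seconds_total. The helper name cpu_busy_query, the instance label value, and the PROM_URL variable are assumptions for illustration; adapt the label selectors to how your node exporter is scraped.

```shell
# Hypothetical helper: print a PromQL expression for host-wide busy CPU
# percent on one instance, averaged over the given rate window.
# Arguments: instance label value, rate window (for example 1m).
cpu_busy_query() {
  instance=$1
  window=$2
  printf '100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle",instance="%s"}[%s])))\n' \
    "$instance" "$window"
}

cpu_busy_query vm-1:9100 1m

# Example use against the Prometheus HTTP API (PROM_URL is an assumption):
#   curl -sG "$PROM_URL/api/v1/query" \
#     --data-urlencode "query=$(cpu_busy_query vm-1:9100 1m)"
```

Comparing the value before and during the chaos window gives the rise to assert against in the probe.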
Verify the fault execution effect
While the experiment is running, confirm that the CPU is loaded; after it ends, confirm that the load is released:
- Observe live CPU on the target VM.

      top -bn1 | head -5
      mpstat -P ALL 1 5

  You should see WORKERS worker cores pinned near the configured LOAD percent.

- List the chaos workers.

      ps -ef | grep -E "stress-ng" | grep -v grep

  The workers exit when the chaos duration ends.

- Inspect Linux Chaos Infrastructure logs.

      sudo journalctl -u linux-chaos-infrastructure -n 100 --no-pager

  Look for the fault start, the worker count, and the fault end markers.
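The per-core check can be automated by counting how many cores mpstat reports as busy. The sketch below is a hypothetical helper, count_busy_cores, that reads mpstat output on standard input. It assumes the sysstat layout in which per-CPU summary lines start with "Average:" and the last column is %idle, and that decimals use a dot (run mpstat with LC_ALL=C if unsure).

```shell
# Hypothetical helper: count cores whose busy percent (100 - %idle)
# is at or above a threshold, using the "Average:" lines of mpstat.
# Argument: threshold in percent. Input: mpstat output on stdin.
count_busy_cores() {
  threshold=$1
  awk -v t="$threshold" '
    # Keep per-CPU rows only: skip the header row and the "all" row.
    $1 == "Average:" && $2 ~ /^[0-9]+$/ && (100 - $NF) >= t { n++ }
    END { print n + 0 }
  '
}

# Example use on the target VM:
#   LC_ALL=C mpstat -P ALL 1 5 | count_busy_cores 90
```

With LOAD at 100, the count should match WORKERS while the fault runs and drop back once the workers exit, provided no other process is saturating a core.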
Recovery and cleanup
- End of duration: The chaos workers exit when DURATION elapses; CPU returns to baseline.
- Abort the experiment: Stopping the experiment from Chaos Studio signals the chaos workers to exit.
- Manual recovery: If a worker survives an abort, kill it with sudo pkill -f stress-ng on the target VM.
- Workload recovery: Application processes resume normal CPU share as soon as the workers exit.
Limitations
- Single VM scope: Each fault run targets one VM (the VM hosting the selected Linux Chaos Infrastructure).
- Worker pinning: WORKERS is honored but pinning more workers than the host has cores does not increase total load beyond 100% per core.
- Time-bounded only: The fault is duration-based; there is no signal-driven exit before DURATION ends (other than aborting the experiment).
- Same process tree: Workers run inside the LCI service tree. If the LCI service is killed mid-experiment, the workers exit with it.
Troubleshooting
Linux CPU stress fault shows no measurable CPU rise in Harness Chaos Engineering
Confirm that the linux-chaos-infrastructure systemd service is active on the target VM and that the infrastructure is in CONNECTED state in Chaos Studio. Then check the tunables: WORKERS must be at least 1, and a low LOAD value may produce a rise too small to stand out from baseline noise (the default of 0 maps to 100%).
stress-ng not found
stress-ng is installed by the Linux Chaos Infrastructure installer. Re-run the installer or install it manually with the distro package manager (for example, sudo apt install stress-ng on Ubuntu).
CPU stays elevated after the experiment ends
If stress-ng workers survive an abort, kill them with sudo pkill -f stress-ng on the target VM and check the linux-chaos-infrastructure systemd service logs with sudo journalctl -u linux-chaos-infrastructure for the abort path.
Related faults
- Linux memory stress: Apply memory stress instead of CPU.
- Linux disk I/O stress: Apply I/O stress to the disk.
- Linux JVM CPU stress: Apply CPU stress inside a target Java process instead of the whole VM.