Chaos Faults for Linux
Introduction
Linux faults disrupt resources on a target Linux machine through the Linux Chaos Infrastructure (LCI) systemd service installed on the VM. Use them to test how a workload behaves under CPU/memory/disk pressure, network degradation, DNS outages, JVM-level faults, and API-level faults injected by a local proxy.
Go to Linux Chaos Infrastructure to install the agent and connect a VM to the control plane, then go to Linux fault requirements for the supported OS distributions and the basic/advanced permission tiers.
Linux CPU stress
Linux CPU stress runs WORKERS busy workers at LOAD percent utilization each on the target Linux machine for DURATION, then frees the CPU. Use it to test how a workload behaves when compute headroom shrinks.
Linux memory stress
Linux memory stress allocates the amount of memory set by MEMORY across WORKERS workers on the target Linux machine for DURATION, then frees the memory. Use it to test how a workload behaves under memory pressure and OOM conditions.
Linux disk fill
Linux disk fill writes a file under FILL_PATH until it occupies FILL_STORAGE for DURATION, then removes the file. Use it to test how a workload behaves when its writable storage fills up.
Use cases
ENOSPC handling in write paths.
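The ENOSPC use case above can be made concrete with a small write-path sketch. The function name `append_record` and the choice to return `False` instead of raising are assumptions made for illustration; they are not part of the fault or of LCI.

```python
import errno
import os


def append_record(path: str, data: bytes) -> bool:
    """Append data to path; return False if the filesystem is full.

    Hypothetical helper: a real service might instead shed load,
    rotate logs, or raise an alert when this returns False.
    """
    try:
        with open(path, "ab") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # force the write so ENOSPC surfaces here
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # No space left on device: report it to the caller.
            return False
        raise  # any other I/O error is not handled by this sketch
```

Running the disk fill fault against the directory that holds `path` is one way to check that a branch like this is actually reached and that the service degrades rather than crashes.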
Linux disk IO stress
Linux disk I/O stress runs WORKERS I/O workers that consume FILE_SYSTEM_UTILISATION of the filesystem at VOLUME_MOUNT_PATH for DURATION. Use it to test how a workload behaves when disk bandwidth is saturated.
Linux fs fill
Linux fs fill writes a file under FILL_PATH until it occupies FILL_STORAGE for DURATION, then removes the file. It targets a filesystem path with a smaller tunable surface than Linux disk fill.
Use cases
ENOSPC handling in write paths.
Linux DNS error
Linux DNS error returns DNS failures for host names matching HOST_NAMES (filtered by MATCH_SCHEME) on the target Linux machine for DURATION. Use it to test how a workload behaves during a DNS outage.
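One client-side behavior this fault can exercise is falling back to previously known addresses when resolution fails. The sketch below is illustrative only: `resolve_with_fallback`, its `fallback_ips` argument, and the injectable `resolver` parameter are hypothetical names, and whether a fallback is appropriate depends on the workload.

```python
import socket
from typing import Callable, List, Sequence


def resolve_with_fallback(
    host: str,
    port: int,
    fallback_ips: Sequence[str],
    resolver: Callable = socket.getaddrinfo,
) -> List[str]:
    """Return resolved IPs for host, or fallback_ips if DNS fails."""
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # DNS lookup failed (the condition Linux DNS error injects).
        return list(fallback_ips)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address. Deduplicate, keeping order.
    return list(dict.fromkeys(info[4][0] for info in infos))
```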
Linux DNS spoof
Linux DNS spoof resolves host names in SPOOF_MAP to the configured spoofed IP addresses on the target Linux machine for DURATION. Use it to test how a workload behaves when DNS resolves to unexpected endpoints.
Linux network loss
Linux network loss drops NETWORK_PACKET_LOSS_PERCENTAGE percent of packets leaving the target Linux machine on NETWORK_INTERFACES for DURATION. Use it to test how a workload behaves when the network is unreliable.
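A common way for a client to tolerate an unreliable network is to retry with exponential back-off and jitter. The following is a minimal sketch under stated assumptions: `call_with_retries` and its parameters are hypothetical, and it retries only on `ConnectionError` and `TimeoutError`, which may not match the exceptions a given client library raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient network errors with back-off."""
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential back-off capped at max_delay, with full jitter
            # so that many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise ValueError("attempts must be at least 1")
```

Injecting packet loss while a caller like this is under load shows whether the retry budget is large enough and whether retries amplify traffic.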
Linux network latency
Linux network latency adds NETWORK_LATENCY milliseconds of delay (plus optional JITTER) to packets leaving the target Linux machine on NETWORK_INTERFACES for DURATION. Use it to test how a workload behaves when the network is slow.
Linux network corruption
Linux network corruption bit-flips NETWORK_PACKET_CORRUPTION_PERCENTAGE percent of egress packets on NETWORK_INTERFACES of the target Linux machine for DURATION. Use it to test how a workload behaves when packet payloads are damaged.
Linux network duplication
Linux network duplication duplicates NETWORK_PACKET_DUPLICATION_PERCENTAGE percent of egress packets on NETWORK_INTERFACES of the target Linux machine for DURATION. Use it to test how a workload behaves under at-least-once delivery.
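Under at-least-once delivery, a receiver usually needs to recognize messages it has already processed. The sketch below assumes each message carries a unique identifier, which the fault itself does not provide; `Deduplicator` and its bounded `capacity` are hypothetical and apply mainly to datagram-style protocols or message consumers, since TCP already discards duplicate segments.

```python
from collections import OrderedDict
from typing import Hashable


class Deduplicator:
    """Remember the most recent message IDs and flag repeats."""

    def __init__(self, capacity: int = 1024) -> None:
        self._seen: "OrderedDict[Hashable, None]" = OrderedDict()
        self._capacity = capacity

    def first_time(self, message_id: Hashable) -> bool:
        """Return True the first time an ID is seen, False for repeats."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # keep recently seen IDs longest
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen ID
        return True
```

A handler would call `first_time(msg_id)` and skip processing when it returns `False`.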
Linux network rate limit
Linux network rate limit throttles egress bandwidth on NETWORK_INTERFACES of the target Linux machine to NETWORK_BANDWIDTH (with BURST and LIMIT) for DURATION. Use it to test how a workload behaves when bandwidth is constrained.
Linux process kill
Linux process kill terminates processes matching PROCESS_IDS, PROCESS_NAMES, or PROCESS_COMMAND on the target Linux machine for DURATION. Use it to test how a workload behaves when a critical process disappears.
Use cases
… SIGTERM vs abrupt termination on SIGKILL.
Linux service restart
Linux service restart stops the systemd services in SERVICES and starts them again after INTERVAL, repeating for DURATION. With SELF_HEALING_SERVICES=true, the fault relies on systemd auto-restart.
Use cases
… Restart=on-failure triggers within the expected window.
… systemd_unit_state and end-to-end availability fire within the alerting SLA.
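To check that a unit comes back within an expected window, one option is to poll its state and time the recovery. The sketch below shells out to `systemctl is-active --quiet`, which exits with status 0 when the unit is active; the function names, the polling interval, and the injectable `probe`, `clock`, and `sleep` parameters are assumptions for illustration.

```python
import subprocess
import time
from typing import Callable, Optional


def unit_is_active(unit: str) -> bool:
    """Return True if systemd reports the unit as active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def seconds_until_active(
    unit: str,
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
    probe: Callable[[str], bool] = unit_is_active,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the unit is active; return elapsed seconds or None on timeout."""
    start = clock()
    while clock() - start <= timeout_s:
        if probe(unit):
            return clock() - start
        sleep(poll_s)
    return None
```

The returned value can then be compared with the window the service is expected to recover in.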
Linux time chaos
Linux time chaos skews the system clock on the target Linux machine by OFFSET for DURATION. With DISABLE_NTP=true, NTP is paused to keep the skew stable.
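Code that measures elapsed time with the wall clock can misbehave when the system clock jumps. A monotonic clock is not affected by system clock updates, so timeouts based on it should survive a skew. The `Deadline` class below is a hypothetical illustration of that pattern, not something the fault requires.

```python
import time
from typing import Callable


class Deadline:
    """Track a timeout using a monotonic clock instead of wall-clock time."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() == 0.0
```

Wall-clock values such as token expiry times or log timestamps still depend on the system clock, so those paths are the ones this fault is most likely to expose.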
Linux JVM CPU stress
Linux JVM CPU stress uses Byteman to run busy work that occupies CPU cores inside the target Java process for DURATION. Use it to test how a Java workload behaves when its own threads pin the CPU.
Linux JVM memory stress
Linux JVM memory stress uses Byteman to consume memory in the heap or stack of the target Java process for DURATION. Use it to test how a Java workload behaves under memory pressure.
Use cases
OutOfMemoryError handling and recovery.
Linux JVM method exception
Linux JVM method exception uses Byteman to throw EXCEPTION from CLASS.METHOD of the target Java process for DURATION. Use it to test how a Java workload handles unexpected exceptions from a hot method.
Use cases
try/catch discipline.
Linux JVM method latency
Linux JVM method latency uses Byteman to add LATENCY milliseconds of delay to every invocation of CLASS.METHOD in the target Java process for DURATION. Use it to test how a Java workload behaves when an internal method gets slow.
Linux JVM modify return
Linux JVM modify return uses Byteman to overwrite the return value of CLASS.METHOD with RETURN in the target Java process for DURATION. Use it to test how callers handle unexpected return data.
Use cases
… Integer.MAX_VALUE) to surface boundary errors.
Linux JVM trigger GC
Linux JVM trigger GC uses Byteman to force garbage collection events in the target Java process for DURATION. Use it to test how a Java workload behaves under repeated GC pressure.
Use cases
… jvm_gc_pause_seconds fire when expected.
Linux API block
Linux API block starts a local proxy on the target Linux machine and returns STATUS_CODE for matching API calls (filtered by path, header, method, source/destination, and direction) for DURATION. Use it to test how callers handle a sudden API outage.
Linux API latency
Linux API latency starts a local proxy on the target Linux machine and adds LATENCY of delay to matching API requests (in request, response, or both directions) for DURATION. Use it to test how callers handle slow API responses.
Linux API modify body
Linux API modify body starts a local proxy on the target Linux machine and overwrites the body of matching API calls with RESPONSE_BODY (in request, response, or both directions) for DURATION. Use it to test how callers handle unexpected payloads.
Linux API modify header
Linux API modify header starts a local proxy on the target Linux machine and replaces header values in matching API calls with the keys/values from HEADERS_MAP (in request, response, or both directions) for DURATION. Use it to test how callers handle altered headers.
Use cases
… Authorization is stale or invalid.
… Accept or Content-Type.
… Cache-Control or ETag.
Linux API status code
Linux API status code starts a local proxy on the target Linux machine and overrides matching API responses with STATUS_CODE (and optionally RESPONSE_BODY) for DURATION. Use it to test how callers handle specific error responses.
Use cases
… 5xx vs 4xx classes.
… 429) to verify back-off.
… 404) for content filtering.
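The distinction between status classes can be sketched as a small client-side policy: treat 5xx and 429 as retryable, treat other 4xx as permanent failures, and honor a numeric `Retry-After` header when one is present. The function names and the one-second default delay are assumptions; a real client may apply different rules per endpoint.

```python
from typing import Mapping, Optional


def classify(status: int) -> str:
    """Map an HTTP status code to 'retry', 'fail', or 'ok'."""
    if status == 429 or 500 <= status <= 599:
        return "retry"  # throttling and server errors may be transient
    if 400 <= status <= 499:
        return "fail"  # other client errors will not succeed on retry
    return "ok"


def retry_delay(status: int, headers: Mapping[str, str], default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry applies.

    Assumes headers is a plain mapping keyed exactly as 'Retry-After';
    the HTTP-date form of that header is not handled in this sketch.
    """
    if classify(status) != "retry":
        return None
    value = headers.get("Retry-After", "")
    if value.isdigit():
        return float(value)
    return default
```

Setting STATUS_CODE to 429 or 503 in the fault and observing the caller is one way to confirm that a policy like this is what the client actually does.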