Chaos Faults for Linux

Introduction

Linux faults disrupt resources on a target Linux machine through the Linux Chaos Infrastructure (LCI) systemd service installed on the VM. Use them to test how a workload behaves under CPU/memory/disk pressure, network degradation, DNS outages, JVM-level faults, and API-level faults injected by a local proxy.

See Linux Chaos Infrastructure to install the agent and connect a VM to the control plane, and Linux fault requirements for the supported OS distributions and the basic and advanced permission tiers.

Linux CPU stress

Linux CPU stress runs WORKERS busy workers at LOAD percent utilization each on the target Linux machine for DURATION, then frees the CPU. Use it to test how a workload behaves when compute headroom shrinks.

Use cases
  • Validate application latency under sustained CPU pressure.
  • Verify CPU-driven autoscaling reacts within the alerting SLA.
  • Surface noisy-neighbour effects on co-located processes.

Linux memory stress

Linux memory stress allocates the amount of memory set by MEMORY across WORKERS workers on the target Linux machine, holds it for DURATION, then frees it. Use it to test how a workload behaves under memory pressure and OOM conditions.

Use cases
  • Validate application latency when free memory shrinks.
  • Verify the kernel OOM killer targets the expected process.
  • Confirm alerts on memory pressure and swap usage fire within the alerting SLA.

Linux disk fill

Linux disk fill writes a file under FILL_PATH until it occupies FILL_STORAGE, keeps it in place for DURATION, then removes the file. Use it to test how a workload behaves when its writable storage fills up.

Use cases
  • Validate ENOSPC handling in write paths.
  • Verify log rotation kicks in before the volume fills.
  • Confirm disk-space alerts fire within the alerting SLA.
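
The ENOSPC use case can be sketched in Python. This is a minimal illustration of a write path that tolerates a full volume, not part of the fault itself: `append_record` and its boolean return convention are hypothetical, and the write function is injected so the path can be exercised without filling a real disk.

```python
import errno


def append_record(write, record: bytes) -> bool:
    """Try to persist a record; return False when the volume is full.

    `write` is any callable that writes bytes and raises OSError on failure
    (a hypothetical signature chosen for illustration).
    """
    try:
        write(record)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # Disk full: let the caller shed load or buffer elsewhere
            # instead of crashing the write path.
            return False
        raise  # Any other I/O error is unexpected here and should propagate.
```

While the fault is active, a write path built this way should return False rather than raise, and should start returning True again once the fill file is removed.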

Linux disk IO stress

Linux disk I/O stress runs WORKERS I/O workers that consume FILE_SYSTEM_UTILISATION of the filesystem at VOLUME_MOUNT_PATH for DURATION. Use it to test how a workload behaves when disk bandwidth is saturated.

Use cases
  • Validate database throughput under I/O contention.
  • Surface noisy-neighbour effects on co-located processes sharing the disk.
  • Confirm alerts on disk saturation fire within the alerting SLA.

Linux fs fill

Linux fs fill writes a file under FILL_PATH until it occupies FILL_STORAGE, keeps it in place for DURATION, then removes the file. It targets a filesystem path and exposes fewer tunables than Linux disk fill.

Use cases
  • Validate ENOSPC handling in write paths.
  • Verify log rotation kicks in before the volume fills.
  • Confirm disk-space alerts fire within the alerting SLA.

Linux DNS error

Linux DNS error returns DNS failures for host names matching HOST_NAMES (filtered by MATCH_SCHEME) on the target Linux machine for DURATION. Use it to test how a workload behaves during a DNS outage.

Use cases
  • Validate DNS-failure handling in application clients.
  • Verify local DNS caches absorb failures for previously resolved entries.
  • Confirm alerts on DNS failures and connection errors fire within the alerting SLA.
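
The cache use case can be sketched as a small stale-on-error cache. This is a minimal illustration under stated assumptions: `StaleOnErrorCache` is a hypothetical class, the resolver is injected, and nothing here describes how the system resolver or the fault itself is implemented.

```python
import time


class StaleOnErrorCache:
    """Cache resolver answers and reuse them when a later lookup fails."""

    def __init__(self, resolve, ttl_seconds=30.0, clock=time.monotonic):
        self._resolve = resolve   # callable: host -> IP string, raises OSError on failure
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}        # host -> (ip, fetched_at)

    def lookup(self, host):
        entry = self._entries.get(host)
        if entry and self._clock() - entry[1] < self._ttl:
            return entry[0]       # fresh hit, resolver not consulted
        try:
            ip = self._resolve(host)
        except OSError:
            if entry:
                return entry[0]   # resolver failing: serve the stale answer
            raise                 # never resolved before: surface the failure
        self._entries[host] = (ip, self._clock())
        return ip
```

Passing `socket.gethostbyname` as the resolver works with this sketch because `socket.gaierror` is a subclass of `OSError`. Host names resolved before the fault starts keep working; names first requested during the fault still fail, which is the behavior the use case asks you to verify.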

Linux DNS spoof

Linux DNS spoof resolves host names in SPOOF_MAP to the configured spoofed IP addresses on the target Linux machine for DURATION. Use it to test how a workload behaves when DNS resolves to unexpected endpoints.

Use cases
  • Validate TLS pinning by routing real host names to attacker-controlled IPs.
  • Verify host header and certificate verification at the destination.
  • Route a dependency to a stub server without changing application configuration.

Linux network loss

Linux network loss drops NETWORK_PACKET_LOSS_PERCENTAGE percent of packets leaving the target Linux machine on NETWORK_INTERFACES for DURATION. Use it to test how a workload behaves when the network is unreliable.

Use cases
  • Validate client timeout handling under packet loss.
  • Verify circuit breakers open within the configured threshold.
  • Surface retry-storm behavior on dropped traffic.
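
For the circuit-breaker use case, it helps to be explicit about what "open within the configured threshold" means. A minimal sketch, assuming a hypothetical `CircuitBreaker` that counts consecutive transport failures and has no half-open state; real libraries differ in both respects:

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; reject calls while open."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.threshold

    def call(self, fn):
        if self.is_open:
            # Fail fast without touching the network.
            raise RuntimeError("circuit open")
        try:
            result = fn()
        except (TimeoutError, ConnectionError):
            self.failures += 1   # count only transport-level failures
            raise
        self.failures = 0        # any success resets the streak
        return result
```

Under packet loss, the check is that `is_open` becomes true after exactly `threshold` failed calls and that later calls are rejected without generating more traffic.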

Linux network latency

Linux network latency adds NETWORK_LATENCY milliseconds of delay (plus optional JITTER) to packets leaving the target Linux machine on NETWORK_INTERFACES for DURATION. Use it to test how a workload behaves when the network is slow.

Use cases
  • Validate client timeout handling when the network gets slow.
  • Verify p95/p99 stays inside the SLA under added latency.
  • Surface thread-pool starvation when callers hold threads waiting for responses.
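
The p95/p99 check can be made concrete with a nearest-rank percentile over latency samples collected while the fault runs. A minimal sketch; `percentile` and `within_sla` are hypothetical helpers, and monitoring systems may use a different percentile definition (for example, interpolation):

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def within_sla(latencies_ms, p95_budget_ms, p99_budget_ms):
    """True when both tail percentiles stay inside their budgets."""
    return (percentile(latencies_ms, 95) <= p95_budget_ms
            and percentile(latencies_ms, 99) <= p99_budget_ms)
```

Comparing `within_sla` on samples taken before and during the fault shows how much of the latency budget NETWORK_LATENCY and JITTER consume.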

Linux network corruption

Linux network corruption bit-flips NETWORK_PACKET_CORRUPTION_PERCENTAGE percent of egress packets on NETWORK_INTERFACES of the target Linux machine for DURATION. Use it to test how a workload behaves when packet payloads are damaged.

Use cases
  • Validate TCP retransmit recovery under corruption.
  • Verify UDP-based protocols reject or recover from bad packets.
  • Confirm alerts on retransmits and decode errors fire within the alerting SLA.
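
For the UDP use case, one way an application-level protocol can reject damaged datagrams is to carry its own checksum. A minimal sketch, assuming a hypothetical framing in which a CRC32 trailer follows the payload; it is an illustration, not a description of any specific protocol:

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(datagram: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    if len(datagram) < 4:
        raise ValueError("datagram too short")
    payload = datagram[:-4]
    (expected,) = struct.unpack(">I", datagram[-4:])
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload
```

A receiver built this way should count `ValueError` rejections as decode errors, which gives the alerting use case a metric to watch during the fault.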

Linux network duplication

Linux network duplication duplicates NETWORK_PACKET_DUPLICATION_PERCENTAGE percent of egress packets on NETWORK_INTERFACES of the target Linux machine for DURATION. Use it to test how a workload behaves under at-least-once delivery.

Use cases
  • Verify application handlers stay idempotent (no double-charges, no double-writes).
  • Confirm queue consumers detect duplicate messages.
  • Validate inflated egress counters trigger the right alerts.
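
The idempotency use case can be sketched as a handler that deduplicates on a message ID. This is a minimal illustration: `PaymentHandler` is hypothetical, and the in-memory set stands in for whatever durable store a real service would use.

```python
class PaymentHandler:
    """Apply each charge at most once, keyed by a caller-supplied message ID."""

    def __init__(self):
        self.balance_cents = 0
        self._seen = set()  # illustration only; a real service needs durable storage

    def handle(self, message_id: str, amount_cents: int) -> bool:
        if message_id in self._seen:
            return False    # duplicate delivery: acknowledge, change nothing
        self._seen.add(message_id)
        self.balance_cents += amount_cents
        return True
```

Delivering the same `message_id` twice should leave `balance_cents` unchanged after the first call, which is the property to assert while the fault duplicates traffic.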

Linux network rate limit

Linux network rate limit throttles egress bandwidth on NETWORK_INTERFACES of the target Linux machine to NETWORK_BANDWIDTH (with BURST and LIMIT) for DURATION. Use it to test how a workload behaves when bandwidth is constrained.

Use cases
  • Validate bulk-transfer behavior when egress is throttled.
  • Verify back-pressure flows through producers without OOM.
  • Confirm alerts on transmit queue length and SLA breach fire within the alerting SLA.
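
The back-pressure use case depends on producers writing into a bounded buffer rather than an unbounded one. A minimal sketch using the standard-library `queue.Queue`; the `offer` helper is hypothetical:

```python
import queue


def offer(buffer: queue.Queue, item) -> bool:
    """Enqueue without blocking; return False when the buffer is full."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        # The consumer (for example, a throttled sender) is behind:
        # the producer must slow down, drop, or retry later.
        return False
```

With `queue.Queue(maxsize=n)`, memory use is capped at `n` items however slow egress becomes, so throttling shows up as rejected offers instead of heap growth.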

Linux process kill

Linux process kill terminates processes matching PROCESS_IDS, PROCESS_NAMES, or PROCESS_COMMAND on the target Linux machine for DURATION. Use it to test how a workload behaves when a critical process disappears.

Use cases
  • Validate systemd/supervisor restart behavior.
  • Verify graceful shutdown on SIGTERM vs abrupt termination on SIGKILL.
  • Confirm alerts on process absence fire within the alerting SLA.

Linux service restart

Linux service restart stops the systemd services in SERVICES and starts them again after INTERVAL, repeating for DURATION. With SELF_HEALING_SERVICES=true, the fault relies on systemd auto-restart.

Use cases
  • Validate clean restart and reconnect behavior of dependents.
  • Verify systemd Restart=on-failure triggers within the expected window.
  • Confirm alerts on systemd_unit_state and end-to-end availability fire within the alerting SLA.

Linux time chaos

Linux time chaos skews the system clock on the target Linux machine by OFFSET for DURATION. With DISABLE_NTP=true, NTP is paused to keep the skew stable.

Use cases
  • Validate TLS certificate-expiry handling when time jumps forward.
  • Verify JWT/HMAC validation surfaces clean errors under clock skew.
  • Confirm scheduled jobs do not double-fire or skip across the boundary.
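
The token-validation use case comes down to how much clock skew a validator tolerates. A minimal sketch of the time check on the `nbf` and `exp` claims of a JWT, assuming a hypothetical `token_time_valid` helper and a 60-second default leeway; signature verification is out of scope here:

```python
def token_time_valid(nbf: int, exp: int, now: int, leeway: int = 60) -> bool:
    """Check not-before and expiry claims (Unix seconds) with a clock-skew allowance.

    Valid when `now` is at or after `nbf` and strictly before `exp`,
    with both bounds widened by `leeway` seconds.
    """
    return nbf - leeway <= now < exp + leeway
```

An OFFSET smaller than the leeway should be absorbed silently, while a larger OFFSET should make validation fail with a clear "expired" or "not yet valid" error rather than an unhandled exception.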

Linux JVM CPU stress

Linux JVM CPU stress uses Byteman to run busy work that pins CPU cores inside the target Java process for DURATION. Use it to test how a Java workload behaves when its own threads pin the CPU.

Use cases
  • Validate request handler tail latency under in-JVM CPU pressure.
  • Verify GC keeps up under additional CPU pressure.
  • Confirm thread-pool occupancy stays inside bounds.

Linux JVM memory stress

Linux JVM memory stress uses Byteman to consume memory in the heap or stack of the target Java process for DURATION. Use it to test how a Java workload behaves under memory pressure.

Use cases
  • Validate application latency when the heap fills up.
  • Verify clean OutOfMemoryError handling and recovery.
  • Confirm alerts on JVM memory usage and full GC rate fire within the alerting SLA.

Linux JVM method exception

Linux JVM method exception uses Byteman to throw EXCEPTION from CLASS.METHOD of the target Java process for DURATION. Use it to test how a Java workload handles unexpected exceptions from a hot method.

Use cases
  • Validate caller try/catch discipline.
  • Verify higher-level error handling surfaces clean user-visible errors.
  • Confirm retry storms are contained by backoff and circuit breakers.

Linux JVM method latency

Linux JVM method latency uses Byteman to add LATENCY milliseconds of delay to every invocation of CLASS.METHOD in the target Java process for DURATION. Use it to test how a Java workload behaves when an internal method gets slow.

Use cases
  • Validate p99 stays inside the SLA when a hot method slows down.
  • Verify caller timeouts fire cleanly without thread-pool starvation.
  • Confirm tail-latency alerts fire within the alerting SLA.

Linux JVM modify return

Linux JVM modify return uses Byteman to overwrite the return value of CLASS.METHOD with RETURN in the target Java process for DURATION. Use it to test how callers handle unexpected return data.

Use cases
  • Validate caller-side validation of internal method returns.
  • Inject edge-case values (empty, null, Integer.MAX_VALUE) to surface boundary errors.
  • Simulate stale-cache returns to verify dependent behavior.

Linux JVM trigger GC

Linux JVM trigger GC uses Byteman to force garbage collection events in the target Java process for DURATION. Use it to test how a Java workload behaves under repeated GC pressure.

Use cases
  • Validate request handler tail latency under repeated GC events.
  • Verify the chosen collector keeps pause time inside the SLA.
  • Confirm alerts on jvm_gc_pause_seconds fire when expected.

Linux API block

Linux API block starts a local proxy on the target Linux machine and returns STATUS_CODE for matching API calls (filtered by path, header, method, source/destination, and direction) for DURATION. Use it to test how callers handle a sudden API outage.

Use cases
  • Validate caller error handling under API failure.
  • Verify circuit breakers open within the configured threshold.
  • Confirm fallback paths or cached responses kick in correctly.
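
The fallback use case can be sketched as a wrapper that serves the last good response when the live call fails. A minimal illustration: `UpstreamError` and `fetch_with_fallback` are hypothetical names, and the `fetch` callable stands in for whatever client the workload uses.

```python
class UpstreamError(Exception):
    """Raised by the injected fetch function when the API call fails."""


def fetch_with_fallback(fetch, cache: dict, key: str):
    """Return (value, source); fall back to the last good value on failure."""
    try:
        value = fetch(key)
    except UpstreamError:
        if key in cache:
            return cache[key], "cache"  # degraded but still answering
        raise                           # nothing cached: surface the outage
    cache[key] = value                  # remember the last good response
    return value, "live"
```

Returning the source alongside the value lets a test assert that responses switch from "live" to "cache" when the fault starts and back again once it ends.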

Linux API latency

Linux API latency starts a local proxy on the target Linux machine and adds a delay of LATENCY to matching API calls (in request, response, or both directions) for DURATION. Use it to test how callers handle slow API responses.

Use cases
  • Validate caller timeout handling when an API gets slow.
  • Verify retries and backoff contain the failure.
  • Confirm alerts on end-to-end p99 fire within the alerting SLA.
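
To reason about whether retries and backoff contain the failure, it helps to compute the worst-case time a caller can spend on one logical request. A minimal sketch, assuming capped exponential backoff without jitter; `backoff_delays`, `worst_case_wait`, and the default values are hypothetical:

```python
def backoff_delays(retries: int, base: float = 0.1, cap: float = 1.0):
    """Exponential backoff delays in seconds, doubled per retry and capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def worst_case_wait(retries: int, timeout: float,
                    base: float = 0.1, cap: float = 1.0) -> float:
    """Upper bound on caller wait: every attempt times out, plus all backoff sleeps."""
    attempts = retries + 1  # the initial call plus each retry
    return attempts * timeout + sum(backoff_delays(retries, base, cap))
```

If LATENCY exceeds the per-attempt timeout, every attempt times out and the caller holds a thread for roughly `worst_case_wait` seconds, so that figure should stay below the end-to-end budget.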

Linux API modify body

Linux API modify body starts a local proxy on the target Linux machine and overwrites the body of matching API calls with RESPONSE_BODY (in request, response, or both directions) for DURATION. Use it to test how callers handle unexpected payloads.

Use cases
  • Validate body schema validation in dependent code.
  • Test PII redaction by substituting redacted strings.
  • Inject stub responses to exercise downstream paths.

Linux API modify header

Linux API modify header starts a local proxy on the target Linux machine and replaces header values in matching API calls with the keys/values from HEADERS_MAP (in request, response, or both directions) for DURATION. Use it to test how callers handle altered headers.

Use cases
  • Validate behavior when Authorization is stale or invalid.
  • Test content negotiation by overriding Accept or Content-Type.
  • Verify cache-validation behavior by overriding Cache-Control or ETag.

Linux API status code

Linux API status code starts a local proxy on the target Linux machine and overrides matching API responses with STATUS_CODE (and optionally RESPONSE_BODY) for DURATION. Use it to test how callers handle specific error responses.

Use cases
  • Validate retry behavior on 5xx vs 4xx classes.
  • Simulate rate-limit responses (429) to verify back-off.
  • Verify "not found" semantics (404) for content filtering.
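
The first two use cases assume the caller distinguishes retryable from permanent status codes. A minimal sketch of one common policy, not the only valid one; `should_retry` and `retry_after_seconds` are hypothetical helpers, and the header lookup assumes a plain dict with the canonical `Retry-After` spelling:

```python
def should_retry(status: int) -> bool:
    """Retry on 429 and 5xx; treat other 4xx as permanent client errors."""
    return status == 429 or 500 <= status <= 599


def retry_after_seconds(headers: dict, default: float = 1.0) -> float:
    """Honor a Retry-After value given in whole seconds, else use the default.

    Retry-After may also carry an HTTP date; this sketch does not parse that
    form and falls back to `default` instead.
    """
    value = headers.get("Retry-After", "")
    return float(value) if value.isdigit() else default
```

Setting STATUS_CODE to 503, 429, and 404 in turn should produce retries for the first two and an immediate, non-retried failure for the third.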