Best Practices for Probe Validation - Pod Level Faults
This topic describes best practices for using resilience probes with Kubernetes pod-level chaos faults.
Common pod fault tunables
Introduction
Container Kill
Kill a specific container inside a Kubernetes pod to test restart loops, sidecar resilience, probe tuning, and multi-container coordination.
Disk Fill
Fill a target Kubernetes container's ephemeral storage as a percentage of its limit to test ephemeral-storage eviction, retention, and back-pressure logic.
FS Fill
Write a configurable amount of data into a specific path inside a Kubernetes container to test mounted-volume capacity, eviction, and write-failure handling.
Pod API Block
Block selected API requests or responses on a target Kubernetes pod using path, method, header, query parameter, and source or destination filters to test client retry and failover behavior.
Pod API Latency
Add a configurable delay to selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client timeouts, retries, and tail-latency budgets.
Pod API Modify Body
Overwrite API request or response bodies on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client behavior under corrupted payloads.
Pod API Modify Header
Override API request or response headers on a target Kubernetes pod using path, method, query, and source or destination filters to test resilience to missing, altered, or unexpected header values.
Pod API Modify Response Custom
Combine status code, header, and body modifications on selected API calls of a target Kubernetes pod in a single fault, with filtering by path, method, query, source, or destination.
Pod API Status Code
Override the HTTP status code returned by selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client error handling and circuit-breaker behavior.
Pod Application Function Error
Inject a configurable error into a specific function of an instrumented application running in a Kubernetes pod so you can test how callers and dependents handle the failure.
Pod Application Function Exception
Throw a configurable exception from a specific function of an instrumented application running in a Kubernetes pod so you can test how callers and dependents handle the failure.
Pod Application Function Latency
Add a configurable delay to a specific function of an instrumented application running in a Kubernetes pod so you can test timeout, retry, and tail-latency behavior of callers.
Pod Autoscaler
Scale a Kubernetes workload's replicas up to a target count to test cluster capacity, node autoscaling, scheduling pressure, and rollback behavior.
Pod CPU Hog
Consume CPU on a target Kubernetes pod's container to test autoscaling, throttling, latency budgets, and noisy-neighbor tolerance.
Pod Delete
Delete one or more pods of a Kubernetes workload to test replica availability, controller recovery, graceful termination, and disruption budgets.
Pod DNS Error
Block DNS resolution for selected hostnames inside a target Kubernetes pod to test how the application handles upstream lookup failures and cluster DNS outages.
Pod DNS Spoof
Redirect DNS lookups for selected hostnames inside a target Kubernetes pod to a different address to test how the application handles misdirected upstream traffic and cache poisoning.
Pod HTTP Latency
Add a configurable delay to HTTP responses served by a target Kubernetes pod to test timeouts, retries, and tail-latency behavior at the application protocol layer.
Pod HTTP Modify Body
Overwrite the HTTP response body returned by a target Kubernetes pod to test client behavior under corrupted, empty, or unexpected response payloads.
Pod HTTP Modify Header
Override HTTP request or response headers served by a target Kubernetes pod to test client and server resilience to missing, altered, or unexpected header values.
Pod HTTP Reset Peer
Forcibly reset TCP connections carrying HTTP requests to a target Kubernetes pod to test client retry, connection-pool, and circuit-breaker behavior on abrupt disconnects.
Pod HTTP Status Code
Override the HTTP response status code returned by a target Kubernetes pod to test client error handling, retry classification, and circuit-breaker behavior on specific HTTP status codes.
Pod IO Attribute Override
Override file attributes (such as permissions, size, or ownership) returned by stat syscalls on a target Kubernetes pod's mounted volume to test how the application reacts to changed metadata.
Pod IO Error
Make filesystem syscalls on a target Kubernetes pod's mounted volume return a configurable error code, so you can validate how the application handles failed reads, writes, and opens.
Pod IO Latency
Add configurable delay to filesystem syscalls against a target Kubernetes pod's mounted volume so you can test how the application behaves under slow storage.
Pod IO Mistake
Seed wrong data into reads or writes against a target Kubernetes pod's mounted volume so you can validate how the application detects and recovers from silent data corruption.
Pod IO Stress
Generate sustained filesystem read and write load inside a target Kubernetes pod to test how the application handles disk pressure, slow IO, and ephemeral storage exhaustion.
Pod JVM CPU Stress
Generate sustained CPU load inside a JVM running in a target Kubernetes pod to test how the application behaves when its Java process is starved of CPU.
Pod JVM Kafka Exception
Cause Kafka producer or consumer calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen topic so you can test caller error handling.
Pod JVM Kafka Latency
Add a configurable delay to Kafka producer or consumer calls from a JVM running in a target Kubernetes pod, scoped by topic, so you can test timeout, back-pressure, and lag behavior under slow Kafka traffic.
Pod JVM Method Exception
Cause a specific Java method in a JVM running in a target Kubernetes pod to throw a configurable exception so you can test how callers handle the failure.
Pod JVM Method Latency
Add a configurable delay to every invocation of a specific Java method in a JVM running in a target Kubernetes pod so you can test how callers and dependents behave under slow methods.
Pod JVM Modify Return
Override the return value of a specific Java method in a JVM running in a target Kubernetes pod so you can test how callers behave when a method silently returns wrong data.
Pod JVM Mongo Exception
Cause MongoDB operations from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen database, collection, and operation so you can test caller error handling.
Pod JVM Mongo Latency
Add a configurable delay to MongoDB operations from a JVM running in a target Kubernetes pod, scoped by database, collection, and operation, so you can test timeout and back-pressure behavior under a slow MongoDB.
Pod JVM Solace Exception
Cause Solace publisher or subscriber calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen topic or queue so you can test caller error handling.
Pod JVM Solace Latency
Add a configurable delay to Solace publisher or subscriber calls from a JVM running in a target Kubernetes pod, scoped by topic or queue, so you can test timeout and back-pressure behavior under slow Solace messaging.
Pod JVM SQL Exception
Cause JDBC calls from a JVM running in a target Kubernetes pod to throw a configurable exception on a chosen table and SQL operation so you can test caller error handling.
Pod JVM SQL Latency
Add a configurable delay to JDBC calls from a JVM running in a target Kubernetes pod, scoped by table and SQL operation, so you can test timeout and back-pressure behavior under a slow database.
Pod JVM Trigger GC
Force the JVM in a target Kubernetes pod to run garbage collection on a configurable schedule so you can test how the application behaves under repeated GC pauses.
Pod Memory Hog
Consume memory inside a target Kubernetes pod's container to test OOM behavior, eviction order, request handling under pressure, and limit enforcement.
Pod Network Corruption
Corrupt a configurable percentage of packets in a target Kubernetes pod's network namespace to test checksum, retransmit, and integrity behavior.
Pod Network Duplication
Duplicate a configurable percentage of packets in a target Kubernetes pod's network namespace to test idempotency and deduplication behavior.
Pod Network Latency
Add a configurable delay to packets on a target Kubernetes pod's network path to test timeout, retry, and tail-latency behavior of upstream and downstream calls.
Pod Network Loss
Drop a configurable percentage of packets on a target Kubernetes pod's network path to test retry, timeout, and failover behavior.
Pod Network Partition
Apply a temporary Kubernetes NetworkPolicy to isolate a target pod from its peers, dependencies, or namespaces and test split-brain behavior.
Pod Network Rate Limit
Cap bandwidth on a target Kubernetes pod's network path to test throughput-sensitive workloads, batch jobs, and bandwidth-bound flows.
Redis Cache Expire
Expire one or more keys (or all keys) in a target Redis instance for a configurable duration so you can test how the application behaves when its cached entries suddenly disappear.
Redis Cache Limit
Cap the maximum memory of a target Redis instance to force evictions and write errors so you can test how the application behaves when Redis runs out of memory.
Redis Cache Penetration
Generate a configurable burst of cache-miss requests against a target Redis instance so you can test how the application and its downstream database behave when the cache is bypassed.
Time Chaos
Shift the wall-clock time observed by selected processes inside a target Kubernetes pod to test application behavior under clock skew, token expiry, and time-based scheduling errors.
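As a rough illustration of the token-expiry case mentioned for Time Chaos, the sketch below shows how a shifted clock can change the outcome of a token validity check. It is not part of any fault implementation. The function name token_is_valid, the 15-minute lifetime, the 20-minute skew, and the leeway parameter are all hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def token_is_valid(issued_at, ttl, now, leeway=timedelta(0)):
    """Return True if `now` lies within the token's lifetime.

    The accepted window is [issued_at - leeway, issued_at + ttl + leeway].
    `leeway` is a hypothetical tolerance for clock skew between systems.
    """
    return issued_at - leeway <= now <= issued_at + ttl + leeway


# Hypothetical token issued at noon UTC with a 15-minute lifetime.
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(minutes=15)

# The real time is 5 minutes after issuance.
true_now = issued + timedelta(minutes=5)

# Clock readings a process might observe if its wall clock is shifted.
skew = timedelta(minutes=20)
ahead_now = true_now + skew
behind_now = true_now - skew

results = {
    "no_skew": token_is_valid(issued, ttl, true_now),
    "clock_ahead": token_is_valid(issued, ttl, ahead_now),
    "clock_behind": token_is_valid(issued, ttl, behind_now),
    "clock_ahead_with_leeway": token_is_valid(
        issued, ttl, ahead_now, leeway=timedelta(minutes=30)
    ),
}

for name, valid in results.items():
    print(f"{name}: {'valid' if valid else 'rejected'}")
```

In this sketch, a process whose clock runs ahead treats a fresh token as expired, and one whose clock runs behind treats it as not yet issued. A validation leeway larger than the skew absorbs the shift, which is the kind of tolerance a clock-skew experiment can help you size.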