Best Practices for Probe Validation - Pod Level Faults
This topic describes the best practices to use with resilience probes in Kubernetes pod-level chaos faults.
Container Kill
Kill a specific container inside a Kubernetes pod to test restart loops, sidecar resilience, probe tuning, and multi-container coordination.
Disk Fill
Fill a target Kubernetes container's ephemeral storage as a percentage of its limit to test ephemeral-storage eviction, retention, and back-pressure logic.
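When writing a probe threshold for Disk Fill, it helps to know roughly how many bytes a given fill percentage represents. The following is a minimal Python sketch, assuming the fill target is the stated percentage of the container's ephemeral-storage limit minus what is already used. The helper names `parse_quantity` and `bytes_to_fill` are hypothetical, and the quantity parser is deliberately simplified (integer values only).

```python
# Multipliers for the integer quantity suffixes Kubernetes accepts for storage.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_quantity(quantity: str) -> int:
    """Convert an integer quantity such as '500Mi' or '2G' to bytes.

    Simplified: fractional values and exponent forms (e.g. '1.5Gi', '1e9')
    are not handled.
    """
    for suffix, multiplier in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)


def bytes_to_fill(limit: str, fill_percentage: float, used_bytes: int = 0) -> int:
    """Bytes still to write so usage reaches fill_percentage of the limit.

    The percentage is not capped at 100 in this sketch.
    """
    target = int(parse_quantity(limit) * fill_percentage / 100)
    # Nothing to write if current usage already meets or exceeds the target.
    return max(target - used_bytes, 0)
```

For example, `bytes_to_fill("2G", 50, used_bytes=400_000_000)` returns `600000000`, so a probe watching free ephemeral storage should expect roughly that much additional usage during the fault.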
FS Fill
Write a configurable amount of data into a specific path inside a Kubernetes container to test mounted-volume capacity, eviction, and write-failure handling.
Pod API Block
Block selected API requests or responses on a target Kubernetes pod using path, method, header, query parameter, and source or destination filters to test client retry and failover behavior.
Pod API Latency
Add a configurable delay to selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client timeouts, retries, and tail-latency budgets.
Pod API Modify Body
Overwrite API request or response bodies on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client behavior under corrupted payloads.
Pod API Modify Header
Override API request or response headers on a target Kubernetes pod using path, method, query, and source or destination filters to test resilience to missing, altered, or unexpected header values.
Pod API Modify Response Custom
Combine status code, header, and body modifications on selected API calls of a target Kubernetes pod in a single fault, with filtering by path, method, query, source, or destination.
Pod API Status Code
Override the HTTP status code returned by selected API calls on a target Kubernetes pod using path, method, header, query, and source or destination filters to test client error handling and circuit-breaker behavior.
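The Pod API faults above all select traffic with filters on path, method, header, and query parameters. The following Python sketch shows one way such a filter could decide whether a request is in scope, which is useful when reasoning about which probe requests will be affected. It is hypothetical: it assumes every configured filter must match (AND semantics), uses an exact path match, and omits source and destination filtering. The actual fault's matching rules (for example, prefix or pattern matching on paths) may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    path: str
    method: str
    headers: dict = field(default_factory=dict)
    query: dict = field(default_factory=dict)


@dataclass
class ApiFilter:
    path: Optional[str] = None                    # assumed: exact path match
    method: Optional[str] = None                  # compared case-insensitively
    headers: dict = field(default_factory=dict)   # every listed header must match
    query: dict = field(default_factory=dict)     # every listed param must match

    def matches(self, req: Request) -> bool:
        if self.path is not None and req.path != self.path:
            return False
        if self.method is not None and req.method.upper() != self.method.upper():
            return False
        # HTTP header names are case-insensitive, so compare lowercased names.
        req_headers = {k.lower(): v for k, v in req.headers.items()}
        if any(req_headers.get(k.lower()) != v for k, v in self.headers.items()):
            return False
        return all(req.query.get(k) == v for k, v in self.query.items())
```

An empty `ApiFilter()` matches every request, which mirrors the intuition that leaving all filters unset targets all API traffic on the pod.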
Pod Application Function Error
The Pod Application Function Error fault injects an error into a specified function of an application running within a Kubernetes pod. This fault helps assess the application's resilience to failures at the function level.
Pod Application Function Latency
The Pod Application Function Latency fault introduces artificial delay into a specified function of an application running within a Kubernetes pod. This helps evaluate the application's resilience to function-level latency and performance degradation.
Pod Autoscaler
Scale a Kubernetes workload's replicas up to a target count to test cluster capacity, node autoscaling, scheduling pressure, and rollback behavior.
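To judge whether a Pod Autoscaler run should trigger node autoscaling, you can estimate how many nodes the target replica count needs. The Python sketch below is a rough estimate under stated assumptions: identical, initially empty nodes, identical pods, and only the resource requests you pass in (it ignores DaemonSets, system reservations beyond allocatable, and affinity rules). The function name and the resource keys are hypothetical.

```python
def nodes_needed(replicas: int, pod_request: dict, node_allocatable: dict) -> int:
    """Estimate nodes required to schedule `replicas` identical pods.

    pod_request and node_allocatable use the same keys and units,
    e.g. {"cpu_m": 500, "mem_mi": 1024} (millicores, MiB).
    """
    # Pods cannot be split across nodes, so each resource caps pods per node
    # with floor division; the tightest resource wins.
    pods_per_node = min(node_allocatable[r] // pod_request[r] for r in pod_request)
    if pods_per_node == 0:
        raise ValueError("a single pod's request exceeds a node's allocatable capacity")
    # Integer ceiling division: a partially filled node still counts as a node.
    return -(-replicas // pods_per_node)
```

For example, with pods requesting 500m CPU and 1024 MiB on nodes with 3900m CPU and 4096 MiB allocatable, memory limits each node to 4 pods, so 10 replicas need 3 nodes.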
Pod CPU Hog
Consume CPU on a target Kubernetes pod's container to test autoscaling, throttling, latency budgets, and noisy-neighbor tolerance.
Pod Delete
Delete one or more pods of a Kubernetes workload to test replica availability, controller recovery, graceful termination, and disruption budgets.
Pod DNS Error
Block DNS resolution for selected hostnames inside a target Kubernetes pod to test how the application handles upstream lookup failures and cluster DNS outages.
Pod DNS Spoof
Redirect DNS lookups for selected hostnames inside a target Kubernetes pod to a different address to test how the application handles misdirected upstream traffic and cache poisoning.
Pod HTTP Latency
Add a configurable delay to HTTP responses served by a target Kubernetes pod to test timeouts, retries, and tail-latency behavior at the application protocol layer.
Pod HTTP Modify Body
Overwrite the HTTP response body returned by a target Kubernetes pod to test client behavior under corrupted, empty, or unexpected response payloads.
Pod HTTP Modify Header
Override HTTP request or response headers served by a target Kubernetes pod to test client and server resilience to missing, altered, or unexpected header values.
Pod HTTP Reset Peer
Forcibly reset TCP connections carrying HTTP requests to a target Kubernetes pod to test client retry, connection-pool, and circuit-breaker behavior on abrupt disconnects.
Pod HTTP Status Code
Override the HTTP response status code returned by a target Kubernetes pod to test client error handling, retry classification, and circuit-breaker behavior on specific HTTP status codes.
Pod IO Attribute Override
Pod IO Attribute Override modifies the attributes of files located within the pod's mounted volume.
Pod IO Error
Pod IO Error simulates errors that can occur during system calls on files located within the pod's mounted volume.
Pod IO Latency
Pod IO Latency simulates slow I/O operations by introducing delays in system calls on files located within the pod's mounted volume. Use this fault to test the resilience, performance, and scalability of the pod.
Pod IO Mistake
Pod IO Mistake simulates a scenario where the file system in the pod's mounted volume reads or writes incorrect values. Use this fault to determine how the application handles data corruption and errors during file operations, and whether it remains robust and stable under these conditions.
Pod IO Stress
Generate sustained filesystem read and write load inside a target Kubernetes pod to test how the application handles disk pressure, slow IO, and ephemeral storage exhaustion.
Pod JVM CPU Stress
Pod JVM CPU Stress injects CPU stress into a Java process running in a Kubernetes pod by running threads inside the JVM that consume excessive CPU.
Pod JVM Kafka Exception
Pod JVM Kafka Exception fault simulates Kafka producer/consumer failures by raising exceptions for Kafka operations executed by the Java process running inside a Kubernetes pod. This helps test the application's behavior and resilience against Kafka-related errors.
Pod JVM Kafka Latency
Pod JVM Kafka Latency fault simulates latency in Kafka producer/consumer operations by introducing delays for Kafka operations executed by the Java process running inside a Kubernetes pod. This helps test the application's behavior and resilience against Kafka performance degradation.
Pod JVM Method Exception
Pod JVM Method Exception injects chaos into a Java application running in a Kubernetes pod by throwing an exception from a target method.
Pod JVM Method Latency
Pod JVM Method Latency slows down a Java application running in a Kubernetes pod by introducing delays in method calls.
Pod JVM Modify Return
Pod JVM Modify Return modifies the return value of a method in a Java application running in a Kubernetes pod for a specified duration.
Pod JVM Mongo Exception
Pod JVM Mongo Exception fault simulates MongoDB call failures by raising exceptions for database calls executed by the Java process running inside a Kubernetes pod. This helps test the application's behavior and resilience against database-related errors.
Pod JVM Mongo Latency
Pod JVM Mongo Latency fault introduces latency in the MongoDB calls executed by the Java process running inside a Kubernetes pod.
Pod JVM Solace Exception
Pod JVM Solace Exception fault simulates Solace messaging failures by raising exceptions in both publisher and receiver Java processes running inside a Kubernetes pod. This helps test the application's behavior and resilience against messaging disruptions.
Pod JVM Solace Latency
Pod JVM Solace Latency fault simulates Solace messaging delays by injecting latency into both publisher and receiver Java processes running inside a Kubernetes pod. This helps test the application's behavior and resilience against messaging slowdowns.
Pod JVM SQL Exception
Pod JVM SQL Exception fault simulates SQL query failures by raising exceptions for SQL queries executed by the Java process running inside a Kubernetes pod. This helps test the application's behavior and resilience against SQL-related errors.
Pod JVM SQL Latency
Pod JVM SQL Latency fault introduces latency in the SQL queries executed by the Java process running inside a Kubernetes pod.
Pod JVM Trigger GC
Pod JVM Trigger GC triggers the garbage collector for a Java process running in a Kubernetes pod. The collector reclaims unreachable (out-of-scope) objects, freeing up memory.
Pod Memory Hog
Consume memory inside a target Kubernetes pod's container to test OOM behavior, eviction order, request handling under pressure, and limit enforcement.
Pod Network Corruption
Corrupt a configurable percentage of packets on a target Kubernetes pod's network namespace to test checksum, retransmit, and integrity behavior.
Pod Network Duplication
Duplicate a configurable percentage of packets on a target Kubernetes pod's network namespace to test idempotency and dedup behavior.
Pod Network Latency
Add a configurable delay to packets on a target Kubernetes pod's network path to test timeout, retry, and tail-latency behavior of upstream and downstream calls.
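When validating timeout and retry behavior under injected latency, a quick model of what the client should experience helps set probe expectations. This Python sketch assumes a constant injected delay with no jitter, applied identically to every attempt, and a fixed backoff between retries; the function name and parameters are hypothetical.

```python
def call_outcome(baseline_ms: float, injected_ms: float, timeout_ms: float,
                 attempts: int, backoff_ms: float = 0) -> tuple[bool, float]:
    """Return (succeeded, elapsed_ms) for a client calling through the fault.

    Assumes the same constant delay on every attempt, so if the first
    attempt times out, every retry times out as well.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    per_attempt = baseline_ms + injected_ms
    if per_attempt <= timeout_ms:
        # The first attempt completes within the timeout.
        return True, per_attempt
    # Every attempt waits the full timeout, with backoff between attempts.
    return False, attempts * timeout_ms + (attempts - 1) * backoff_ms
```

Under this model, retries cannot rescue a call once the injected delay pushes it past the timeout; they only lengthen the time until the client reports failure. A probe should therefore expect `(False, 400)` for `call_outcome(50, 100, 100, 3, backoff_ms=50)`, not an eventual success.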
Pod Network Loss
Drop a configurable percentage of packets on a target Kubernetes pod's network path to test retry, timeout, and failover behavior.
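A loss percentage does not map directly to a request failure rate, so it is worth estimating what a probe should tolerate. The following Python sketch assumes independent loss per packet, a fixed number of packets per attempt (two by default: one request and one response), and no transport-level retransmission, which roughly describes a single-datagram UDP exchange such as a DNS query. TCP retransmits lost segments, so for TCP traffic loss mostly shows up as added latency rather than failed attempts. The function name is hypothetical.

```python
def success_probability(loss_pct: float, attempts: int,
                        packets_per_attempt: int = 2) -> float:
    """Probability that at least one of `attempts` tries gets all its packets through."""
    if not 0 <= loss_pct <= 100:
        raise ValueError("loss_pct must be between 0 and 100")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # One attempt succeeds only if every packet in it is delivered.
    p_attempt = (1 - loss_pct / 100) ** packets_per_attempt
    # The call fails only if every attempt fails.
    return 1 - (1 - p_attempt) ** attempts
```

With 50% loss, a single attempt succeeds 25% of the time, while three attempts raise that to about 57.8%, which shows how much retry behavior changes what a probe should observe.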
Pod Network Partition
Apply a temporary Kubernetes NetworkPolicy to isolate a target pod from its peers, dependencies, or namespaces and test split-brain behavior.
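For reference, the simplest form of isolation a NetworkPolicy can express is a full ingress and egress deny for the selected pods. The Python sketch below builds such a manifest as a dictionary; it illustrates the Kubernetes `networking.k8s.io/v1` NetworkPolicy shape only and is not the policy the fault actually applies, which may instead isolate the pod from specific peers, dependencies, or namespaces. The policy name, namespace, and labels in the usage example are hypothetical. NetworkPolicies take effect only when the cluster's network plugin enforces them.

```python
import json


def isolation_policy(name: str, namespace: str, pod_labels: dict) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress for matching pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects the pods to isolate within the namespace.
            "podSelector": {"matchLabels": pod_labels},
            # Listing both policy types with no ingress or egress rules
            # denies all inbound and outbound traffic for the selected pods.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    policy = isolation_policy("chaos-isolate-checkout", "shop", {"app": "checkout"})
    print(json.dumps(policy, indent=2))
```

Because egress is denied as well, the isolated pod also loses access to cluster DNS, which is worth accounting for when interpreting probe results.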
Pod Network Rate Limit
Cap bandwidth on a target Kubernetes pod's network path to test throughput-sensitive workloads, batch jobs, and bandwidth-bound flows.
Redis Cache Expire
Redis Cache Expire expires a given key (or all keys) for a specified duration. During the chaos, you can't access the affected keys in the cache.
Redis Cache Limit
Redis Cache Limit fault limits the amount of memory a Redis cache can use and restores the original limit after the chaos duration.
Redis Cache Penetration
Redis Cache Penetration fault continuously sends requests to the Redis database for the value of a key that does not exist. Because every one of these lookups misses, the sustained requests degrade the application's performance.
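To see why requests for a missing key are costly, consider an application that uses a cache-aside pattern in front of a backing store. This is an assumption about the application, not a description of the fault itself, which only sends the requests. In the Python sketch below, a dictionary stands in for both Redis and the backing store, and the class name is hypothetical. A key that exists is fetched from the backend once and then served from cache, while a missing key is never cached, so every request falls through.

```python
class CacheAsideStore:
    """Hypothetical cache-aside lookup that counts reads reaching the backend."""

    def __init__(self, backend: dict):
        self.cache: dict = {}
        self.backend = backend
        self.backend_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall through to the backing store.
        self.backend_reads += 1
        value = self.backend.get(key)
        if value is not None:
            # Only existing values are cached, so a missing key misses every time.
            self.cache[key] = value
        return value
```

Issuing 1,000 lookups for an existing key produces one backend read, while 1,000 lookups for a nonexistent key produce 1,000 backend reads, which is the load pattern a probe on backend latency or error rate should detect.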
Time Chaos
Shift the wall-clock time observed by selected processes inside a target Kubernetes pod to test application behavior under clock skew, token expiry, and time-based scheduling errors.
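Token expiry is one of the most common places clock skew surfaces. The Python sketch below models an expiry check as seen by a process whose clock has been shifted by `skew`, with an optional `leeway` that some validators allow for small clock differences. The function and parameter names are hypothetical; the current time is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timedelta, timezone


def is_token_valid(expires_at: datetime, now: datetime,
                   skew: timedelta = timedelta(0),
                   leeway: timedelta = timedelta(0)) -> bool:
    """Check expiry as observed by a process whose clock is shifted by `skew`."""
    # A positive skew means the process believes it is later than it really is.
    observed_now = now + skew
    return observed_now < expires_at + leeway


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    token_exp = now + timedelta(minutes=5)
    # A 10-minute forward shift makes a still-valid token look expired.
    print(is_token_valid(token_exp, now, skew=timedelta(minutes=10)))
```

Shifting the clock forward past a token's remaining lifetime causes premature rejections, while shifting it backward makes expired tokens appear valid, so a Time Chaos experiment can check both directions.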