Redis cache penetration

Last updated on Jun 2, 2026

Redis cache penetration is a Kubernetes pod-level chaos fault that issues a configurable number of cache-miss reads (requests for keys that do not exist) against a target Redis server for a configurable duration. The fault simulates a cache-penetration attack or runaway client behavior, where every request bypasses the cache and pushes load onto the downstream source of truth. When the fault ends, the chaos pod stops issuing requests and traffic returns to normal.

Use this fault to test how a service behaves when a workload starts asking for non-existent keys: client retries that hammer the database, missing null-cache protection, or a flood that exhausts connection pools downstream.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.

Use cases

Run this fault when you want to answer concrete questions like:

Null-cache protection: Does the application cache misses (negative caching) to prevent repeated database hits, or does every miss reach the database?
Rate-limit and quota coverage: Do rate limits at the application or gateway layer catch the surge before it reaches the database?
Bloom-filter / pre-check guards: If a Bloom filter or existence check fronts Redis, does it correctly stop most penetration attempts?
Downstream connection pool: Does the database connection pool scale up gracefully, or are requests starved of connections?
Logging and detection: Do dashboards and alerts surface the miss-rate spike fast enough to drive manual intervention?
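
To make the null-cache question concrete, here is a minimal Python sketch of negative caching. Everything in it is an assumption for illustration: `FakeDatabase`, `lookup`, the `MISSING` sentinel, and the in-memory `cache` dict stand in for a real database client and Redis, and none of them are part of the fault or of any Harness API.

```python
import time

# Sentinel stored in the cache to remember "this key does not exist".
MISSING = object()


class FakeDatabase:
    """Hypothetical stand-in for the source of truth; counts every query."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self.rows.get(key)


def lookup(key, cache, db, negative_ttl=30.0, now=time.monotonic):
    """Read-through lookup that also caches misses for negative_ttl seconds."""
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if expires_at is None or now() < expires_at:
            return None if value is MISSING else value
        del cache[key]  # expired negative entry; fall through to the database

    value = db.get(key)
    if value is None:
        # Remember the miss so repeated requests do not reach the database.
        cache[key] = (MISSING, now() + negative_ttl)
        return None
    cache[key] = (value, None)  # positive entries never expire in this sketch
    return value


db = FakeDatabase({"user:1": "alice"})
cache = {}
for _ in range(1000):
    lookup("user:does-not-exist", cache, db)
print(db.queries)  # 1000 misses cost a single database query
```

Without the `MISSING` entry, each of the 1000 lookups would reach the database, which is the behavior this fault is meant to expose.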

Prerequisites

Kubernetes version: 1.21 or later. Go to What's supported to confirm distribution support.
Target Redis reachable: The chaos pod can resolve and connect to ADDRESS.
Credentials available (if needed): If Redis requires authentication or TLS, a Kubernetes secret is mounted at SECRET_FILE_PATH (see Redis authentication below).
No privileged access required: The fault connects to Redis over the network and does not require container runtime sockets or privileged pods.

Supported environments

Platform	Support status
Amazon EKS	Supported
Azure AKS	Supported
Google GKE	Supported
Red Hat OpenShift	Supported
Rancher	Supported
VMware Tanzu	Supported
Self-managed Kubernetes (CNCF-certified)	Supported
GKE Autopilot	Supported (no privileged access required)
EKS Fargate, ACI virtual nodes	Supported (no container runtime socket required)

Permissions required

The fault runs under the chaos infrastructure's service account.

Resource (`apiGroup`)	Verbs	Why it is needed
`pods` (`""`)	`get`, `list`, `create`, `delete`, `deletecollection`, `patch`, `update`	Run the chaos pod that connects to Redis
`pods/log` (`""`)	`get`, `list`, `watch`	Stream chaos pod logs for status and debugging
`events` (`""`)	`get`, `list`, `create`, `patch`, `update`	Record fault progress as Kubernetes events
`jobs` (`batch`)	`get`, `list`, `create`, `delete`, `deletecollection`	Run the chaos job that drives the fault
`secrets` (`""`)	`get`, `list`	Mount the Redis credentials secret (only if `SECRET_FILE_PATH` is used)

The default Harness chaos infrastructure service account already includes these permissions.

Redis authentication

Note: If your Redis server doesn't require authentication, provide the server address directly through the ADDRESS tunable described under Fault tunables.

If your Redis server requires authentication or TLS, supply the address, password, and TLS certificate through a Kubernetes secret. Create the secret (for example, redis-secret) in the namespace where the fault executes. A sample is shown below.

apiVersion: v1
kind: Secret
metadata:
  name: redis-secret  # Name of the Secret
type: Opaque       # Default Secret type
stringData:
  redis-secret.yaml: |-
    address: 34.136.111.6:6379
    password: mypass
    tlsCertFile: <cert>

After creating the secret, mount the secret into the experiment, and reference the mounted file path using the SECRET_FILE_PATH environment variable in the experiment manifest. A sample is shown below.

apiVersion: litmuschaos.io/v1alpha1
kind: K8sFault
metadata:
  name: redis-cache-penetration
spec:
  definition:
    chaos:
      env:
        ...  # other env
        ...  # other env
        - name: SECRET_FILE_PATH
          value: "/tmp/redis-secret.yaml"
      components:
        secrets:   # Kubernetes secret mounted
          - name: redis-secret
            mountPath: /tmp/

Fault tunables

Configure the following fault parameters when you add Redis cache penetration to an experiment in Chaos Studio. Defaults are shown for reference.

Required parameters

Tunable	Description	Default
`ADDRESS`	Redis server address as `host:port` (for example `redis.svc:6379`).	(required)

Chaos parameters

Tunable	Description	Default
`REQUEST_COUNT`	Number of cache-miss requests to issue over the fault duration.	`1000`
`TOTAL_CHAOS_DURATION`	Duration of the fault in seconds.	`60`
`RAMP_TIME`	Wait period in seconds before and after the fault. Go to ramp time to read how it is applied.	`0`

Authentication

Tunable	Description	Default
`SECRET_FILE_PATH`	Path to the mounted Redis credentials file inside the chaos pod. Required only if Redis needs authentication or TLS.	`""`
`REDIS_PASSWORD`	Name of the Kubernetes secret that contains the Redis password.	`""`
`REDIS_TLS_FILE`	Name of the Kubernetes secret that contains the Redis TLS certificate.	`""`

Tunables that apply to every fault are documented in common tunables for all faults.

Targeting

For Redis cache faults, target selection refers to the Kubernetes workload that produces the test load against Redis. Use the common workload tunables (TARGET_WORKLOAD_KIND, TARGET_WORKLOAD_NAMESPACE, TARGET_WORKLOAD_NAMES, TARGET_WORKLOAD_LABELS) documented in common pod fault tunables.

Pair with a downstream watch

The interesting failures during cache penetration usually happen downstream of Redis, not in Redis itself. Watch database query rate, connection pool saturation, and application error rate while the fault runs.

Fault execution in brief

The chaos pod connects to the Redis server at ADDRESS and issues REQUEST_COUNT reads against keys that do not exist, spread across TOTAL_CHAOS_DURATION seconds.
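
This page does not document how the chaos pod paces its requests or names its keys. As an illustration only, the sketch below assumes even spacing and random UUID-based key names; `plan_requests` and the `chaos-miss:` prefix are hypothetical.

```python
import uuid


def plan_requests(request_count, total_chaos_duration):
    """Return (offset_seconds, key) pairs spread evenly over the duration."""
    if request_count <= 0 or total_chaos_duration <= 0:
        return []
    interval = total_chaos_duration / request_count
    # Random UUID keys are very unlikely to exist in a real keyspace.
    return [
        (i * interval, f"chaos-miss:{uuid.uuid4()}")
        for i in range(request_count)
    ]


# Defaults from the tunables table: 1000 requests over 60 seconds.
plan = plan_requests(request_count=1000, total_chaos_duration=60)
print(len(plan), plan[1][0])  # number of requests and the gap between them
```

Under this even-spacing assumption, the defaults work out to one request every 60 ms, or roughly 17 requests per second. That is a useful sanity check when choosing REQUEST_COUNT: a rate this low may be hard to distinguish from normal traffic on a busy Redis server.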

Expected behavior during fault execution

Redis GET commands return nil for every requested key. Redis itself handles the load without significant CPU or memory impact.
Applications that do not cache misses fall through to the source of truth for each request, multiplying database load.
Caller-side metrics show a sharp drop in cache hit ratio and a corresponding spike in database query rate.
Connection pools may saturate if the downstream database has fewer slots than concurrent miss handlers.

When the fault ends

The chaos pod stops issuing requests. Traffic returns to whatever the application was generating before, and the downstream surge subsides.

Signals to watch

Attach resilience probes to assert each layer:

Cache hit ratio: Use a Prometheus probe on cache_hits_total / cache_requests_total to confirm the miss spike.
Downstream database load: Use a Prometheus probe on database query rate or connection count to detect saturation.
Application error rate: Use an HTTP probe against an endpoint backed by the cache to detect failures triggered by downstream saturation.

Verify the fault execution effect

While the experiment is running, confirm the miss storm:

Inspect Redis command rate.

kubectl run -n <namespace> tester --image=redis:alpine --rm -it -- \
  redis-cli -h <redis-host> -p <port> INFO stats | grep instantaneous_ops_per_sec

Operations per second should rise during the fault.

Compare cache miss ratio in metrics.

The cache hit ratio dashboard should drop sharply and downstream database query rate should rise.
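
If no dashboard is available, the same `INFO stats` section also reports the server-wide `keyspace_hits` and `keyspace_misses` counters. As a rough sketch, the hypothetical helpers below parse two snapshots of that output and compute the miss ratio between them. The sample text is made up for illustration.

```python
def parse_info(text):
    """Parse INFO output (key:value lines, '#' section headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def miss_ratio(before, after):
    """Miss ratio for lookups that happened between two INFO snapshots."""
    hits = int(after["keyspace_hits"]) - int(before["keyspace_hits"])
    misses = int(after["keyspace_misses"]) - int(before["keyspace_misses"])
    total = hits + misses
    return misses / total if total else 0.0


# Made-up snapshots taken before and during the fault.
before = parse_info("# Stats\r\nkeyspace_hits:9000\r\nkeyspace_misses:100\r\n")
during = parse_info("# Stats\r\nkeyspace_hits:9100\r\nkeyspace_misses:1000\r\n")
print(miss_ratio(before, during))  # 900 misses out of 1000 lookups
```

Because these counters cover every client of the Redis server, the ratio includes regular application traffic as well as the requests from the chaos pod.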

Recovery and cleanup

End of duration: The chaos pod stops automatically.
Abort the experiment: Stopping the experiment from Chaos Studio triggers the same cleanup path.
Lingering load: If downstream connection pools or queues built up during the fault, they typically drain within seconds. If the application has retried failed downstream calls onto an internal queue, allow time for the queue to flush.

Limitations

No actual data modification: This fault only issues reads against non-existent keys. It does not change Redis state.
Synthetic miss only: Requests originate from the chaos pod, not from real application clients, so connection-pool effects upstream of Redis are not exercised.
Authentication or TLS errors block the fault: If SECRET_FILE_PATH references the wrong file or the secret contents are malformed, the chaos pod fails fast.
Cluster mode: Requests connect to one node; for Redis Cluster, the miss storm focuses on the connected node's keyspace.
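
The cluster limitation follows from how Redis Cluster distributes keys: each key hashes to one of 16384 slots (CRC16 of the key, modulo 16384), and each node owns only a subset of those slots. The sketch below computes the slot for a key with Python's `binascii.crc_hqx`, which implements the same CRC-16/XMODEM variant. `key_slot` is a hypothetical helper, and it ignores hash tags (`{...}`) for brevity.

```python
import binascii

CLUSTER_SLOTS = 16384


def key_slot(key):
    """Hash slot for a key, ignoring hash tags (an intentional simplification)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % CLUSTER_SLOTS


for key in ("foo", "chaos-miss:1", "chaos-miss:2"):
    print(key, key_slot(key))
```

Keys whose slots belong to other nodes are not served by the connected node, so a single-node connection cannot spread the miss storm across the whole cluster.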

Troubleshooting

Redis cache penetration experiment stays Pending or never starts in Harness Chaos Engineering

Inspect the chaos pods in the experiment namespace with kubectl describe pod -n <chaos-namespace>. Common causes are taints on the target node that the chaos pods do not tolerate, insufficient resources, or a missing secrets mount when SECRET_FILE_PATH is set. Add the required tolerations or correct the secret mount path.

No miss spike observed during redis-cache-penetration

The most common causes are: ADDRESS points to the wrong host or port; REQUEST_COUNT is too small relative to TOTAL_CHAOS_DURATION to be measurable; or authentication is required and SECRET_FILE_PATH is not set. Re-run with a larger REQUEST_COUNT and verify the chaos pod logs show successful connections to Redis.

Authentication errors connecting to Redis during redis-cache-penetration

Verify the Kubernetes secret name in REDIS_PASSWORD or REDIS_TLS_FILE matches an existing secret in the experiment namespace, that SECRET_FILE_PATH points to the mounted file, and that the file contents include the correct address, password, and (if needed) TLS certificate. Test the same credentials with redis-cli from a debug pod.

Related faults

Redis cache expire: Expire selected keys to simulate a cold cache.
Redis cache limit: Cap Redis memory to force evictions or write errors.
Common pod fault tunables: Shared environment variables for selecting target pods and workloads.
