Redis cache expire

Redis cache expire is a Kubernetes pod-level chaos fault that expires a configurable set of keys (or all keys) on a target Redis server for a configurable duration. Only the chosen keys are affected; unrelated keys and other Redis databases keep serving normal traffic. When the fault ends, the chaos pod stops issuing expirations; keys that were not re-set by the application stay gone.

Use this fault to test how a service behaves when its cache is suddenly cold: read latency rises, downstream databases see a query burst, and stampede protection (if present) decides whether the system survives the burst.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.


Use cases

Run this fault when you want to answer concrete questions like:

  • Cold-cache resilience: When a hot key disappears, does the application refill from the source of truth or fail closed?
  • Cache-stampede protection: Do single-flight, request coalescing, or probabilistic refresh patterns hold up when many callers race to refill the same key?
  • Database back-pressure: Does the downstream database survive the surge of refill queries, or do connections saturate?
  • TTL-driven invalidation: For applications that intentionally use short TTLs, confirm the refill path is fast enough to keep p99 acceptable.
  • Critical-key blast radius: Expire only a subset of keys, selected with KEYS or EXPIRY_OPTION, to validate behavior for important keys without touching the whole cache.
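
The single-flight pattern named in the cache-stampede bullet can be sketched as follows. This is a minimal illustration, not part of the fault or of any Harness library: SingleFlightCache, load_from_source, and the in-memory dictionary standing in for Redis are all hypothetical names chosen for the example.

```python
import threading
from typing import Callable, Dict


class SingleFlightCache:
    """Cache-aside wrapper that lets only one caller refill a missing key."""

    def __init__(self, load_from_source: Callable[[str], str]) -> None:
        self._load = load_from_source      # hypothetical source-of-truth lookup
        self._cache: Dict[str, str] = {}   # in-memory stand-in for Redis
        self._lock = threading.Lock()      # guards _cache and _inflight
        self._inflight: Dict[str, threading.Event] = {}

    def expire(self, key: str) -> None:
        """Simulate the fault by dropping the key from the cache."""
        with self._lock:
            self._cache.pop(key, None)

    def get(self, key: str) -> str:
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    # The first caller to see the miss refills the key.
                    event = threading.Event()
                    self._inflight[key] = event
            if not leader:
                # Followers wait for the leader, then re-check the cache.
                event.wait()
                continue
            try:
                value = self._load(key)
                with self._lock:
                    self._cache[key] = value
                return value
            finally:
                # Wake the followers even if the load raised.
                with self._lock:
                    del self._inflight[key]
                event.set()
```

With this shape, twenty concurrent callers that miss on the same key trigger one call to load_from_source instead of twenty. If the leader's load fails, the waiting callers wake up, find the key still missing, and one of them retries.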

Prerequisites

  • Kubernetes version: 1.21 or later. Go to What's supported to confirm distribution support.
  • Target Redis reachable: The chaos pod can resolve and connect to ADDRESS.
  • Credentials available (if needed): If Redis requires authentication or TLS, a Kubernetes secret is mounted at SECRET_FILE_PATH (see Redis authentication below).
  • No privileged access required: The fault connects to Redis over the network and does not require container runtime sockets or privileged pods.

Supported environments

| Platform | Support status |
| --- | --- |
| Amazon EKS | Supported |
| Azure AKS | Supported |
| Google GKE | Supported |
| Red Hat OpenShift | Supported |
| Rancher | Supported |
| VMware Tanzu | Supported |
| Self-managed Kubernetes (CNCF-certified) | Supported |
| GKE Autopilot | Supported (no privileged access required) |
| EKS Fargate, ACI virtual nodes | Supported (no container runtime socket required) |

Permissions required

The fault runs under the chaos infrastructure's service account.

| Resource (apiGroup) | Verbs | Why it is needed |
| --- | --- | --- |
| pods ("") | get, list, create, delete, deletecollection, patch, update | Run the chaos pod that connects to Redis |
| pods/log ("") | get, list, watch | Stream chaos pod logs for status and debugging |
| events ("") | get, list, create, patch, update | Record fault progress as Kubernetes events |
| jobs (batch) | get, list, create, delete, deletecollection | Run the chaos job that drives the fault |
| secrets ("") | get, list | Mount the Redis credentials secret (only if SECRET_FILE_PATH is used) |

The default Harness chaos infrastructure service account already includes these permissions.


Redis authentication

note

If your Redis server doesn't require authentication, set only the ADDRESS tunable, which refers to the Redis server address. See Fault tunables below.

If your Redis server requires authentication or TLS, provide the address, password, and TLS certificate through a Kubernetes secret. Create the secret (for example, redis-secret) in the namespace where the fault executes. A sample is shown below.

apiVersion: v1
kind: Secret
metadata:
  name: redis-secret # Name of the Secret
type: Opaque # Default Secret type
stringData:
  redis-secret.yaml: |-
    address: 34.136.111.6:6379
    password: mypass
    tlsCertFile: <cert>

After creating the secret, mount the secret into the experiment, and reference the mounted file path using the SECRET_FILE_PATH environment variable in the experiment manifest. A sample is shown below.

apiVersion: litmuschaos.io/v1alpha1
kind: K8sFault
metadata:
  name: redis-cache-expire
spec:
  definition:
    chaos:
      env:
        # ... other env
        # ... other env
        - name: SECRET_FILE_PATH
          value: "/tmp/redis-secret.yaml"
      components:
        secrets: # Kubernetes secret mounted
          - name: redis-secret
            mountPath: /tmp/

Fault tunables

Configure the following fault parameters when you add Redis cache expire to an experiment in Chaos Studio. Defaults are shown for reference.

Required parameters

| Tunable | Description | Default |
| --- | --- | --- |
| ADDRESS | Redis server address as host:port (for example redis.svc:6379). | (required) |

Chaos parameters

| Tunable | Description | Default |
| --- | --- | --- |
| KEYS | Comma-separated list of Redis keys to expire. Leave empty and set EXPIRY_OPTION to expire a broader set. | "" |
| EXPIRY_OPTION | How to choose keys when KEYS is empty. Common values: all (every key in the database) or a pattern matching a key namespace. | "" |
| EXPIRATION | Expiration time string (for example 0 for immediate, 60s for delayed expiry). | "" |
| DATABASE | Redis database index. | 0 |
| TOTAL_CHAOS_DURATION | Duration of the fault in seconds. | 60 |
| RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0 |

Authentication

| Tunable | Description | Default |
| --- | --- | --- |
| SECRET_FILE_PATH | Path to the mounted Redis credentials file inside the chaos pod. Required only if Redis needs authentication or TLS. | "" |
| REDIS_PASSWORD | Name of the Kubernetes secret that contains the Redis password. | "" |
| REDIS_TLS_FILE | Name of the Kubernetes secret that contains the Redis TLS certificate. | "" |

Tunables that apply to every fault are documented in common tunables for all faults.

Targeting

For Redis cache faults, target selection refers to the Kubernetes workload that produces the test load against Redis. Use the common workload tunables (TARGET_WORKLOAD_KIND, TARGET_WORKLOAD_NAMESPACE, TARGET_WORKLOAD_NAMES, TARGET_WORKLOAD_LABELS) documented in common pod fault tunables.

Scope key expiry carefully

Setting EXPIRY_OPTION=all expires every key in the chosen database. Verify the database index and run this against a recoverable environment before using it in shared infrastructure.


Fault execution in brief

The chaos pod connects to the Redis server at ADDRESS and issues expire commands for the keys selected by KEYS or EXPIRY_OPTION on database DATABASE, repeating for TOTAL_CHAOS_DURATION seconds.
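
The loop described above can be pictured with the sketch below. It is an illustration only, not the fault's actual implementation: FakeRedis is a hypothetical in-memory stand-in for one Redis database, and the interval parameter, the glob-style reading of EXPIRY_OPTION patterns, and the rule that explicit KEYS take precedence are assumptions made for the example.

```python
import fnmatch
import time
from typing import Callable, Dict, List


class FakeRedis:
    """Hypothetical in-memory stand-in for one Redis database."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)

    def keys(self, pattern: str) -> List[str]:
        return [k for k in self.data if fnmatch.fnmatchcase(k, pattern)]

    def expire_now(self, key: str) -> bool:
        # Models an expiration of 0: the key is gone immediately.
        return self.data.pop(key, None) is not None


def select_keys(client: FakeRedis, keys: str, expiry_option: str) -> List[str]:
    """Use explicit KEYS if given; otherwise fall back to EXPIRY_OPTION."""
    if keys:
        return [k.strip() for k in keys.split(",") if k.strip()]
    if expiry_option == "all":
        return client.keys("*")
    if expiry_option:
        return client.keys(expiry_option)
    return []


def run_fault(
    client: FakeRedis,
    keys: str,
    expiry_option: str,
    total_chaos_duration: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Repeat expirations until the duration elapses; return the number of keys expired."""
    expired = 0
    deadline = clock() + total_chaos_duration
    while clock() < deadline:
        # Re-select on every pass so keys refilled by the application expire again.
        for key in select_keys(client, keys, expiry_option):
            if client.expire_now(key):
                expired += 1
        sleep(interval)
    return expired
```

Re-selecting keys on each pass is what keeps the cache cold for the whole duration: a key the application refills between passes is expired again on the next one.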


Expected behavior during fault execution

  • Calls to GET on expired keys return nil. Applications that treat nil as a cache miss fall through to the source of truth.
  • Downstream databases see a surge in queries as the application refills the cache; if rate-limited or under-provisioned, those queries can saturate.
  • Application p99 latency rises until the cache warms back up.
  • Logs typically show an increase in cache miss events.

When the fault ends

The chaos pod stops issuing expirations. Any keys that were refilled by the application stay; keys that were not refilled remain expired.

Signals to watch

Attach resilience probes to assert each layer:

  • Cache hit ratio: Use a Prometheus probe on cache_hits_total / cache_requests_total.
  • Downstream database load: Use a Prometheus probe on database query rate or connection count to detect stampede.
  • End-to-end latency: Use an HTTP probe on a cache-backed endpoint.
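
The cache hit ratio check reduces to simple arithmetic on two counter samples. The sketch below shows that arithmetic under stated assumptions: the counter names come from the probe description above, while CounterSample, probe_passes, and the min_ratio threshold are hypothetical names introduced for illustration. A real Prometheus probe would evaluate an equivalent query instead of running this code.

```python
from typing import NamedTuple


class CounterSample(NamedTuple):
    """One scrape of the two monotonically increasing counters."""
    cache_hits_total: float
    cache_requests_total: float


def hit_ratio(before: CounterSample, after: CounterSample) -> float:
    """Hit ratio over the window between two counter samples."""
    requests = after.cache_requests_total - before.cache_requests_total
    hits = after.cache_hits_total - before.cache_hits_total
    if requests <= 0:
        # Assumption: with no traffic in the window, report no misses.
        return 1.0
    return hits / requests


def probe_passes(before: CounterSample, after: CounterSample, min_ratio: float) -> bool:
    """True when the windowed hit ratio stays at or above the threshold."""
    return hit_ratio(before, after) >= min_ratio
```

For example, if the counters move from 900 hits out of 1000 requests to 950 hits out of 1200 requests during the fault, the windowed ratio is 50 / 200 = 0.25, well below the 0.9 lifetime ratio that the raw totals would suggest.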

Verify the fault execution effect

While the experiment is running, confirm keys are missing:

  1. Check a known key with redis-cli.

    kubectl run -n <namespace> tester --image=redis:alpine --rm -it -- \
    redis-cli -h <redis-host> -p <port> -n <DATABASE> GET <known-key>

    The reply should be (nil) while the fault runs.

  2. Confirm cache miss rate in metrics.

    The cache hit ratio should drop sharply and downstream database query rate should rise.
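
Where redis-cli is not available in the debug image, the same key check can be sketched with the Python standard library by speaking the Redis wire protocol (RESP) directly. This is a minimal sketch that assumes a Redis server without authentication or TLS; the function names are hypothetical, and the host, port, database, and key are placeholders you supply.

```python
import socket
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts: List[bytes] = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def read_reply(stream) -> Optional[bytes]:
    """Read one simple-string, error, or bulk-string reply; None means nil."""
    line = stream.readline()
    if not line.endswith(b"\r\n"):
        raise ConnectionError("connection closed before a full reply arrived")
    kind, payload = line[:1], line[1:-2]
    if kind == b"+":
        return payload
    if kind == b"-":
        raise RuntimeError(payload.decode("utf-8", "replace"))
    if kind == b"$":
        length = int(payload)
        if length == -1:
            return None  # null bulk string: the key does not exist
        data = stream.read(length + 2)  # value plus trailing CRLF
        return data[:length]
    raise ValueError("unsupported reply type: %r" % kind)


def key_is_missing(host: str, port: int, database: int, key: str,
                   timeout: float = 3.0) -> bool:
    """Return True when GET on the key returns nil in the given database."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")
        stream.write(encode_command("SELECT", str(database)))
        stream.write(encode_command("GET", key))
        stream.flush()
        read_reply(stream)  # reply to SELECT; raises if the server returned an error
        return read_reply(stream) is None
```

Calling key_is_missing with the values you would otherwise pass to redis-cli should return True while the fault runs and False once the application has refilled the key.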


Recovery and cleanup

  • End of duration: The chaos pod stops automatically.
  • Abort the experiment: Stopping the experiment from Chaos Studio triggers the same cleanup path.
  • Refill state: Keys are refilled organically as the application receives requests. For pre-warming, run a known refill workload after the experiment ends.

Limitations

  • Cluster mode: Some EXPIRY_OPTION=all semantics rely on a single-node view. For Redis Cluster, expirations apply to keys reachable from the connected node only.
  • AOF/RDB persistence: AOF and RDB persistence do not bring back keys that the fault expired; the keys stay gone until the application writes them again. Plan recovery if your application depends on keys outliving the fault.
  • Read replicas: Expirations propagate to replicas; replica reads also miss until refill.
  • Authentication or TLS errors block the fault: If SECRET_FILE_PATH references the wrong file or the secret contents are malformed, the chaos pod fails fast.

Troubleshooting

Redis cache expire experiment stays Pending or never starts in Harness Chaos Engineering

Inspect the chaos pods in the experiment namespace with kubectl describe pod -n <chaos-namespace>. Common causes are taints on the target node that the chaos pods do not tolerate, insufficient resources, or a missing secrets mount when SECRET_FILE_PATH is set. Add the required tolerations or correct the secret mount path.

No expiration observed during redis-cache-expire

The most common causes are: ADDRESS points to the wrong host or port; DATABASE index does not contain the expected keys; KEYS lists names the application does not use; EXPIRY_OPTION is empty and no explicit KEYS are given; or authentication is required and SECRET_FILE_PATH is not set. Re-run with EXPIRY_OPTION=all on a test database to confirm the path is working, then narrow the scope.

Authentication errors connecting to Redis during redis-cache-expire

Verify the Kubernetes secret name in REDIS_PASSWORD or REDIS_TLS_FILE matches an existing secret in the experiment namespace, that SECRET_FILE_PATH points to the mounted file, and that the file contents include the correct address, password, and (if needed) TLS certificate. Test the same credentials with redis-cli from a debug pod.