Redis cache limit

Redis cache limit is a Kubernetes pod-level chaos fault that temporarily caps the maxmemory setting on a target Redis server for a configurable duration. Depending on the configured eviction policy, Redis evicts existing keys or rejects new writes once the cap is exceeded. When the fault ends, the original maxmemory is restored.

Use this fault to test how a service behaves when its Redis instance is starved for memory: hot keys are evicted, SET commands return write errors, and throughput degrades as the cache stops serving requests it previously handled.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.


Use cases

Run this fault when you want to answer concrete questions like:

  • Eviction policy validation: Does the configured eviction policy (allkeys-lru, volatile-lru, noeviction, and so on) match the application's expectations under pressure?
  • Write-error handling: With noeviction, SET fails. Does the application retry, queue, or surface the error?
  • Hot-key behavior: When LRU evicts keys, does the application notice the missing keys quickly enough to refill, or does p99 degrade?
  • Capacity planning: Confirm that the chosen maxmemory headroom matches realistic burst behavior.
  • Multi-tenant isolation: If multiple workloads share the Redis instance, does memory pressure on one tenant affect others as expected?

Prerequisites

  • Kubernetes version: 1.21 or later. Go to What's supported to confirm distribution support.
  • Target Redis reachable: The chaos pod can resolve and connect to ADDRESS.
  • CONFIG SET allowed: The Redis credentials used by the fault have permission to run CONFIG SET maxmemory. Many managed Redis offerings disable CONFIG; check first.
  • Credentials available (if needed): If Redis requires authentication or TLS, a Kubernetes secret is mounted at SECRET_FILE_PATH (see Redis authentication below).
  • No privileged access required: The fault connects to Redis over the network and does not require container runtime sockets or privileged pods.
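The CONFIG SET prerequisite can be checked without changing anything on the server by re-applying the current maxmemory value. The following is a minimal sketch, not part of the fault itself: the redis_cmd wrapper, the can_config_set function, and the REDIS_HOST and REDIS_PORT variables are hypothetical names introduced for illustration, and authentication options are left out.

```shell
# Thin wrapper around redis-cli. REDIS_HOST and REDIS_PORT are assumed
# environment variables for this sketch, not fault tunables.
redis_cmd() {
  redis-cli -h "${REDIS_HOST:-localhost}" -p "${REDIS_PORT:-6379}" "$@"
}

# Re-apply the current maxmemory value. The server setting stays the same,
# but the call fails if CONFIG GET or CONFIG SET is blocked.
can_config_set() {
  local current reply
  # When piped, redis-cli prints the parameter name and its value on
  # separate lines, so the last line is the value in bytes.
  current=$(redis_cmd CONFIG GET maxmemory | tail -n 1)
  case "$current" in
    ''|*[!0-9]*)
      echo "CONFIG GET maxmemory failed: $current" >&2
      return 1
      ;;
  esac
  reply=$(redis_cmd CONFIG SET maxmemory "$current")
  [ "$reply" = "OK" ]
}
```

Run can_config_set from a pod that can reach the Redis address; a non-zero exit status suggests the fault will not be able to apply the cap with those credentials.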

Supported environments

| Platform | Support status |
| --- | --- |
| Amazon EKS | Supported |
| Azure AKS | Supported |
| Google GKE | Supported |
| Red Hat OpenShift | Supported |
| Rancher | Supported |
| VMware Tanzu | Supported |
| Self-managed Kubernetes (CNCF-certified) | Supported |
| GKE Autopilot | Supported (no privileged access required) |
| EKS Fargate, ACI virtual nodes | Supported (no container runtime socket required) |

Permissions required

The fault runs under the chaos infrastructure's service account.

| Resource (apiGroup) | Verbs | Why it is needed |
| --- | --- | --- |
| pods ("") | get, list, create, delete, deletecollection, patch, update | Run the chaos pod that connects to Redis |
| pods/log ("") | get, list, watch | Stream chaos pod logs for status and debugging |
| events ("") | get, list, create, patch, update | Record fault progress as Kubernetes events |
| jobs (batch) | get, list, create, delete, deletecollection | Run the chaos job that drives the fault |
| secrets ("") | get, list | Mount the Redis credentials secret (only if SECRET_FILE_PATH is used) |

The default Harness chaos infrastructure service account already includes these permissions.
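If you use a custom service account, a quick spot check of a few of these permissions can be done with kubectl auth can-i. The sketch below is illustrative only: check_chaos_rbac is a hypothetical helper, the namespace and service account name are placeholders you supply, and it checks a representative subset of the verbs rather than every one. It also assumes your own user is allowed to impersonate the service account through --as.

```shell
# Usage: check_chaos_rbac <namespace> <service-account>
# Prints each missing permission and returns non-zero if any is missing.
check_chaos_rbac() {
  local ns sa subject missing pair verb resource
  ns=$1
  sa=$2
  subject="system:serviceaccount:${ns}:${sa}"
  missing=0
  for pair in "create pods" "delete pods" "create events" "create jobs.batch" "get secrets"; do
    verb=${pair% *}
    resource=${pair#* }
    # kubectl auth can-i exits non-zero when the action is not allowed.
    if ! kubectl auth can-i "$verb" "$resource" -n "$ns" --as "$subject" >/dev/null 2>&1; then
      echo "missing: $verb $resource"
      missing=1
    fi
  done
  # pods/log is a subresource, so it is checked with --subresource.
  if ! kubectl auth can-i get pods --subresource=log -n "$ns" --as "$subject" >/dev/null 2>&1; then
    echo "missing: get pods/log"
    missing=1
  fi
  return "$missing"
}
```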


Redis authentication

note

If your Redis server doesn't require authentication, provide the server address directly through the ADDRESS tunable (see Fault tunables below).

If your Redis server requires authentication or TLS, provide the address, password, and TLS certificate through a Kubernetes secret. Create the secret (for example, redis-secret) in the namespace where the fault executes. A sample is shown below.

apiVersion: v1
kind: Secret
metadata:
  name: redis-secret # Name of the Secret
type: Opaque # Default Secret type
stringData:
  redis-secret.yaml: |-
    address: 34.136.111.6:6379
    password: mypass
    tlsCertFile: <cert>

After creating the secret, mount the secret into the experiment, and reference the mounted file path using the SECRET_FILE_PATH environment variable in the experiment manifest. A sample is shown below.

apiVersion: litmuschaos.io/v1alpha1
kind: K8sFault
metadata:
  name: redis-cache-limit
spec:
  definition:
    chaos:
      env:
        # ... other env
        # ... other env
        - name: SECRET_FILE_PATH
          value: "/tmp/redis-secret.yaml"
      components:
        secrets: # Kubernetes secret mounted
          - name: redis-secret
            mountPath: /tmp/

Fault tunables

Configure the following fault parameters when you add Redis cache limit to an experiment in Chaos Studio. Defaults are shown for reference.

Required parameters

| Tunable | Description | Default |
| --- | --- | --- |
| ADDRESS | Redis server address as host:port (for example redis.svc:6379). | (required) |

Chaos parameters

| Tunable | Description | Default |
| --- | --- | --- |
| MAX_MEMORY | Memory cap to apply to Redis during the fault. Accepts byte units (100mb, 1gb) or a percentage of currently-used memory (50%). | "50%" |
| TOTAL_CHAOS_DURATION | Duration of the fault in seconds. | 60 |
| RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0 |
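To estimate what a given MAX_MEMORY value means in bytes, the sketch below converts both forms. It is an illustration under stated assumptions, not the fault's actual implementation: resolve_max_memory is a hypothetical helper, the unit suffixes follow the Redis configuration convention (kb, mb, gb are powers of 1024; k, m, g are powers of 1000), and percentages are assumed to be whole numbers applied to a used_memory figure that you pass in.

```shell
# Usage: resolve_max_memory <MAX_MEMORY value> [used_memory in bytes]
# Prints the resulting cap in bytes.
resolve_max_memory() {
  local value used num unit
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  used=${2:-0}
  # Split the value into its numeric part and its unit (or % sign).
  case "$value" in
    *%) num=${value%\%}; unit=% ;;
    *)  num=${value%%[a-z]*}; unit=${value#"$num"} ;;
  esac
  # Only whole numbers are handled in this sketch.
  case "$num" in
    ''|*[!0-9]*) echo "invalid MAX_MEMORY value: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    %)  echo $(( used * num / 100 )) ;;
    '') echo "$num" ;;
    k)  echo $(( num * 1000 )) ;;
    kb) echo $(( num * 1024 )) ;;
    m)  echo $(( num * 1000 * 1000 )) ;;
    mb) echo $(( num * 1024 * 1024 )) ;;
    g)  echo $(( num * 1000 * 1000 * 1000 )) ;;
    gb) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *)  echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}
```

For example, resolve_max_memory 50% 200000000 prints 100000000, which is half of the used memory passed as the second argument.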

Authentication

| Tunable | Description | Default |
| --- | --- | --- |
| SECRET_FILE_PATH | Path to the mounted Redis credentials file inside the chaos pod. Required only if Redis needs authentication or TLS. | "" |
| REDIS_PASSWORD | Name of the Kubernetes secret that contains the Redis password. | "" |
| REDIS_TLS_FILE | Name of the Kubernetes secret that contains the Redis TLS certificate. | "" |

Tunables that apply to every fault are documented in common tunables for all faults.

Targeting

For Redis cache faults, target selection refers to the Kubernetes workload that produces the test load against Redis. Use the common workload tunables (TARGET_WORKLOAD_KIND, TARGET_WORKLOAD_NAMESPACE, TARGET_WORKLOAD_NAMES, TARGET_WORKLOAD_LABELS) documented in common pod fault tunables.

Confirm the eviction policy

Behavior under MAX_MEMORY depends entirely on Redis's maxmemory-policy. Check the running policy with CONFIG GET maxmemory-policy before running the fault so you can predict whether keys evict (LRU/LFU) or writes start failing (noeviction).
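As a quick reference, the sketch below maps each standard Redis maxmemory-policy value to the behavior you should expect once the cap is hit. predict_behavior and its output labels are hypothetical names chosen for this example.

```shell
# Usage: predict_behavior <maxmemory-policy>
predict_behavior() {
  case "$1" in
    noeviction)
      echo "writes-rejected" ;;
    allkeys-lru|allkeys-lfu|allkeys-random)
      echo "evicts-any-key" ;;
    volatile-lru|volatile-lfu|volatile-random|volatile-ttl)
      # Only keys with an expiry are candidates; if none exist,
      # writes are rejected as with noeviction.
      echo "evicts-keys-with-ttl" ;;
    *)
      echo "unknown-policy"
      return 1 ;;
  esac
}
```

The policy can be fed in from the server, for example predict_behavior "$(redis-cli -h <redis-host> -p <port> CONFIG GET maxmemory-policy | tail -n 1)".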


Fault execution in brief

The fault connects to the Redis server at ADDRESS, lowers maxmemory to the configured MAX_MEMORY for TOTAL_CHAOS_DURATION seconds, and restores the original maxmemory setting when the fault ends.
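The save, cap, wait, restore sequence can be approximated by hand as follows. This is a rough sketch of the idea and not the fault's real code: redis_cmd, cap_maxmemory, REDIS_HOST, and REDIS_PORT are hypothetical names, the cap must already be a value Redis accepts (bytes or a unit such as 100mb, not a percentage), and abort handling is omitted.

```shell
# Thin wrapper around redis-cli. REDIS_HOST and REDIS_PORT are assumed
# environment variables for this sketch, not fault tunables.
redis_cmd() {
  redis-cli -h "${REDIS_HOST:-localhost}" -p "${REDIS_PORT:-6379}" "$@"
}

# Usage: cap_maxmemory <cap> <duration-seconds>
cap_maxmemory() {
  local cap duration original
  cap=$1
  duration=$2
  # Remember the current setting so it can be restored afterwards.
  original=$(redis_cmd CONFIG GET maxmemory | tail -n 1)
  [ -n "$original" ] || return 1
  # Apply the lower cap, hold it for the duration, then restore.
  redis_cmd CONFIG SET maxmemory "$cap" >/dev/null || return 1
  sleep "$duration"
  redis_cmd CONFIG SET maxmemory "$original" >/dev/null
}
```

If the script is interrupted during the sleep, the original value is not restored, which is the situation the manual step under Recovery and cleanup covers.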


Expected behavior during fault execution

  • Once Redis crosses the new maxmemory, it acts according to its maxmemory-policy: eviction (LRU/LFU/random) or write rejection (noeviction).
  • Applications see either elevated cache miss rates (with eviction) or "OOM command not allowed when used memory > 'maxmemory'" errors on writes (with noeviction).
  • Reads against still-present keys succeed normally.
  • Replicas mirror evictions.

When the fault ends

The original maxmemory is restored. Redis stops evicting and writes succeed again; previously evicted keys remain gone until refilled.
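To tell the two failure modes apart from a test client, you can classify the reply to a probe write. The helper below is a minimal sketch; classify_write_reply and its ok, oom, and other labels are hypothetical, and the match relies only on the OOM error text quoted above.

```shell
# Usage: classify_write_reply "<reply text from a SET command>"
classify_write_reply() {
  case "$1" in
    OK)
      echo "ok" ;;
    *"OOM command not allowed"*)
      # Redis rejected the write because used memory exceeds maxmemory.
      echo "oom" ;;
    *)
      echo "other" ;;
  esac
}
```

For example, classify_write_reply "$(redis-cli -h <redis-host> -p <port> SET probe:key value 2>&1)" should report oom during the fault under noeviction and ok again after maxmemory is restored; probe:key is a placeholder key name.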

Signals to watch

Attach resilience probes to assert each layer:

  • Eviction rate: Use a Prometheus probe on redis_evicted_keys_total.
  • Used memory: Use a Prometheus probe on redis_memory_used_bytes to confirm Redis tracks toward the cap.
  • Write error rate: Use an HTTP probe against a write-heavy endpoint to detect failures.

Verify the fault execution effect

While the experiment is running, confirm the cap is applied:

  1. Check maxmemory from redis-cli.

    kubectl run -n <namespace> tester --image=redis:alpine --rm -it -- \
    redis-cli -h <redis-host> -p <port> CONFIG GET maxmemory

    The reply should show the lowered value during the fault.

  2. Inspect evictions or write errors.

    kubectl run -n <namespace> tester --image=redis:alpine --rm -it -- \
    redis-cli -h <redis-host> -p <port> INFO stats | grep evicted_keys

    evicted_keys should rise (with eviction policies), or writes should start failing with OOM (with noeviction).
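Because a single evicted_keys reading only shows a running total, it helps to sample it twice and compare. The sketch below extracts one field from INFO output; info_field is a hypothetical helper name, and it strips the carriage returns that Redis puts at the end of each INFO line.

```shell
# Usage: <INFO output on stdin> | info_field <field-name>
# Prints the value of the named field, for example evicted_keys.
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}
```

Piping redis-cli -h <redis-host> -p <port> INFO stats into info_field evicted_keys at the start and end of the fault gives two numbers whose difference is the number of keys evicted in between.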


Recovery and cleanup

  • End of duration: The original maxmemory is restored automatically.
  • Abort the experiment: Stopping the experiment from Chaos Studio triggers the same cleanup path.
  • Failed cleanup: If the original value was not restored, run CONFIG SET maxmemory <original> manually. Capture chaos pod logs and share with Harness support.

Limitations

  • Managed Redis: Many managed Redis offerings disable CONFIG SET. Confirm with your provider before running this fault.
  • Cluster mode: CONFIG SET maxmemory applies per node. For Redis Cluster, run the fault against each node or accept that only one node will be constrained.
  • noeviction policy: With noeviction, writes fail with explicit OOM errors rather than silently evicting; expect application-visible errors.
  • Authentication or TLS errors block the fault: If SECRET_FILE_PATH references the wrong file or the secret contents are malformed, the chaos pod fails fast.

Troubleshooting

Redis cache limit experiment stays Pending or never starts in Harness Chaos Engineering

Inspect the chaos pods in the experiment namespace with kubectl describe pod -n <chaos-namespace>. Common causes are taints on the target node that the chaos pods do not tolerate, insufficient resources, or a missing secrets mount when SECRET_FILE_PATH is set. Add the required tolerations or correct the secret mount path.

CONFIG SET fails during redis-cache-limit

The Redis user does not have permission to change CONFIG, or the Redis offering disables CONFIG SET entirely (common on managed Redis). Run CONFIG SET maxmemory <value> manually with the same credentials to confirm; switch to a Redis instance that allows runtime configuration if needed.

No evictions observed during redis-cache-limit

The most common causes are: maxmemory-policy is noeviction (writes fail instead of keys being evicted); MAX_MEMORY resolves to a value higher than the current used_memory, so the cap does not actually constrain Redis; or the application has stopped writing during the test. Confirm with INFO stats, INFO memory, and CONFIG GET maxmemory-policy.
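The checks above can be combined into one small decision helper. This is an illustrative sketch: diagnose_no_evictions and its messages are hypothetical, and you supply the policy, used_memory, and maxmemory values (in bytes) that you read from Redis while the fault is running.

```shell
# Usage: diagnose_no_evictions <policy> <used_memory-bytes> <maxmemory-bytes>
diagnose_no_evictions() {
  local policy used max
  policy=$1
  used=$2
  max=$3
  if [ "$max" -eq 0 ]; then
    # In Redis, maxmemory 0 means no limit.
    echo "maxmemory is 0: no cap is applied"
  elif [ "$policy" = "noeviction" ]; then
    echo "policy is noeviction: expect OOM write errors, not evictions"
  elif [ "$used" -lt "$max" ]; then
    echo "used_memory is below maxmemory: the cap is not constraining Redis"
  else
    echo "cap is constraining: check whether the application is still writing"
  fi
}
```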

Authentication errors connecting to Redis during redis-cache-limit

Verify the Kubernetes secret name in REDIS_PASSWORD or REDIS_TLS_FILE matches an existing secret in the experiment namespace, that SECRET_FILE_PATH points to the mounted file, and that the file contents include the correct address, password, and (if needed) TLS certificate. Test the same credentials with redis-cli from a debug pod.
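A basic sanity check of the credentials file can catch a malformed address before the experiment runs. The sketch below assumes the file follows the layout of the sample secret (a top-level address: host:port entry); validate_redis_secret_file is a hypothetical helper, and it does not verify the password or the TLS certificate.

```shell
# Usage: validate_redis_secret_file <path-to-credentials-file>
validate_redis_secret_file() {
  local file address
  file=$1
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # Take the first top-level "address:" entry and strip quotes and whitespace.
  address=$(sed -n 's/^address:[[:space:]]*//p' "$file" | head -n 1 | tr -d "\"'" | tr -d '[:space:]')
  # Expect host:port with a numeric port.
  if printf '%s\n' "$address" | grep -Eq '^[^:]+:[0-9]+$'; then
    echo "address ok: $address"
  else
    echo "address missing or not host:port: '$address'" >&2
    return 1
  fi
}
```

Run it against a local copy of the file you stored in the secret, or against the path in SECRET_FILE_PATH from a debug pod that mounts the same secret.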