
Pod memory hog exec

Pod memory hog exec is a Kubernetes pod-level chaos fault that consumes a specified amount of memory (in megabytes) in the application container.

  • It simulates conditions in which application pods experience memory spikes caused by expected or undesired processes, which tests how the overall application stack behaves under memory pressure.


Usage

Memory usage within containers is subject to various constraints in Kubernetes. If memory limits are specified in the container spec, exceeding them terminates the container (through an OOM kill of the primary process, often PID 1), and the container is then restarted depending on the restart policy specified. For containers with no memory limits, containers on the node can be killed based on their oom_score, which results in a bigger blast radius.

This fault simulates memory leaks in deployed microservices, application slowness due to memory starvation, and noisy-neighbour problems due to memory hogging. It verifies pod priority and QoS settings for eviction purposes, and it verifies that applications restart after OOM kills. The fault causes stress within the target container, which may result in the primary process in the container being constrained or in the available system memory on the node being consumed.
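To make the limit and restart behaviour described above concrete, the following is a minimal sketch of a pod spec whose container declares a memory request and a memory limit. The pod name, image, and values are illustrative assumptions and are not part of the fault; `resources.requests.memory`, `resources.limits.memory`, and `restartPolicy` are standard Kubernetes fields.

```yaml
# Illustrative pod spec (names and values are assumptions, not part of the fault).
apiVersion: v1
kind: Pod
metadata:
  name: nginx-demo            # hypothetical pod name
  labels:
    app: nginx
spec:
  restartPolicy: Always       # the container is restarted after an OOM kill
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "512Mi"       # usage above this limit can get the container OOM-killed
```

Because the request and the limit differ, a pod like this falls into the Burstable QoS class, which is one of the settings this fault can help verify during eviction.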

Prerequisites

  • Kubernetes > 1.16.

Default validations

The application pods should be in the running state before and after chaos injection.

Fault tunables

Variables Description Notes
MEMORY_CONSUMPTION Amount of memory to consume in the target Kubernetes pod (in megabytes) Defaults to 500 MB (up to 2000 MB)
TOTAL_CHAOS_DURATION The time duration for chaos insertion (in seconds) Defaults to 60s
TARGET_PODS Comma-separated list of application pod names subjected to pod memory hog chaos If not provided, target pods are selected randomly based on the provided appLabels
TARGET_CONTAINER Name of the target container under chaos If not provided, it will select the first container of the target pod
CHAOS_KILL_COMMAND The command to kill the chaos process Defaults to kill $(find /proc -name exe -lname '*/dd' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}' | head -n 1). Another useful one that generally works (in case the default doesn't) is kill -9 $(ps afx | grep \"[dd] if=/dev/zero\" | awk '{print $1}' | tr '\n' ' '). In case neither works, please check whether the target pod's base image offers a shell. If yes, identify appropriate shell command to kill the chaos process.
PODS_AFFECTED_PERC Percentage of total pods to target Defaults to 0 (corresponds to 1 replica); provide a numeric value only
RAMP_TIME Period to wait before injecting chaos (in seconds) For example, 30s.
SEQUENCE Sequence of chaos execution for multiple target pods Defaults to parallel. Supports serial and parallel

Fault examples

Common and pod-specific tunables

Refer to the common attributes and pod-specific tunables to tune the tunables that are common to all faults and those that are specific to pod-level faults.
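As a hedged sketch of how the common and pod-specific tunables listed in the fault tunables table could be combined in one ChaosEngine, consider the manifest below. The tunable names come from that table; the values and the container name `nginx` are illustrative assumptions.

```yaml
# Illustrative values only; tunable names are taken from the fault tunables table.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-memory-hog-exec
    spec:
      components:
        env:
        # target half of the pods that match the app label
        - name: PODS_AFFECTED_PERC
          value: "50"
        # hypothetical container name; the first container is used if omitted
        - name: TARGET_CONTAINER
          value: "nginx"
        # run chaos on the selected pods one after another
        - name: SEQUENCE
          value: "serial"
        # wait 30 seconds before injecting chaos
        - name: RAMP_TIME
          value: "30"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

Setting SEQUENCE to serial keeps the blast radius smaller at any given moment, at the cost of a longer overall run when several pods are targeted.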

Memory consumption

It specifies the amount of memory consumed by the target pod for the duration specified by the TOTAL_CHAOS_DURATION environment variable. You can tune it using the MEMORY_CONSUMPTION environment variable. The memory consumption limit is 2000 MB.

Use the following example to tune it:

# memory to be stressed in MB
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-memory-hog-exec
    spec:
      components:
        env:
        # memory consumption value in MB
        # it is limited to 2000MB
        - name: MEMORY_CONSUMPTION
          value: "500" #in MB
        - name: TOTAL_CHAOS_DURATION
          value: "60"

Chaos kill commands

It defines the command used to kill the chaos process (the dd process that is started to exhaust memory). You can tune it using the CHAOS_KILL_COMMAND environment variable.

  • CHAOS_KILL_COMMAND: "kill $(find /proc -name exe -lname '*/dd' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}' | head -n 1)"

Use the following example to tune it:

# provide the chaos kill command used to kill the chaos process
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-memory-hog-exec
    spec:
      components:
        env:
        # command to kill the dd process
        # alternative command: "kill -9 $(ps afx | grep \"[dd] if=/dev/zero\" | awk '{print $1}' | tr '\n' ' ')"
        - name: CHAOS_KILL_COMMAND
          value: "kill $(find /proc -name exe -lname '*/dd' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}' | head -n 1)"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
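
The alternative command mentioned in the comment above needs escaped double quotes when it is written as a double-quoted YAML string. As a sketch of one way to sidestep the escaping, the fragment below uses a YAML literal block scalar (`|-`). This form is a stylistic assumption and is not taken from the fault's own examples; the command itself is the alternative listed in the fault tunables table.

```yaml
# Fragment of spec.experiments[].spec.components (not a complete manifest).
env:
# literal block scalar: the command is passed through without YAML escapes
- name: CHAOS_KILL_COMMAND
  value: |-
    kill -9 $(ps afx | grep "[dd] if=/dev/zero" | awk '{print $1}' | tr '\n' ' ')
- name: TOTAL_CHAOS_DURATION
  value: "60"
```

This variant relies on ps, grep, awk, and tr being present in the target container's image, so it is worth confirming that the image provides them before relying on it.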