Node memory hog

Node memory hog causes memory resource exhaustion on the Kubernetes node.

  • It is injected using a helper pod running the Linux stress-ng tool (a workload generator).
  • The chaos affects the application for a specific duration.

Use cases

  • Node memory hog fault causes memory resource exhaustion on the Kubernetes node.
  • It verifies the resilience of applications whose replicas may be evicted when nodes become unschedulable (NotReady state) due to a lack of memory resources.
  • It simulates memory leaks in deployed microservices.
  • It simulates application slowness due to memory starvation.
  • It simulates noisy neighbour problems due to hogging.
  • It verifies pod priority and QoS settings for eviction purposes.
  • It also verifies application restarts on OOM kills.
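
The pod priority and QoS checks listed above depend on how the workload under test is declared. The following is a minimal sketch, not part of the fault itself, of a pod that Kubernetes classifies as Guaranteed QoS (CPU and memory requests equal to limits) and that references a priority class. The names app-critical and critical-app, the priority value, and the image are hypothetical placeholders.

```yaml
# Hypothetical PriorityClass; the name and value are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: app-critical
value: 100000
globalDefault: false
description: "Illustrative priority class for the application under test."
---
# Hypothetical pod; requests equal limits for every resource,
# which places the pod in the Guaranteed QoS class.
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: app-critical
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "250m"
        memory: "256Mi"
```

Running the fault against a node that hosts a mix of such pods and pods without requests or limits is one way to observe which pods are evicted first under memory pressure.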

Permissions required

Below is a sample Kubernetes role that defines the permissions required to execute the fault.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: node-memory-hog
spec:
  definition:
    scope: Cluster
permissions:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["chaosEngines", "chaosExperiments", "chaosResults"]
    verbs: ["create", "delete", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get", "list", "create"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list"]

Prerequisites

  • Kubernetes > 1.16
  • The target nodes should be in the ready state before and after injecting chaos.

Mandatory tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TARGET_NODES | Comma-separated list of nodes subject to node memory hog. | For example, node-1,node-2. For more information, go to target nodes. |
| NODE_LABEL | Node label that is used to filter the target nodes. | It is mutually exclusive with the TARGET_NODES environment variable. If both are provided, TARGET_NODES takes precedence. For more information, go to target nodes with labels. |
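
As a sketch of how the mandatory tunables might be set, the following ChaosEngine reuses the structure of the other snippets on this page and targets nodes by name. The node names node-1 and node-2 are the example values from the table above. The key=value label format in the commented-out NODE_LABEL entry is an assumption; replace it with a label that exists on your nodes.

```yaml
# target specific nodes by name; NODE_LABEL is the alternative selector
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # comma-separated list of target node names (example values)
        - name: TARGET_NODES
          value: 'node-1,node-2'
        # alternative: select nodes by label instead of by name
        # (assumed key=value format; TARGET_NODES takes precedence if both are set)
        # - name: NODE_LABEL
        #   value: 'key=value'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```

Only one of the two selectors is active here because the tunables are mutually exclusive.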

Optional tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Default: 120 s. For more information, go to duration of the chaos. |
| LIB_IMAGE | Image used to run the stress command. | Default: harness/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. |
| MEMORY_CONSUMPTION_PERCENTAGE | Percentage of the total node memory capacity to consume. | Default: 30. For more information, go to memory consumption percentage. |
| MEMORY_CONSUMPTION_MEBIBYTES | Amount of the total available memory to consume (in mebibytes). | It is mutually exclusive with MEMORY_CONSUMPTION_PERCENTAGE. For example, 256. For more information, go to memory consumption mebibytes. |
| NUMBER_OF_WORKERS | Number of VM workers involved in the stress. | Default: 1. For more information, go to workers for stress. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
| NODES_AFFECTED_PERC | Percentage of the total nodes to target. It takes numeric values only. | Default: 0 (corresponds to 1 node). For more information, go to node affected percentage. |
| SEQUENCE | Sequence of chaos execution for multiple target nodes. | Default: parallel. Supports serial sequence as well. For more information, go to sequence of chaos execution. |
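
To illustrate the optional tunables that have no snippet on this page, the following sketch sets RAMP_TIME, NODES_AFFECTED_PERC, and SEQUENCE together. The values are illustrative assumptions chosen from the ranges described in the table, not recommendations.

```yaml
# illustrative values for ramp time, node percentage, and execution sequence
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # wait period before and after chaos injection (in seconds)
        - name: RAMP_TIME
          value: '30'
        # percentage of the total nodes to target (numeric value only)
        - name: NODES_AFFECTED_PERC
          value: '50'
        # run chaos on the targets one after another instead of in parallel
        - name: SEQUENCE
          value: 'serial'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```

A serial sequence may be preferable when several nodes are targeted and you want to limit the number of nodes under memory pressure at any one time.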

Memory consumption percentage

Percentage of the total node memory capacity to consume. Tune it by using the MEMORY_CONSUMPTION_PERCENTAGE environment variable.

The following YAML snippet illustrates the use of this environment variable:

# stress the memory of the targeted node with MEMORY_CONSUMPTION_PERCENTAGE of node capacity
# it is mutually exclusive with MEMORY_CONSUMPTION_MEBIBYTES
# if both are provided, MEMORY_CONSUMPTION_PERCENTAGE is used for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # percentage of total node capacity to be stressed
        - name: MEMORY_CONSUMPTION_PERCENTAGE
          value: '10' # in percentage
        - name: TOTAL_CHAOS_DURATION
          value: '60'

Memory consumption mebibytes

Amount of memory to consume (in mebibytes). Tune it by using the MEMORY_CONSUMPTION_MEBIBYTES environment variable. It is mutually exclusive with the MEMORY_CONSUMPTION_PERCENTAGE environment variable. If both are set, the fault uses MEMORY_CONSUMPTION_PERCENTAGE for the stress.

The following YAML snippet illustrates the use of this environment variable:

# stress the memory of the targeted node with the given MEMORY_CONSUMPTION_MEBIBYTES
# it is mutually exclusive with MEMORY_CONSUMPTION_PERCENTAGE
# if both are provided, MEMORY_CONSUMPTION_PERCENTAGE is used for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # node memory to be stressed
        - name: MEMORY_CONSUMPTION_MEBIBYTES
          value: '500' # in MiB
        - name: TOTAL_CHAOS_DURATION
          value: '60'

Workers for stress

Number of VM workers that generate the memory stress. Tune it by using the NUMBER_OF_WORKERS environment variable.

The following YAML snippet illustrates the use of this environment variable:

# provide the worker count for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # total number of workers involved in stress
        - name: NUMBER_OF_WORKERS
          value: '1'
        - name: TOTAL_CHAOS_DURATION
          value: '60'