Node memory hog

Node memory hog is a Kubernetes node-level chaos fault that consumes a configurable share of a target node's memory for a configurable duration. As free memory drops, the kubelet's eviction thresholds trip and pods are evicted in roughly QoS order: BestEffort first, then Burstable pods that use more than their memory request (the largest overage first), and Guaranteed pods only as a last resort.

Use this fault to simulate a memory-leak neighbor: a runaway batch process, a JVM heap that grew past its node-level budget, or a container with no memory limit that consumes whatever the kernel gives it.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.


Use cases

Run this fault when you want to answer concrete questions like:

  • QoS-based eviction order: When the kubelet starts evicting, do Guaranteed workloads stay protected while BestEffort and over-request Burstable pods are reclaimed first?
  • Pod priority and preemption: Does pod priority correctly influence which workloads survive an eviction sweep?
  • HPA and VPA reactions: When a service's memory footprint grows while its node is under memory pressure, do autoscaling controllers add capacity in time?
  • OOM kill behavior at the container level: Does the application restart cleanly after an OOM kill, or does it leave behind leaked state (open file handles, half-written files, orphaned children)?
  • Restart-loop and crash-loop containment: Does a single OOM kill stay contained, or does it cascade into a CrashLoopBackOff because the new pod immediately hits the same constraint?
  • Memory-leak detection in monitoring: Do your alerts fire on the right signal (working set growth, eviction rate, OOM count), and at the right severity?

Prerequisites

  • Kubernetes version: 1.21 or later. Go to What's supported to confirm distribution support.
  • Privileged pods allowed: The cluster lets you schedule privileged pods in the chaos namespace. The fault allocates memory against the host.
  • Container runtime access: The chaos infrastructure can reach the container runtime on the target nodes. The default containerd socket path is mounted automatically.
  • Node readiness: Target nodes are in Ready state before the fault is launched. The fault reports a precheck failure otherwise.
  • Workloads have memory requests and limits: Pods that set no CPU or memory requests or limits are classified as BestEffort. If your workloads omit them, every pod is BestEffort and the experiment observes generic eviction noise rather than meaningful prioritization.
  • Chaos infrastructure isolation: The target nodes are not single points of failure for the chaos infrastructure itself. If chaos control-plane pods are scheduled on the saturated node and end up evicted, the experiment loses observability.

Supported environments

| Platform | Support status |
|---|---|
| Amazon EKS | Supported |
| Azure AKS | Supported |
| Google GKE | Supported |
| Red Hat OpenShift | Supported |
| Rancher | Supported |
| VMware Tanzu | Supported |
| Self-managed Kubernetes (CNCF-certified) | Supported |
| GKE Autopilot | Not supported (Autopilot does not expose the node-level access this fault requires; only Node Network Loss and Node Network Latency are allowlisted, see Chaos on GKE Autopilot) |

Permissions required

The fault runs under the chaos infrastructure's service account. The account must be able to perform the following operations against the target cluster.

| Resource (apiGroup) | Verbs | Why it is needed |
|---|---|---|
| pods ("") | get, list, create, delete, deletecollection, patch, update | Run the chaos pod that injects memory pressure on the target node |
| pods/log ("") | get, list, watch | Stream chaos pod logs for status and debugging |
| events ("") | get, list, create, patch, update | Record fault progress and any pod evictions as Kubernetes events |
| nodes ("") | get, list | Discover target nodes and validate selectors |
| jobs (batch) | get, list, create, delete, deletecollection | Run the chaos job that drives the fault |

The default Harness chaos infrastructure service account already includes these permissions. You only need to extend it if you are running with a restricted scope.
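If you run with a restricted scope, the permissions table maps directly onto RBAC rules. The Python sketch below prints a ClusterRole manifest as JSON, and kubectl apply -f accepts that format. The role name node-memory-hog-chaos is hypothetical. A ClusterRole, rather than a namespaced Role, is assumed because nodes are cluster-scoped. You still need a ClusterRoleBinding that grants the role to the chaos infrastructure's service account.

```python
import json

# Hypothetical role name; adjust to your own naming conventions.
ROLE_NAME = "node-memory-hog-chaos"

# (apiGroup, resource, verbs) rows taken from the permissions table.
PERMISSIONS = [
    ("", "pods", ["get", "list", "create", "delete", "deletecollection", "patch", "update"]),
    ("", "pods/log", ["get", "list", "watch"]),
    ("", "events", ["get", "list", "create", "patch", "update"]),
    ("", "nodes", ["get", "list"]),
    ("batch", "jobs", ["get", "list", "create", "delete", "deletecollection"]),
]


def build_cluster_role(name: str) -> dict:
    """Build an rbac.authorization.k8s.io/v1 ClusterRole with one rule per table row."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {"apiGroups": [group], "resources": [resource], "verbs": list(verbs)}
            for group, resource, verbs in PERMISSIONS
        ],
    }


if __name__ == "__main__":
    # Save the output to a file and apply it with: kubectl apply -f <file>
    print(json.dumps(build_cluster_role(ROLE_NAME), indent=2))
```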


Fault tunables

Configure the following fault parameters when you add Node memory hog to an experiment in Chaos Studio. Defaults are shown for reference.

Chaos parameters

| Tunable | Description | Default |
|---|---|---|
| MEMORY_CONSUMPTION_PERCENTAGE | Memory to consume as a percentage of the node's total capacity. When non-zero, it takes precedence over MEMORY_CONSUMPTION_MEBIBYTES. | 0 |
| MEMORY_CONSUMPTION_MEBIBYTES | Memory to consume as an absolute value in MiB. Used when MEMORY_CONSUMPTION_PERCENTAGE is 0 (the default). | 500 |
| NUMBER_OF_WORKERS | Number of VM workers used to allocate memory. More workers reach the target faster but use more CPU. | 1 |
| TOTAL_CHAOS_DURATION | Duration of the fault in seconds. | 30 |

Targeting

| Tunable | Description | Default |
|---|---|---|
| TARGET_NODES | Comma-separated list of node names to target. Go to target multiple nodes to read more. | "" |
| NODE_LABEL | Label selector for choosing target nodes. Go to target nodes with labels to read more. | "" |
| NODES_AFFECTED_PERCENTAGE | Percentage of nodes (matching the selector) to target. 0 means one node. | 0 |
| SEQUENCE | Injection order when multiple nodes are targeted: parallel (all at once) or serial (one after another). | parallel |

Runtime and helper

| Tunable | Description | Default |
|---|---|---|
| RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0 |

Tunables that apply to every chaos fault are documented in common tunables for all faults.
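Before a run, it helps to estimate how many nodes a selector resolves to and how long the experiment will occupy the cluster. The Python sketch below is an approximation built on stated assumptions, not the fault's targeting code. It assumes NODES_AFFECTED_PERCENTAGE uses floor rounding with a minimum of one node, which is consistent with "0 means one node". It also assumes RAMP_TIME is waited once before and once after injection, and that serial mode gives each node the full TOTAL_CHAOS_DURATION in turn. Check these assumptions against your chaos infrastructure version.

```python
def nodes_to_target(matching_nodes: int, affected_percentage: int) -> int:
    """Assumed mapping from NODES_AFFECTED_PERCENTAGE to a node count.

    Assumption: floor rounding with a minimum of one node ("0 means one node").
    """
    if matching_nodes < 1:
        raise ValueError("no nodes match the selector")
    count = matching_nodes * affected_percentage // 100
    return min(matching_nodes, max(1, count))


def estimated_duration(nodes: int, chaos_duration: int, ramp_time: int, sequence: str) -> int:
    """Rough end-to-end seconds under the assumptions stated in the lead-in."""
    if sequence == "parallel":
        injection = chaos_duration              # every node is hogged at the same time
    elif sequence == "serial":
        injection = nodes * chaos_duration      # one node after another
    else:
        raise ValueError(f"unknown SEQUENCE: {sequence!r}")
    return 2 * ramp_time + injection            # RAMP_TIME before and after


# 10 nodes match NODE_LABEL, NODES_AFFECTED_PERCENTAGE=30,
# TOTAL_CHAOS_DURATION=60, RAMP_TIME=10.
count = nodes_to_target(10, 30)
print(count, estimated_duration(count, 60, 10, "parallel"), estimated_duration(count, 60, 10, "serial"))
# 3 80 200
```

The estimate ignores time spent scheduling the chaos pods and pulling their images, so treat it as a lower bound.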

Pick percentage or absolute bytes, not both

MEMORY_CONSUMPTION_PERCENTAGE and MEMORY_CONSUMPTION_MEBIBYTES are mutually exclusive. The default configuration consumes 500 MiB (the MEMORY_CONSUMPTION_PERCENTAGE default of 0 cedes precedence to the absolute value). To consume a percentage of node memory instead, set MEMORY_CONSUMPTION_PERCENTAGE to a non-zero value; start at 30% to 50% on production-shaped nodes because higher values cross kubelet eviction thresholds quickly.
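The precedence rule fits in a small Python helper. This is a sketch of the documented behavior, not the fault's source. The function name is hypothetical, the defaults mirror the tunables table, and truncating the percentage result to whole MiB is an assumption.

```python
MIB = 1024 * 1024


def target_consumption_mib(node_capacity_bytes: int, percentage: int = 0, mebibytes: int = 500) -> int:
    """Resolve the memory the fault is asked to consume, in MiB.

    A non-zero MEMORY_CONSUMPTION_PERCENTAGE takes precedence; otherwise
    MEMORY_CONSUMPTION_MEBIBYTES is used.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("MEMORY_CONSUMPTION_PERCENTAGE must be between 0 and 100")
    if percentage > 0:
        # Share of total node capacity, truncated to whole MiB (assumed rounding).
        return node_capacity_bytes * percentage // 100 // MIB
    return mebibytes


node_capacity = 16 * 1024 * MIB  # a 16 GiB node
print(target_consumption_mib(node_capacity))                 # defaults: 500
print(target_consumption_mib(node_capacity, percentage=30))  # 30% of 16 GiB: 4915
```

The percentage form quickly outgrows the default: 30% of a 16 GiB node is nearly ten times the 500 MiB default.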


Fault execution in brief

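The fault allocates the configured amount of memory on the target node for TOTAL_CHAOS_DURATION. The amount is either MEMORY_CONSUMPTION_PERCENTAGE of node capacity or MEMORY_CONSUMPTION_MEBIBYTES. Once the node's available memory falls below the kubelet's memory-pressure eviction threshold, the kubelet starts evicting pods that share the node. Independently of eviction, the kernel OOM-kills any container that exceeds its own memory limit.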

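The kubelet eviction manager ranks pods by three criteria, in order: whether their memory usage exceeds their request, then pod priority, then how far usage exceeds the request. The kubelet does not use QoS class directly. For pods of equal priority, though, this ranking produces roughly the following order:

| Eviction order | What is reclaimed |
|---|---|
| BestEffort pods (no memory request) | Reclaimed first. Cheapest cost to the cluster. |
| Burstable pods using more than their memory request | Reclaimed next, ranked by how far over request they are. |
| Guaranteed pods (memory request = limit) | Reclaimed last, only after pods exceeding their requests have been reclaimed. |

Separately from eviction, the kernel OOM killer fires inside individual containers that exceed their per-container memory limit.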

The kubelet emits Evicted events naming the evicted pod and the eviction signal. Watch for them with kubectl get events --field-selector reason=Evicted.
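Under memory pressure, the kubelet sorts pods by three keys: whether usage exceeds the memory request, then priority (lower first), then usage above the request (larger first). The Python sketch below models that ranking. It is a simplified illustration, not kubelet code. It considers a single memory signal, takes request and usage figures as given, and uses hypothetical pod names and numbers.

```python
from dataclasses import dataclass
from typing import List

MIB = 1024 * 1024


@dataclass
class PodMemory:
    name: str
    request: int        # memory request in bytes; 0 for BestEffort pods
    usage: int          # current memory usage (working set) in bytes
    priority: int = 0   # resolved pod priority


def eviction_order(pods: List[PodMemory]) -> List[str]:
    """Order pods as the kubelet ranks them under memory pressure (simplified).

    Sort keys, compared in order:
    1. Pods whose usage exceeds their request come first.
    2. Lower priority comes first.
    3. Larger usage above the request comes first.
    """
    def rank(p: PodMemory):
        over = p.usage - p.request
        return (0 if over > 0 else 1, p.priority, -over)

    return [p.name for p in sorted(pods, key=rank)]


pods = [
    PodMemory("batch-besteffort", request=0, usage=200 * MIB),
    PodMemory("web-burstable", request=256 * MIB, usage=512 * MIB),
    PodMemory("api-burstable-high-priority", request=256 * MIB, usage=300 * MIB, priority=1000),
    PodMemory("db-guaranteed", request=1024 * MIB, usage=900 * MIB),
]
print(eviction_order(pods))
# ['web-burstable', 'batch-besteffort', 'api-burstable-high-priority', 'db-guaranteed']
```

In this example, the Burstable pod that is 256 MiB over its request ranks ahead of the BestEffort pod using 200 MiB. That is why QoS class is only an approximation of the real eviction order.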


Expected behavior during fault execution

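  • Memory consumed by the fault is added on top of whatever the node was already using. A 30% setting on a node already at 60% utilization pushes the node to about 90%. Whether that trips eviction depends on node size and the kubelet's configured memory.available threshold.
  • The kubelet evicts whole pods, not individual containers. Once a pod is evicted, its controller (for example, a ReplicaSet or StatefulSet) creates a replacement, and the scheduler places it on a node with capacity. Bare pods without a controller are not recreated.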
  • Application containers that hit their own memory limit are OOM-killed by the kernel and counted in kube_pod_container_status_restarts_total. This is independent of node eviction.
  • If NUMBER_OF_WORKERS is high, memory is allocated faster but the fault consumes more CPU. For most experiments, the default of 1 is enough; raise it only if you want to reach the target memory consumption in the first few seconds.
  • The node almost never flips to NotReady from memory pressure alone. Eviction is the expected outcome, not partition.
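You can check the arithmetic from the first bullet before a run. This Python sketch approximates the kubelet's memory.available signal as capacity minus working set, which is a simplification. It assumes the upstream Linux default hard threshold of memory.available<100Mi, although managed platforms and custom kubelet configurations often set different values. The function names and the 750Mi comparison threshold are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Upstream default hard eviction threshold on Linux nodes (memory.available<100Mi).
# Assumption: your kubelet has not overridden it.
DEFAULT_THRESHOLD = 100 * MIB


def projected_available(capacity: int, working_set: int, hog: int) -> int:
    """Approximate memory.available after the hog allocation lands on top of current usage."""
    return capacity - working_set - hog


def trips_hard_eviction(capacity: int, working_set: int, hog: int,
                        threshold: int = DEFAULT_THRESHOLD) -> bool:
    """True when the projected available memory falls below the eviction threshold."""
    return projected_available(capacity, working_set, hog) < threshold


node = 4 * GIB
working_set = node * 60 // 100   # node already at 60% utilization
hog = node * 30 // 100           # MEMORY_CONSUMPTION_PERCENTAGE=30
print(projected_available(node, working_set, hog) // MIB)      # about 409 MiB left
print(trips_hard_eviction(node, working_set, hog))             # False with 100Mi
print(trips_hard_eviction(node, working_set, hog, 750 * MIB))  # True with a hypothetical 750Mi
```

The same 30% setting on the same 60%-utilized node either evicts pods or does not, depending only on the threshold. Check the kubelet configuration before drawing conclusions from a run with no evictions.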

Hog and OOM are different signals

This fault tests how the cluster handles a memory-saturated node. To test how a single pod handles hitting its container memory limit specifically, use Pod memory hog. The mechanisms and observed signals are different.

Signals to watch

A useful experiment captures signals from three layers. Attach resilience probes to assert each layer automatically:

  • Cluster state and eviction: Run kubectl top node <name> and kubectl get events --field-selector reason=Evicted -n <namespace> -w to see eviction in real time. Use a Kubernetes probe to validate that critical pods stay scheduled and Running.
  • Application service-level indicators: Watch error rate and request availability for the affected workloads. The signal that matters is whether QoS protected the right pods. Use an HTTP probe for direct endpoint health.
  • Eviction and OOM metrics: Track kube_pod_status_reason{reason="Evicted"}, node_memory_MemAvailable_bytes, and kube_pod_container_status_restarts_total for OOM-driven restarts. Use a Prometheus probe or an APM probe to fail the experiment when an unexpected pod is evicted or when restart counts spike.
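Before wiring up a Kubernetes or Prometheus probe, you can prototype the eviction assertion offline. This Python sketch reads the JSON written by kubectl get events --all-namespaces --field-selector reason=Evicted -o json. It reports evicted pods in any namespace you consider protected and exits non-zero if it finds one. The script name check_evictions.py and the example namespaces are hypothetical. The script illustrates the check and is not a Harness probe definition.

```python
import json
import sys
from typing import Iterable, List, Set, Tuple


def evicted_pods(events: dict) -> List[Tuple[str, str]]:
    """Return (namespace, pod name) for every Evicted event about a Pod."""
    result = []
    for event in events.get("items", []):
        obj = event.get("involvedObject", {})
        if event.get("reason") == "Evicted" and obj.get("kind") == "Pod":
            result.append((obj.get("namespace", ""), obj.get("name", "")))
    return result


def unexpected_evictions(pairs: Iterable[Tuple[str, str]], protected: Set[str]) -> List[Tuple[str, str]]:
    """Keep only evictions from protected namespaces, de-duplicated and sorted."""
    return sorted({pair for pair in pairs if pair[0] in protected})


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python check_evictions.py events.json payments checkout
    with open(sys.argv[1]) as fh:
        bad = unexpected_evictions(evicted_pods(json.load(fh)), set(sys.argv[2:]))
    for namespace, name in bad:
        print(f"unexpected eviction: {namespace}/{name}")
    sys.exit(1 if bad else 0)
```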

Verify the fault execution effect

While the experiment is running, confirm that memory pressure is reaching the node:

  1. Check memory usage on the node.

    kubectl top node <target-node>

    Memory usage should rise by roughly the amount you configured (MEMORY_CONSUMPTION_PERCENTAGE of capacity, or MEMORY_CONSUMPTION_MEBIBYTES). If it stays flat, the fault is not driving memory pressure on the expected node.

  2. Watch for eviction events.

    kubectl get events --field-selector reason=Evicted --all-namespaces -w

    At higher consumption levels you should see Evicted events whose message reports that the node was low on memory. If no evictions occur, either the node had plenty of free memory or the configured consumption was too low to cross the kubelet's eviction threshold.

  3. Look for OOM kills in pods that breached their own limit.

    kubectl get pods --field-selector spec.nodeName=<target-node> -o wide
    kubectl describe pod <restarted-pod> | grep -A3 'Last State'

    Reason: OOMKilled and Exit Code: 137 indicate a container-level OOM, separate from kubelet eviction.
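When many pods share the node, you can also script step 3. The Python sketch below reads the JSON written by kubectl get pods --all-namespaces --field-selector spec.nodeName=<target-node> -o json. It lists each container whose last termination reason was OOMKilled, with its exit code and restart count. The function and script names are hypothetical. The fields it reads (status.containerStatuses, lastState.terminated, restartCount) are standard pod status fields.

```python
import json
import sys
from typing import List, Tuple


def oom_killed(pods: dict) -> List[Tuple[str, str, int, int]]:
    """Return (namespace/pod, container, exit code, restart count) for containers
    whose previous termination was an OOM kill."""
    hits = []
    for pod in pods.get("items", []):
        meta = pod.get("metadata", {})
        pod_ref = f"{meta.get('namespace', '')}/{meta.get('name', '')}"
        for status in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is only present after a container has restarted.
            terminated = status.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_ref, status.get("name", ""),
                             terminated.get("exitCode", -1), status.get("restartCount", 0)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python oom_report.py pods.json
    with open(sys.argv[1]) as fh:
        for pod_ref, container, code, restarts in oom_killed(json.load(fh)):
            print(f"{pod_ref} container={container} exitCode={code} restarts={restarts}")
```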


Recovery and cleanup

  • End of duration: When TOTAL_CHAOS_DURATION elapses, the allocation is freed and node memory returns to baseline within a few seconds.
  • Evicted pods reschedule: Controllers create replacement pods for pods evicted during the fault, and the scheduler places those replacements on other Ready nodes. Kubernetes does not move them back onto the recovered node once pressure clears.
  • Pods stuck Pending: If your cluster lacks capacity on other nodes, replacement pods may sit in Pending. The cluster autoscaler adds capacity if it is configured. Otherwise, the Pending pods are scheduled onto the recovered node once its memory-pressure condition clears.
  • Container OOM restarts: Containers that were OOM-killed during the fault are restarted by the kubelet. If a container hits its limit again immediately on restart, it can enter CrashLoopBackOff. Investigate and raise the per-container memory limit before re-running.
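  • If automated cleanup did not complete: Memory is reclaimed as soon as the chaos pod exits, so no node-level cleanup is required. If a chaos pod is still running after the experiment ends, delete it with kubectl delete pod <chaos-pod> -n <chaos-namespace>.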
  • Abort the experiment early: Stop the experiment from Harness Chaos Studio. Memory is reclaimed once the chaos pod exits.

Limitations

This fault is not appropriate in the following scenarios:

  • Serverless Kubernetes (EKS Fargate, ACI virtual nodes, GKE Autopilot): These platforms do not expose real nodes or allow the privileged access this fault needs.
  • Windows nodes: This fault is supported on Linux nodes only.
  • Single-node clusters or co-located chaos infrastructure: If the chaos infrastructure pods live on the node you are about to saturate, the kubelet may evict them along with everything else, and the experiment loses observability. Schedule chaos infrastructure on a node outside the blast radius.
  • Workloads without memory requests: Without requests, every pod is BestEffort and the experiment observes generic eviction rather than meaningful QoS prioritization.
  • Very large consumption values on small nodes: Setting MEMORY_CONSUMPTION_PERCENTAGE close to 100% on a node with less headroom than the chaos pod needs can OOM the chaos pod itself before it produces useful signal. Start at 30% to 50% and tune from there.

Troubleshooting

Node memory hog experiment stays Pending or never starts in Harness Chaos Engineering

Inspect the chaos pods in the experiment namespace with kubectl describe pod -n <chaos-namespace>. The most common causes are taints on the target node, insufficient memory available to schedule the chaos pod, or a PodSecurity admission policy that blocks privileged pods. Depending on the cause, add the required tolerations, free resources on the node, or run the experiment in a namespace whose Pod Security level is privileged (label pod-security.kubernetes.io/enforce=privileged).
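To check whether taints are blocking the chaos pod, compare the node's spec.taints (from kubectl get node <target-node> -o json) with the chaos pod's spec.tolerations. The Python sketch below follows the Kubernetes toleration-matching rules for NoSchedule and NoExecute taints. It ignores PreferNoSchedule, which only affects scoring. The taints and tolerations in the example are hypothetical.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Kubernetes toleration matching: effect, key, then operator and value."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False  # an empty effect matches every effect
    key = toleration.get("key", "")
    if key and key != taint.get("key"):
        return False  # an empty key (with operator Exists) matches every key
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def blocking_taints(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Hard taints on the node that none of the pod's tolerations match."""
    return [
        taint for taint in taints
        if taint.get("effect") in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


node_taints = [
    {"key": "dedicated", "value": "batch", "effect": "NoSchedule"},
    {"key": "node.kubernetes.io/memory-pressure", "effect": "NoSchedule"},
]
chaos_tolerations = [
    {"key": "dedicated", "operator": "Equal", "value": "batch", "effect": "NoSchedule"},
]
print(blocking_taints(node_taints, chaos_tolerations))
# [{'key': 'node.kubernetes.io/memory-pressure', 'effect': 'NoSchedule'}]
```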

Node memory hog runs but kubectl top shows memory usage unchanged on the target node

The chaos pod may be constrained by its own memory limit, or it may be scheduled on a different node than expected. Verify with kubectl get pods -n <chaos-namespace> -o wide that the chaos pod is on the intended target node, and remove or raise its memory limit if it is constrained.

No pods are evicted during node-memory-hog even at high MEMORY_CONSUMPTION_PERCENTAGE

There are three likely causes. The kubelet's memory.available hard eviction threshold may be lower than you expect (the upstream Linux default is 100Mi). It may be missing from a customized kubelet configuration. Or the node may simply have enough free memory left after the hog. Check the kubelet configuration on the node with kubectl get --raw /api/v1/nodes/<node>/proxy/configz | jq .kubeletconfig.evictionHard. To reproduce eviction reliably, raise the consumption value or, in a non-production cluster, raise the memory.available threshold.
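The configz output nests the kubelet configuration under a kubeletconfig key. The Python sketch below resolves the memory.available hard threshold from that JSON into bytes so you can compare it with the node's capacity and the planned hog size. You can read capacity with kubectl get node <node> -o jsonpath='{.status.capacity.memory}'. The quantity parser is a deliberately small, hypothetical helper that handles only common binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes. A missing memory.available entry is reported as None rather than guessed.

```python
import json
import sys
from fractions import Fraction
from typing import Optional

# Subset of Kubernetes quantity suffixes.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
            "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(quantity: str) -> Fraction:
    """Convert a quantity such as '100Mi' or '16331492Ki' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Fraction(quantity[: -len(suffix)]) * factor
    return Fraction(quantity)


def memory_available_threshold(configz: dict, capacity_bytes: int) -> Optional[int]:
    """Return the hard memory.available threshold in bytes, or None if it is not set."""
    eviction_hard = configz.get("kubeletconfig", {}).get("evictionHard") or {}
    raw = eviction_hard.get("memory.available")
    if raw is None:
        return None
    if raw.endswith("%"):
        # Percentage thresholds are relative to the node's memory capacity.
        return int(Fraction(raw[:-1]) * capacity_bytes / 100)
    return int(parse_quantity(raw))


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python threshold.py configz.json 16331492Ki
    with open(sys.argv[1]) as fh:
        configz = json.load(fh)
    print(memory_available_threshold(configz, int(parse_quantity(sys.argv[2]))))
```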

Helper pod is killed with OOMKilled during node-memory-hog instead of evicting other pods

The chaos pod hit its own container memory limit before the node reached the kubelet's eviction threshold. Lower MEMORY_CONSUMPTION_MEBIBYTES (or MEMORY_CONSUMPTION_PERCENTAGE, if it is non-zero) so the chaos pod stays within bounds, or raise the chaos pod's memory limit. To confirm, inspect the chaos pod in the experiment namespace with kubectl describe pod -n <chaos-namespace> and check that its last state shows OOMKilled and Exit Code 137.

Critical Guaranteed pods are evicted during node-memory-hog

Guaranteed pods should be the last to be evicted. If they are evicted before BestEffort and Burstable pods, verify their QoS class with kubectl describe pod <name> | grep QoS. The most common cause is that their memory request and limit are not exactly equal, which downgrades them to Burstable. On critical pods, set requests equal to limits for both memory and CPU on every container.
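To audit manifests before a run, you can approximate the QoS rules in Python. This sketch approximates the documented classification and is not the API server's implementation. It reads only spec.containers and ignores init containers and pod-level resources. Like Kubernetes, it treats a missing request as equal to the limit. It uses a small, hypothetical quantity parser so that, for example, 500m and 0.5 CPU compare as equal.

```python
from fractions import Fraction
from typing import Dict, List

# Subset of Kubernetes quantity suffixes; "m" is milli (used for CPU).
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
            "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "m": Fraction(1, 1000)}


def parse_quantity(quantity) -> Fraction:
    text = str(quantity)
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


def qos_class(containers: List[Dict]) -> str:
    """Approximate a pod's QoS class from its spec.containers list."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # A request that is not set defaults to the limit.
        requests = {**limits, **resources.get("requests", {})}
        for name in ("cpu", "memory"):
            if name in requests:
                any_set = True
            if name not in limits or parse_quantity(requests[name]) != parse_quantity(limits[name]):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


db = [{"name": "db", "resources": {"limits": {"cpu": "500m", "memory": "1Gi"},
                                   "requests": {"cpu": "0.5", "memory": "1024Mi"}}}]
web = [{"name": "web", "resources": {"limits": {"memory": "512Mi"}, "requests": {"memory": "256Mi"}}}]
print(qos_class(db), qos_class(web), qos_class([{"name": "batch"}]))
# Guaranteed Burstable BestEffort
```

The authoritative value is still the pod's status.qosClass, which kubectl describe reports as QoS Class.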


  • Node CPU hog: Same blast-radius shape but applies CPU pressure instead of memory. Use it to test throttling rather than eviction.
  • Node I/O stress: Stresses disk I/O on the node. Use it to test disk-bound workloads.
  • Pod memory hog: Scope memory pressure to a single pod rather than the whole node. Use it to test per-container OOM behavior.
  • Common node fault tunables: Shared environment variables for selecting target nodes across node faults.