Node drain
Node drain evicts all the resources running on the target node.
- As a result, services running on the target node should be rescheduled onto other nodes.
Usage
The node drain fault evicts all the resources running on a node. It helps you:
- Determine the resilience of the application when the application replicas scheduled on a node are removed.
- Validate the application failover capabilities when a node suddenly becomes unavailable.
- Simulate node maintenance activity (hardware refresh, OS patching, Kubernetes upgrade).
- Verify resource budgeting on cluster nodes (whether request or limit settings are honored on the available nodes).
- Verify whether topology constraints (node selectors, tolerations, zone distribution, affinity or anti-affinity policies) are adhered to.
Prerequisites
- Kubernetes > 1.16
- The node specified in the TARGET_NODE environment variable (the node to be drained) should be cordoned before executing the chaos fault. This ensures that the fault resources are not scheduled on it (or subject to eviction). To do this:
  - Get the names of the nodes that host the application pods using the following command:
    kubectl get pods -o wide
  - Cordon the node using the following command:
    kubectl cordon <nodename>
Default validations
The target nodes should be in the ready state before and after injecting chaos.
Fault tunables
Mandatory fields
Variables | Description | Notes |
---|---|---|
TARGET_NODE | Name of the target node to drain. | For example, node01. |
NODE_LABEL | Node label used to filter the target nodes. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, TARGET_NODE takes precedence. |
Optional fields
Variables | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 60s. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. |
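As a rough sketch, the optional fields above are set as entries in the fault's env list, in the same way as the mandatory fields. The values below are illustrative only: 60 is the documented default for TOTAL_CHAOS_DURATION, and 30 mirrors the RAMP_TIME example in the table. The fragment assumes that values are given as quoted numbers of seconds.

```yaml
# fragment of the fault's env list (not a complete manifest)
env:
  # duration of chaos injection, in seconds
  - name: TOTAL_CHAOS_DURATION
    value: '60'
  # wait period before and after injecting chaos, in seconds
  - name: RAMP_TIME
    value: '30'
```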
Fault examples
Common and node-specific tunables
Refer to the common attributes and node-specific tunables to tune the tunables that are common to all faults and those that are specific to node faults.
Drain node
It contains the name of the target node subject to the chaos. You can tune it using the TARGET_NODE environment variable.
Use the following example to tune it:
# drain the targeted node
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-drain
    spec:
      components:
        env:
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        # duration of the chaos (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
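The target node can also be selected by label instead of by name, using the NODE_LABEL environment variable described in the mandatory fields. The following is a minimal sketch of that variant. It assumes that NODE_LABEL accepts a label in key=value form, and the label kubernetes.io/hostname=node01 is only a placeholder; substitute a label that exists on your node.

```yaml
# drain the node that matches the given label
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-drain
    spec:
      components:
        env:
        # label used to filter the target node (assumed key=value format, placeholder value)
        - name: NODE_LABEL
          value: 'kubernetes.io/hostname=node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```

Because the two variables are mutually exclusive, this sketch leaves the node name variable unset so that the label alone determines the target.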