Node restart

Node restart disrupts the state of the node by restarting it.

  • It tests deployment sanity (replica availability and uninterrupted service) and recovery workflows of the application pod.

Usage

This fault determines the deployment sanity (replica availability and uninterrupted service) and recovery workflows of the application pod in the event of an unexpected node restart. It simulates the loss of critical services (or a node crash). It verifies resource budgeting on cluster nodes (whether request or limit settings are honored on the available nodes) and whether topology constraints are adhered to (node selectors, tolerations, zone distribution, and affinity or anti-affinity policies).

Prerequisites

  • Kubernetes > 1.16
  • Create a Kubernetes secret named id-rsa in the namespace where the fault will be executed. The secret field ssh-privatekey holds the private SSH key for SSH_USER, which is used to connect to the node that hosts the target pod. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: id-rsa
type: kubernetes.io/ssh-auth
stringData:
  ssh-privatekey: |-
    # SSH private key for ssh contained here

For those already familiar with an SSH client, the steps to create the RSA key pair for remote SSH access are summarized below.

  1. Create a new key pair and store the keys in a file named my-id-rsa-key and my-id-rsa-key.pub for the private and public keys respectively:
ssh-keygen -f ~/my-id-rsa-key -t rsa -b 4096
  2. For each available node, run the command below to copy the public key of my-id-rsa-key:
ssh-copy-id -i ~/my-id-rsa-key <user>@<node-address>

For further details, refer to this documentation. After copying the public key to all nodes and creating the secret, you are all set to execute the fault.

Default validations

The target nodes should be in the ready state before and after injecting chaos.

Fault tunables

Mandatory fields

Variables | Description | Notes
TARGET_NODE | Name of the target node subject to chaos. | If this is not provided, a random node is selected.
NODE_LABEL | Node label used to filter the target nodes. | Mutually exclusive with the TARGET_NODE environment variable. If both are provided, TARGET_NODE takes precedence.
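
As an illustration of NODE_LABEL, the sketch below selects target nodes by label instead of by name. It reuses the ChaosEngine layout from the fault examples in this document. The label value 'key=value' is a placeholder assumption, not a value taken from this page, and TARGET_NODE is left out on purpose because the two variables are mutually exclusive.

```yaml
# select target nodes by label instead of by name
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # label used to filter the target nodes
        # (placeholder; the 'key=value' format is an assumption)
        - name: NODE_LABEL
          value: 'key=value'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```

Replace the placeholder with a label that is actually present on the nodes you intend to target.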

Optional fields

Variables | Description | Notes
LIB_IMAGE | Image used to restart the node. | Defaults to litmuschaos/go-runner:latest.
SSH_USER | Name of the SSH user. | Defaults to root.
TARGET_NODE_IP | Internal IP of the target node subject to chaos. | If not provided, the fault uses the node IP of the TARGET_NODE. Defaults to empty.
REBOOT_COMMAND | Command used to reboot. | Defaults to sudo systemctl reboot.
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 120s.
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s.
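
As an illustration of the timing fields above, the sketch below sets RAMP_TIME together with TOTAL_CHAOS_DURATION. It follows the same ChaosEngine layout as the fault examples in this document. Writing the ramp time as a bare number of seconds ('30') is an assumption based on how TOTAL_CHAOS_DURATION is written in those examples, and node01 is the same illustrative node name they use.

```yaml
# wait before and after injecting chaos
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # period to wait before and after chaos, in seconds
        # (assumed to be a bare number, like TOTAL_CHAOS_DURATION)
        - name: RAMP_TIME
          value: '30'
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```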

Fault examples

Common and Node specific tunables

Refer to the common attributes and the node-specific tunables to tune the tunables common to all faults and those specific to node faults.

Reboot Command

It defines the command used to restart the targeted node. It can be tuned via REBOOT_COMMAND ENV.

Use the following example to tune this:

# provide the reboot command
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # command used for the reboot
        - name: REBOOT_COMMAND
          value: 'sudo systemctl reboot'
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'

SSH User

It defines the name of the SSH user for the targeted node. It can be tuned via SSH_USER ENV.

Use the following example to tune this:

# name of the ssh user used to ssh into targeted node
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # name of the ssh user
        - name: SSH_USER
          value: 'root'
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'

Target Node Internal IP

It defines the internal IP of the targeted node. This field is optional: if the internal IP is not provided, the fault derives it from the targeted node. It can be tuned via TARGET_NODE_IP ENV.

Use the following example to tune this:

# internal ip of the targeted node
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # internal ip of the targeted node
        - name: TARGET_NODE_IP
          value: '10.0.170.92'
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'