Node restart
Node restart disrupts the state of the node by restarting it.
- It tests deployment sanity (replica availability and uninterrupted service) and recovery workflows of the application pod.
Prerequisites
- Kubernetes > 1.16
- Create a Kubernetes secret named `id-rsa` in the namespace where the fault will be executed. The secret holds the private SSH key for `SSH_USER` in the secret field `ssh-privatekey`. This key is used to connect to the node that hosts the target pod. Below is a sample secret file:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: id-rsa
type: kubernetes.io/ssh-auth
stringData:
  ssh-privatekey: |-
    # SSH private key for ssh contained here
```
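As an alternative to writing the manifest by hand, the same secret can be created directly from a private key file with `kubectl create secret generic`. The following is a minimal sketch under stated assumptions: the function name `create_ssh_secret`, the `KUBECTL` override variable, the namespace `litmus`, and the key path are all hypothetical and not part of the fault.

```shell
# Sketch: create the id-rsa secret from an existing private key file.
# create_ssh_secret and the KUBECTL override are illustrative helpers (assumptions).
create_ssh_secret() {
  namespace="$1"   # namespace where the fault will be executed
  key_file="$2"    # path to the private SSH key for SSH_USER

  # KUBECTL can point at another command to preview the call without a cluster.
  "${KUBECTL:-kubectl}" create secret generic id-rsa \
    --namespace "$namespace" \
    --type kubernetes.io/ssh-auth \
    --from-file "ssh-privatekey=$key_file"
}

# Example (hypothetical namespace and key path):
#   create_ssh_secret litmus "$HOME/my-id-rsa-key"
```

The `--from-file` key must be `ssh-privatekey` so that it matches the secret field the fault reads.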
For those already familiar with an SSH client, the steps to create the RSA key pair for remote SSH access are summarized below.
- Create a new key pair and store the keys in files named `my-id-rsa-key` and `my-id-rsa-key.pub` for the private and public keys respectively:
```shell
ssh-keygen -f ~/my-id-rsa-key -t rsa -b 4096
```
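- For each available node, run the command below to copy the public key of `my-id-rsa-key` to that node. Replace `<ssh-user>` and `<node-address>` with the SSH user and the address of the node:
```shell
ssh-copy-id -i ~/my-id-rsa-key <ssh-user>@<node-address>
```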
For further details, refer to this documentation. After copying the public key to all nodes and creating the secret, you are all set to execute the fault.
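When there are several nodes, the per-node copy step can be wrapped in a small loop. This is only a sketch: the function name `copy_key_to_nodes`, the `SSH_COPY_ID` override variable, and the example addresses are hypothetical.

```shell
# Sketch: copy the public key to every node address passed as an argument.
# copy_key_to_nodes and the SSH_COPY_ID override are illustrative helpers (assumptions).
copy_key_to_nodes() {
  ssh_user="$1"    # the user you plan to set as SSH_USER
  key_file="$2"    # private key path; ssh-copy-id installs the matching .pub file
  shift 2

  for node in "$@"; do
    # Stop at the first node that cannot be reached or refuses the key.
    "${SSH_COPY_ID:-ssh-copy-id}" -i "$key_file" "${ssh_user}@${node}" || return 1
  done
}

# Example (hypothetical node addresses):
#   copy_key_to_nodes root "$HOME/my-id-rsa-key" 10.0.170.92 10.0.170.93
```

Each call to `ssh-copy-id` may prompt for the user's password on that node, so the loop is interactive unless another authentication method is already in place.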
Default validations
The target nodes should be in the ready state before and after injecting chaos.
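To check the same condition by hand before starting the fault, the `Ready` condition of a node can be read with `kubectl`. The sketch below is an assumption about how one might verify this manually, not a description of how the fault performs its validation. The function name `node_is_ready` and the `KUBECTL` override variable are hypothetical.

```shell
# Sketch: succeed only when the named node reports the condition Ready=True.
# node_is_ready and the KUBECTL override are illustrative helpers (assumptions).
node_is_ready() {
  node="$1"
  status="$("${KUBECTL:-kubectl}" get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" || return 1
  [ "$status" = "True" ]
}

# Example (node name taken from the fault examples):
#   node_is_ready node01 && echo "node01 is ready"
```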
Fault tunables
Mandatory fields
Variables | Description | Notes |
---|---|---|
TARGET_NODE | Name of the target node subject to chaos. If this is not provided, a random node is selected. | |
NODE_LABEL | Node label used to filter the target nodes. | Mutually exclusive with the `TARGET_NODE` environment variable. If both are provided, `TARGET_NODE` takes precedence. |
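Before setting `NODE_LABEL`, it can help to preview which nodes a given label selects. The following sketch lists them with a standard `kubectl` label selector. The function name `nodes_for_label`, the `KUBECTL` override variable, and the example label are hypothetical.

```shell
# Sketch: list the nodes that match a label selector such as key=value.
# nodes_for_label and the KUBECTL override are illustrative helpers (assumptions).
nodes_for_label() {
  label="$1"   # the value you intend to pass as NODE_LABEL
  "${KUBECTL:-kubectl}" get nodes --selector "$label" --output name
}

# Example (hypothetical label):
#   nodes_for_label node-role=worker
```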
Optional fields
Variables | Description | Notes |
---|---|---|
LIB_IMAGE | Image used to run the reboot command. | Defaults to `litmuschaos/go-runner:latest`. |
SSH_USER | Name of the SSH user. | Defaults to `root`. |
TARGET_NODE_IP | Internal IP of the target node subject to chaos. If not provided, the fault uses the node IP of the `TARGET_NODE`. | Defaults to empty. |
REBOOT_COMMAND | Command used to reboot. | Defaults to `sudo systemctl reboot`. |
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 120s. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. |
Fault examples
Common and node-specific tunables
Refer to the common attributes and the node-specific tunables to tune the settings shared by all faults and those specific to node faults.
Reboot Command
It defines the command used to restart the targeted node. It can be tuned via the `REBOOT_COMMAND` environment variable.
Use the following example to tune this:
```yaml
# provide the reboot command
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # command used for the reboot
        - name: REBOOT_COMMAND
          value: 'sudo systemctl reboot'
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```
SSH User
It defines the name of the SSH user for the targeted node. It can be tuned via the `SSH_USER` environment variable.
Use the following example to tune this:
```yaml
# name of the ssh user used to ssh into targeted node
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # name of the ssh user
        - name: SSH_USER
          value: 'root'
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```
Target Node Internal IP
It defines the internal IP of the targeted node. This field is optional: if the internal IP is not provided, the fault derives it from the targeted node (`TARGET_NODE`). It can be tuned via the `TARGET_NODE_IP` environment variable.
Use the following example to tune this:
```yaml
# internal ip of the targeted node
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-restart
    spec:
      components:
        env:
        # internal ip of the targeted node
        - name: TARGET_NODE_IP
          value: '10.0.170.92'
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```
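If you want to set `TARGET_NODE_IP` explicitly, the internal IP of a node can be looked up from its status addresses. This sketch shows one way to do that lookup by hand. It is not necessarily how the fault derives the address internally. The function name `node_internal_ip` and the `KUBECTL` override variable are hypothetical.

```shell
# Sketch: print the InternalIP address that Kubernetes reports for a node.
# node_internal_ip and the KUBECTL override are illustrative helpers (assumptions).
node_internal_ip() {
  node="$1"
  "${KUBECTL:-kubectl}" get node "$node" \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
}

# Example (node name taken from the fault examples):
#   node_internal_ip node01
```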