Docker Service Kill


  • This fault makes the application unreachable because the node becomes unschedulable (NotReady) after its Docker service is killed.
  • The Docker service is stopped (killed) on a node to make it unschedulable for a specified duration, i.e. TOTAL_CHAOS_DURATION. After chaos injection, the node should return to a healthy state and the services should be accessible again.
  • Here, "application" refers to its services. In other words, the fault tests application resiliency when a replica becomes unreachable because the Docker service is down.
Fault execution flow chart


Prerequisites

  • Ensure that the Kubernetes version is > 1.16.
  • Ensure that the node specified in the TARGET_NODE ENV variable (the node whose Docker service will be killed) is cordoned before the chaos fault runs, so that the fault resources are not scheduled on it or subjected to eviction. You can do this as follows:
    • Get the names of the nodes running the application pods: kubectl get pods -o wide
    • Cordon the node: kubectl cordon <nodename>
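Cordoning works by marking the Node object as unschedulable. As a minimal illustration (the node name node01 is only an example, and all other fields are omitted), the relevant part of the Node object after running kubectl cordon looks like this:

```yaml
# Excerpt of a Node object after `kubectl cordon node01`
# (node01 is an illustrative name; other fields are omitted)
apiVersion: v1
kind: Node
metadata:
  name: node01
spec:
  unschedulable: true   # set by cordon; the scheduler places no new pods here
```

Running kubectl uncordon <nodename> clears this field again once the node should accept new pods.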

Default Validations


The target nodes should be in ready state before and after chaos injection.
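In Kubernetes terms, a node is in the ready state when its Ready condition reports "True". A minimal excerpt of the relevant part of the Node status, for illustration only:

```yaml
# Excerpt of a healthy Node's status (other conditions and fields omitted)
status:
  conditions:
  - type: Ready
    status: "True"   # reported as "False" or "Unknown" while the node is NotReady
```

kubectl get nodes shows such a node with STATUS Ready.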

Fault Tunables

Mandatory Fields

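| Variables    | Description                                                                  | Notes                                                                          |
|--------------|------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| TARGET_NODE  | Name of the target node                                                      | Eg. node-1                                                                     |
| NODE_LABEL   | Node label used to filter the target node if the TARGET_NODE ENV is not set | Mutually exclusive with the TARGET_NODE ENV. If both are provided, TARGET_NODE is used |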
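As a sketch, a ChaosEngine can select the target node through NODE_LABEL instead of TARGET_NODE. It follows the same ChaosEngine layout as the Kill Docker Service example; the label value and its key=value form are assumptions for illustration, so check them against your cluster's node labels.

```yaml
# Select the target node by label instead of by name
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: docker-service-kill
    spec:
      components:
        env:
        # TARGET_NODE is deliberately not set: it takes precedence when both are provided
        # label used to filter the target node (key=value form is an assumption)
        - name: NODE_LABEL
          value: 'kubernetes.io/hostname=node01'
```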

Optional Fields

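| Variables            | Description                                   | Notes             |
|----------------------|-----------------------------------------------|-------------------|
| TOTAL_CHAOS_DURATION | Duration of the chaos injection (in seconds)  | Defaults to 60s   |
| LIB                  | The chaos library used to inject the chaos    | Defaults to litmus |
| RAMP_TIME            | Period to wait before chaos injection (in seconds) | Eg. 30       |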
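A minimal sketch of overriding the optional tunables in the same ChaosEngine layout. The values are illustrative only; ENV values are given as quoted strings.

```yaml
# Override the optional tunables (values are illustrative)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: docker-service-kill
    spec:
      components:
        env:
        - name: TARGET_NODE
          value: 'node01'
        # keep the docker service down for 120 seconds
        - name: TOTAL_CHAOS_DURATION
          value: '120'
        # wait 30 seconds before injecting the chaos
        - name: RAMP_TIME
          value: '30'
        # chaos library; litmus is the default
        - name: LIB
          value: 'litmus'
```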

Fault Examples

Common and Node specific tunables

Refer to the common attributes and node-specific tunables to tune the tunables common to all faults and those specific to node faults.

Kill Docker Service

The name of the target node subjected to the chaos can be tuned via the TARGET_NODE ENV.

Use the following example to tune this:

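# kill the docker service of the target node
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: docker-service-kill
    spec:
      components:
        env:
        # name of the target node
        - name: TARGET_NODE
          value: 'node01'
        - name: TOTAL_CHAOS_DURATION
          value: '60'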