
GCP VM service kill

GCP VM service kill fault stops a given service for a specified duration. As a consequence, the node becomes unschedulable and transitions to the NotReady state.

  • GCP VM service kill stops a target service on a node, making the node unschedulable for a specified duration.
  • After this duration, the node reverts to its original state and the service resumes. In some cases, a new node replica may replace the existing one.


Use cases

GCP VM service kill fault assesses a GKE node's resilience by stopping a service (such as containerd) that runs on it.

Prerequisites

  • Kubernetes > 1.23
  • Cordon the node specified in the VM_INSTANCE_NAMES environment variable (the node for which the target service is killed) before executing the chaos fault. This ensures that the fault resources aren't scheduled on it or subject to eviction. You can achieve this using the following steps:
    • Get the names of the nodes running the application pods using the command kubectl get pods -o wide.
    • Cordon the node using the command kubectl cordon <nodename>.
  • The target nodes should be in the ready state before and after injecting chaos.
  • Adequate GCP permissions to SSH into the VM instances using gcloud.
  • The VM instances should be in a healthy state.
  • A Kubernetes secret containing the GCP service account credentials must exist in the default namespace. Refer to the GCP docs to generate the credentials needed to authenticate your identity with Google Cloud Platform (GCP).
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  type:
  project_id:
  private_key_id:
  private_key:
  client_email:
  client_id:
  auth_uri:
  token_uri:
  auth_provider_x509_cert_url:
  client_x509_cert_url:
```
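The empty fields in the secret correspond to the keys of a GCP service account key file (JSON). The sketch below shows what a filled-in secret might look like. The project ID, key ID, service account email, client ID, and private key are hypothetical placeholders; auth_uri, token_uri, and the two certificate URLs show the values Google typically writes into such key files. Copy the real values from your own key file rather than from this example.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
  namespace: default          # the prerequisites above expect the default namespace
type: Opaque
stringData:
  type: service_account
  project_id: my-gcp-project                  # hypothetical project ID
  private_key_id: 0123456789abcdef            # hypothetical key ID
  private_key: |                              # placeholder; paste the PEM block from your key file
    -----BEGIN PRIVATE KEY-----
    <base64-encoded key material from your key file>
    -----END PRIVATE KEY-----
  client_email: chaos-sa@my-gcp-project.iam.gserviceaccount.com   # hypothetical service account
  client_id: "123456789012345678901"          # quoted: stringData values must be strings
  auth_uri: https://accounts.google.com/o/oauth2/auth
  token_uri: https://oauth2.googleapis.com/token
  auth_provider_x509_cert_url: https://www.googleapis.com/oauth2/v1/certs
  client_x509_cert_url: https://www.googleapis.com/robot/v1/metadata/x509/chaos-sa%40my-gcp-project.iam.gserviceaccount.com
```

Quoting the numeric client_id matters: stringData accepts only string values, and an unquoted number would be parsed by YAML as an integer.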

Mandatory tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| SERVICE_NAME | Name of the target service to stop on the node. | For example, containerd. For more information, go to service name. |
| GCP_PROJECT_ID | ID of the GCP project to which the VM instances belong. | All the VM instances must belong to a single GCP project. For more information, go to GCP project ID. |
| VM_INSTANCE_NAMES | Names of the target VM instances. | Provide the target instance names as a comma-separated list. For more information, go to target GCP instances. |
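The snippets on this page each set only some of these tunables. As a starting point, here is a sketch of a ChaosEngine that sets all three mandatory tunables together; the engine name, instance name, and project ID are placeholders reused from the snippets on this page and must be replaced with your own values.

```yaml
# minimal ChaosEngine that sets all mandatory tunables
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-service-kill
    spec:
      components:
        env:
        # service to stop on the target node
        - name: SERVICE_NAME
          value: 'containerd'
        # comma-separated list of target VM instance names (placeholder)
        - name: VM_INSTANCE_NAMES
          value: 'instance-01'
        # GCP project that all target instances belong to (placeholder)
        - name: GCP_PROJECT_ID
          value: 'project-id'
```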

Optional tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration (in seconds) for which chaos is injected into the target resource. | Default: 60s. For more information, go to duration of the chaos. |
| LIB_IMAGE | Image used to inject chaos. | Default: the experiment image. For more information, go to image used by the helper pod. |
| MASK | Masks the target service (for example, containerd) before stopping it. | Supports 'enable' and 'disable'. For more information, go to service mask details. |
| NODE_LABEL | Node label used to filter the target node if the TARGET_NODE environment variable is not set. | Mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label. |
| RAMP_TIME | Period (in seconds) to wait before injecting chaos. | For example, 30s. For more information, go to ramp time. |
| DEFAULT_HEALTH_CHECK | Determines whether to run the default health check that is built into the fault. | Default: 'true'. For more information, go to default health check. |
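Optional tunables are set the same way as the mandatory ones: as entries in the experiment's env list. The sketch below lengthens the chaos window, adds a ramp time, and turns off the built-in health check; the specific values are illustrative only. The values are quoted so they are parsed as strings, since Kubernetes environment variable values are strings.

```yaml
# illustrative values for some optional tunables
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-service-kill
    spec:
      components:
        env:
        # inject chaos for 120 seconds instead of the default 60
        - name: TOTAL_CHAOS_DURATION
          value: '120'
        # wait 30 seconds before injecting chaos
        - name: RAMP_TIME
          value: '30'
        # skip the health check built into the fault
        - name: DEFAULT_HEALTH_CHECK
          value: 'false'
        # mandatory tunables (SERVICE_NAME, VM_INSTANCE_NAMES, GCP_PROJECT_ID) omitted for brevity
```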

Target GCP instances

The fault selects the target instances using the VM_INSTANCE_NAMES tunable within the project specified by GCP_PROJECT_ID.

GCP project ID: The unique identifier of the GCP project to which the VM instances belong. Tune it by using the GCP_PROJECT_ID environment variable.

The following YAML snippet illustrates the use of this environment variable:

```yaml
## details of the GCP instance
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-service-kill
    spec:
      components:
        env:
        # comma-separated list of vm instance names
        - name: VM_INSTANCE_NAMES
          value: 'instance-01'
        # GCP project ID to which vm instance belongs
        - name: GCP_PROJECT_ID
          value: 'project-id'
```
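Because VM_INSTANCE_NAMES takes a comma-separated list, several instances can be targeted in one run. The sketch below uses hypothetical instance names; all listed instances must belong to the single project given in GCP_PROJECT_ID, and each corresponding node should be cordoned as described in the prerequisites.

```yaml
# target two VM instances in the same GCP project
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-service-kill
    spec:
      components:
        env:
        # hypothetical instance names, separated by a comma without spaces
        - name: VM_INSTANCE_NAMES
          value: 'instance-01,instance-02'
        # both instances must belong to this project
        - name: GCP_PROJECT_ID
          value: 'project-id'
```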

Target service

Name of the target service to kill on the specified VM instance. Tune it by using the SERVICE_NAME environment variable.

The following YAML snippet illustrates the use of this environment variable:

```yaml
# kill the target service on the target VM instance
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-service-kill
    spec:
      components:
        env:
        # name of the target service
        - name: SERVICE_NAME
          value: 'containerd'
        # name of the target VM instance
        - name: VM_INSTANCE_NAMES
          value: 'instance-01'
```
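SERVICE_NAME is not limited to containerd. As an illustration, the sketch below targets the kubelet instead, which is another way to drive a node into the NotReady state. This assumes the node image runs the kubelet as a service named kubelet; verify the exact service name on your nodes before using it.

```yaml
# stop the kubelet service instead of containerd (assumes the service is named 'kubelet')
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-service-kill
    spec:
      components:
        env:
        # assumed service name; check it on the target node first
        - name: SERVICE_NAME
          value: 'kubelet'
        - name: VM_INSTANCE_NAMES
          value: 'instance-01'
        - name: GCP_PROJECT_ID
          value: 'project-id'
```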

Mask

You can also mask a service before stopping it. Tune it by using the MASK environment variable, setting its value to enable to apply the mask to the service.

The following YAML snippet illustrates the use of this environment variable:

```yaml
# mask a service before stopping it
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-service-kill
    spec:
      components:
        env:
        # mask the target service before stopping it
        - name: MASK
          value: 'enable'
        - name: SERVICE_NAME
          value: 'containerd'
```