GCP VM disk loss by label
GCP VM disk loss by label detaches GCP persistent disk volumes, selected by a label, from their VM instances for a specified duration and then re-attaches them.
Use cases
GCP VM disk loss by label fault:
- Determines the resilience of the GKE infrastructure.
- Determines how quickly a node can recover when a persistent disk volume is detached from the VM instance associated with it.
Prerequisites
- Kubernetes > 1.16
- The service account should have editor access (or owner access) to the GCP project.
- The target disk volume should not be the boot disk of any VM instance.
- Disk volumes with the target label should be attached to their respective instances.
- A Kubernetes secret in the default namespace should contain the GCP service account credentials. For more information on generating the credentials needed to authenticate with the Google Cloud Platform (GCP), refer to the GCP docs. The secret has the following structure:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  type:
  project_id:
  private_key_id:
  private_key:
  client_email:
  client_id:
  auth_uri:
  token_uri:
  auth_provider_x509_cert_url:
  client_x509_cert_url:
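The field names in the secret mirror those in a service account key file downloaded from GCP in JSON format. As a rough sketch of how such a file might map onto the secret, the example below fills in the values that are the same for every key file and uses angle-bracket placeholders (hypothetical, not real values) for everything specific to your account. Whether the fault expects the private key with literal line breaks, as shown, or with `\n` escapes is an assumption to verify in your environment.

```yaml
# Illustrative only: every value in angle brackets is a hypothetical placeholder
# to be replaced with the matching field from your own key file.
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  type: service_account
  project_id: <your-project-id>
  private_key_id: <private-key-id>
  # A block scalar is one way to hold the multi-line PEM key.
  private_key: |
    -----BEGIN PRIVATE KEY-----
    <base64-encoded key material>
    -----END PRIVATE KEY-----
  client_email: <sa-name>@<your-project-id>.iam.gserviceaccount.com
  # Quoted so that YAML treats the numeric ID as a string, as stringData requires.
  client_id: "<numeric-client-id>"
  auth_uri: https://accounts.google.com/o/oauth2/auth
  token_uri: https://oauth2.googleapis.com/token
  auth_provider_x509_cert_url: https://www.googleapis.com/oauth2/v1/certs
  client_x509_cert_url: <value of client_x509_cert_url from the key file>
```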
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
GCP_PROJECT_ID | ID of the GCP project containing the disk volumes. | All the target disk volumes should belong to a single GCP project. For more information, go to GCP project ID. |
DISK_VOLUME_LABEL | Label of the target non-boot persistent disk volumes. | Provide the value as a key:value pair, or as the key alone if the corresponding value is empty. For example, disk:target-disk. For more information, go to detach volumes by label. |
ZONES | The zone of the target disk volumes. | Only one zone is provided, which indicates that all target disks reside in the same zone. For more information, go to zones. |
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Total duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. For more information, go to duration of the chaos. |
CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Defaults to 30s. For more information, go to chaos interval. |
DISK_AFFECTED_PERC | Percentage of the disks matching the target label that the fault affects (specify numeric values only). | Defaults to 0, which corresponds to 1 disk. |
SEQUENCE | Sequence of chaos execution for multiple target disks. | Defaults to parallel. It supports serial sequence as well. For more information, go to sequence of chaos execution. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. For more information, go to ramp time. |
DEFAULT_HEALTH_CHECK | Determines whether to run the default health check built into the fault. | Defaults to 'true'. For more information, go to default health check. |
IAM permissions
Listed below are the IAM permissions leveraged by the fault:
compute.disks.get
compute.instances.attachDisk
compute.instances.detachDisk
compute.disks.list
compute.instances.get
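If you want to grant these permissions through a dedicated role, one option is a custom IAM role. The sketch below is a role definition file in the YAML format accepted by `gcloud iam roles create ROLE_ID --project=PROJECT_ID --file=role.yaml`. The title and description are hypothetical, and whether these five permissions alone are sufficient in your environment is an assumption, since the prerequisites above call for editor or owner access.

```yaml
# Hypothetical custom role definition; title and description are illustrative.
title: Chaos Disk Loss By Label
description: Permissions used by the GCP VM disk loss by label fault.
stage: GA
includedPermissions:
# Look up the labelled disks and their current attachments.
- compute.disks.get
- compute.disks.list
- compute.instances.get
# Detach the disks during chaos and re-attach them afterwards.
- compute.instances.detachDisk
- compute.instances.attachDisk
```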
Detach volumes by label
DISK_VOLUME_LABEL specifies the label of the disk volumes subject to disk loss. The fault detaches the disks that carry the DISK_VOLUME_LABEL label in the ZONES zone within the GCP_PROJECT_ID project (the share of matching disks that is affected is set by DISK_AFFECTED_PERC). It re-attaches the disk volumes after waiting for the duration specified by the TOTAL_CHAOS_DURATION environment variable.
GCP project ID: The project ID, which is a unique identifier for a GCP project. Tune it by using the GCP_PROJECT_ID environment variable.
Zones: The zone of the disk volumes subject to the fault. Tune it by using the ZONES environment variable.
Note: DISK_VOLUME_LABEL accepts only one label and ZONES accepts only one zone name. Therefore, all the disks must reside in the same zone.
The following YAML snippet illustrates the use of these environment variables:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-disk-loss-by-label
    spec:
      components:
        env:
        # label of the target disk volumes (key:value)
        - name: DISK_VOLUME_LABEL
          value: 'disk:target-disk'
        # zone in which all the target disks reside
        - name: ZONES
          value: 'us-east1-b'
        # GCP project that contains the target disks
        - name: GCP_PROJECT_ID
          value: 'my-project-4513'
        # total duration of the chaos, in seconds
        - name: TOTAL_CHAOS_DURATION
          value: '60'
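
The optional tunables are set in the same `env` list. The sketch below extends the example above with values chosen only for illustration (the 50 percent figure and the serial sequence are assumptions, not recommendations). It targets half of the labelled disks, one after another, with a ramp period before and after the chaos.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: gcp-vm-disk-loss-by-label
    spec:
      components:
        env:
        - name: DISK_VOLUME_LABEL
          value: 'disk:target-disk'
        - name: ZONES
          value: 'us-east1-b'
        - name: GCP_PROJECT_ID
          value: 'my-project-4513'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # seconds between two successive chaos iterations
        - name: CHAOS_INTERVAL
          value: '30'
        # percentage of the labelled disks to affect (illustrative value)
        - name: DISK_AFFECTED_PERC
          value: '50'
        # 'parallel' is the default; 'serial' is also supported
        - name: SEQUENCE
          value: 'serial'
        # seconds to wait before and after injecting chaos
        - name: RAMP_TIME
          value: '30'
        # run the health check built into the fault
        - name: DEFAULT_HEALTH_CHECK
          value: 'true'
```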