GCP VM Instance Status Check

Validates that a GCP Compute Engine VM instance is in the 'RUNNING' state.

Infrastructure type

  • Kubernetes

Use cases

The GCP VM Instance Status Check probe helps you:

  • Verify VM instances remain running during chaos experiments
  • Validate instance recovery after failures or restarts
  • Monitor VM health in multi-zone deployments
  • Ensure compute availability during infrastructure chaos

Overview

This probe uses the GCP CLI to query VM instance status and validates that the instance is in the 'RUNNING' state. It supports filtering by instance names or labels, making it flexible for various deployment scenarios.
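The check described above can be approximated by hand. The sketch below is not the probe's implementation, which this page does not show. It assumes the gcloud CLI, builds a `gcloud compute instances describe` invocation that prints only the instance status, and applies the same "every instance is RUNNING" rule to the collected results. The function names are hypothetical.

```python
from typing import Iterable, List


def describe_command(project: str, zone: str, instance: str) -> List[str]:
    """Build a gcloud invocation that prints only the instance status."""
    return [
        "gcloud", "compute", "instances", "describe", instance,
        "--project", project,
        "--zone", zone,
        "--format", "value(status)",
    ]


def all_running(statuses: Iterable[str]) -> bool:
    """True only if at least one status was collected and all are RUNNING."""
    cleaned = [s.strip() for s in statuses]
    return bool(cleaned) and all(s == "RUNNING" for s in cleaned)


# The statuses would normally come from running describe_command(...) with
# subprocess.run; they are hard-coded here so the sketch runs without GCP access.
print(describe_command("my-project-123456", "us-central1-a", "instance-1"))
print(all_running(["RUNNING", "RUNNING\n"]))     # True
print(all_running(["RUNNING", "TERMINATED"]))    # False
```

Treating an empty result as a failure is a design choice in this sketch: a check that finds no instances should not report success.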

Probe type

Command Probe

Prerequisites

  • Kubernetes cluster with chaos infrastructure installed
  • GCP credentials configured with appropriate IAM permissions:
    • compute.instances.get
    • compute.instances.list
  • Network connectivity to GCP API endpoints
  • Target VM instances should exist in the specified zones

Probe properties

Command

healthchecks -name gcp-vm-instance

Comparator

Type | Criteria | Value
string | contains | [Pass]

The probe passes when the command output contains [Pass], indicating the VM instance is in the 'RUNNING' state.
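To make the comparator rule concrete, here is a minimal sketch of a string "contains" comparison. The sample output lines are hypothetical, since the exact text printed by the healthchecks command is not shown on this page. Only the [Pass] marker comes from the comparator table.

```python
def comparator_passes(output: str, criteria: str, value: str) -> bool:
    """Evaluate a string comparator; only 'contains' is modelled in this sketch."""
    if criteria != "contains":
        raise ValueError(f"criteria not modelled in this sketch: {criteria!r}")
    return value in output


# Hypothetical command output; only the "[Pass]" marker is taken from the docs.
print(comparator_passes("[Pass] instance-1 is RUNNING", "contains", "[Pass]"))     # True
print(comparator_passes("[Fail] instance-1 is TERMINATED", "contains", "[Pass]"))  # False
```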

Environment variables

Variable | Description | Required | Default
VM_INSTANCE_NAMES | Comma-separated list of VM instance names to check (e.g., instance-1,instance-2). One of VM_INSTANCE_NAMES or INSTANCE_LABEL must be specified. | No* | -
INSTANCE_LABEL | Label of the VM instance to check (e.g., env=production). One of VM_INSTANCE_NAMES or INSTANCE_LABEL must be specified. | No* | -
GCP_PROJECT_ID | GCP Project ID where the VM is located (e.g., my-project-123456). | Yes | -
ZONES | Comma-separated list of GCP zones where the VM is deployed (e.g., us-central1-a,us-central1-b). | Yes | -

Note: At least one of VM_INSTANCE_NAMES or INSTANCE_LABEL must be provided.
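The requirements in the table and note can be checked before an experiment runs. The sketch below is a hypothetical pre-flight validator, not part of the probe. It takes any mapping of variable names to values, such as `os.environ`, and returns a list of problems.

```python
from typing import List, Mapping


def split_csv(raw: str) -> List[str]:
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def validate_probe_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the env looks usable."""
    errors = []
    for key in ("GCP_PROJECT_ID", "ZONES"):
        if not env.get(key, "").strip():
            errors.append(f"{key} is required")
    has_names = bool(split_csv(env.get("VM_INSTANCE_NAMES", "")))
    has_label = bool(env.get("INSTANCE_LABEL", "").strip())
    if not (has_names or has_label):
        errors.append("set at least one of VM_INSTANCE_NAMES or INSTANCE_LABEL")
    return errors


print(validate_probe_env({
    "GCP_PROJECT_ID": "my-project-123456",
    "ZONES": "us-central1-a,us-central1-b",
    "INSTANCE_LABEL": "env=production",
}))  # []
```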


Run properties

Property | Description | Type | Default
timeout | Maximum time to wait for the probe to complete (e.g., 30s, 1m, 5m) | String | 300s
interval | Time between probe executions (e.g., 5s, 30s, 1m) | String | 10s
attempt | Number of retry attempts before marking the probe as failed | Integer | 1
pollingInterval | Time between retry attempts (e.g., 1s, 5s, 10s) | String | -
initialDelay | Initial delay before starting the probe (e.g., 0s, 10s, 30s) | String | -
stopOnFailure | Stop the experiment if the probe fails | Boolean | false
verbosity | Log verbosity level (info, debug, trace) | String | -
retry | Number of times to retry the probe on failure | Integer | -
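As an illustration of how attempt and pollingInterval interact, the sketch below parses duration strings of the form shown in the table and retries a check with a pause between tries. It is a simplified model, not the probe runner. It assumes that attempt counts the first try and that durations use a single s or m unit, which covers every example above.

```python
import re
import time
from typing import Callable

_UNIT_SECONDS = {"s": 1, "m": 60}


def parse_duration(text: str) -> int:
    """Convert a single-unit duration such as '30s' or '5m' into seconds."""
    match = re.fullmatch(r"(\d+)([sm])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def run_with_attempts(
    check: Callable[[], bool],
    attempt: int,
    polling_interval: str,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check up to `attempt` times; return True on the first success."""
    wait = parse_duration(polling_interval)
    for tried in range(1, attempt + 1):
        if check():
            return True
        if tried < attempt:
            sleep(wait)  # pause only between tries, not after the last one
    return False


print(parse_duration("300s"), parse_duration("1m"))  # 300 60
```

Passing `sleep` as a parameter keeps the retry logic testable without real delays.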

Probe definition

You can define this probe in your chaos experiment as follows:

Using instance names

probe:
  - name: "gcp-vm-instance-health-check"
    type: "cmdProbe"
    mode: "Continuous"
    cmdProbe/inputs:
      command: "healthchecks -name gcp-vm-instance"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: VM_INSTANCE_NAMES
          value: "instance-1,instance-2,instance-3"
        - name: GCP_PROJECT_ID
          value: "my-project-123456"
        - name: ZONES
          value: "us-central1-a,us-central1-b"
    runProperties:
      timeout: 300s
      interval: 10s
      attempt: 1
      stopOnFailure: false

Using instance labels

probe:
  - name: "gcp-vm-label-check"
    type: "cmdProbe"
    mode: "Edge"
    cmdProbe/inputs:
      command: "healthchecks -name gcp-vm-instance"
      comparator:
        type: "string"
        criteria: "contains"
        value: "[Pass]"
      env:
        - name: INSTANCE_LABEL
          value: "environment=production"
        - name: GCP_PROJECT_ID
          value: "my-project-123456"
        - name: ZONES
          value: "asia-south1-a"
    runProperties:
      timeout: 60s
      interval: 5s
      attempt: 3
      pollingInterval: 2s