GCP VM Instance Status Check

GCP VM Instance Status Check is a built-in Command Probe template that validates whether one or more Google Cloud Compute Engine VM instances are in the RUNNING state during a chaos experiment. You select instances by name or by label, which makes the probe flexible across static fleets and dynamically scaled deployments.

The probe runs the healthchecks utility bundled in the chaos probe image, queries the Compute Engine API, and prints [Pass] when every targeted instance is in the RUNNING state. The comparator marks the probe as passed when the output contains [Pass].

Built-in probe template

This is a built-in Command Probe template that runs on Kubernetes chaos infrastructure. Add it to an experiment from the probe library and customize its inputs. Go to Built-in probe templates to browse the full library, or go to Command probe to understand how command probes work.


Use cases

Use this probe template to:

  • Verify that VM instances stay in the RUNNING state during chaos experiments.
  • Validate instance recovery after failures or restarts.
  • Monitor VM health in multi-zone deployments.
  • Confirm compute availability during infrastructure chaos.

How the probe works

The template configures a Command Probe that runs healthchecks -name gcp-vm-instance. The utility resolves the target instances from VM_INSTANCE_NAMES or INSTANCE_LABEL in the supplied GCP_PROJECT_ID and ZONES, calls the Compute Engine API, and prints [Pass] when every resolved instance is in the RUNNING state. The comparator passes the probe when the output contains [Pass], and fails it otherwise.
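The pass condition described above can be sketched in a few lines. This is a minimal illustration, not the implementation of the healthchecks utility: the function name `evaluate_instances`, the `[Fail]` prefix, the message wording, and the choice to treat an empty result as a failure are all assumptions. Only the `[Pass]` marker and the RUNNING requirement come from this page.

```python
from typing import Mapping


def evaluate_instances(statuses: Mapping[str, str]) -> str:
    """Build an output line from a mapping of instance name -> status.

    Hypothetical helper: the real utility's output format is not documented
    here beyond the [Pass] marker.
    """
    if not statuses:
        # Assumption: resolving zero instances is treated as a failure.
        return "[Fail] no instances resolved"

    not_running = sorted(
        name for name, status in statuses.items() if status != "RUNNING"
    )
    if not_running:
        return "[Fail] not RUNNING: " + ", ".join(not_running)

    return "[Pass] all instances are RUNNING"


# Statuses would normally come from the Compute Engine API; they are
# hard-coded here so the sketch runs without credentials.
print(evaluate_instances({"instance-1": "RUNNING", "instance-2": "RUNNING"}))
print(evaluate_instances({"instance-1": "RUNNING", "instance-2": "TERMINATED"}))
```

A single instance outside the RUNNING state is enough to drop the `[Pass]` marker, which is what causes the comparator to fail the probe.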


Prerequisites

  • Chaos infrastructure: A Kubernetes chaos infrastructure with network access to the Google Cloud Compute Engine API endpoints.
  • GCP credentials: Cloud credentials available to the chaos infrastructure, with the permissions listed below.
  • Target instances exist: Every value in VM_INSTANCE_NAMES or INSTANCE_LABEL resolves to an instance in GCP_PROJECT_ID and ZONES.

Permissions required

The service account used by the probe needs the following Compute Engine permissions:

  • compute.instances.get
  • compute.instances.list

The probe uses the GCP credentials available to your chaos infrastructure. Go to GCP IAM integration to grant access, or go to Prepare a secret for GCP to provide service account credentials as a secret.


Probe properties

Command

healthchecks -name gcp-vm-instance

Comparator

| Type | Criteria | Value |
| --- | --- | --- |
| string | contains | [Pass] |

The probe passes when the command output contains [Pass], which indicates that every targeted VM instance is in the RUNNING state.
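As a rough illustration of what a string comparator with the `contains` criteria does, the sketch below applies a plain substring test. The function `comparator_passes` is hypothetical and is not the chaos platform's comparator code; it only mirrors the Type, Criteria, and Value settings listed above.

```python
def comparator_passes(output: str, value: str = "[Pass]") -> bool:
    """Hypothetical stand-in for a string comparator with criteria 'contains'."""
    # A substring test: the marker may appear anywhere in the command output.
    return value in output


print(comparator_passes("[Pass] all instances are RUNNING"))
print(comparator_passes("instance-2 is TERMINATED"))
```

Because the match is a substring test, any extra log lines in the command output do not affect the result as long as the marker is present.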

Environment variables

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| VM_INSTANCE_NAMES | Comma-separated list of VM instance names to check (for example, instance-1,instance-2). Provide this or INSTANCE_LABEL. | Conditional | - |
| INSTANCE_LABEL | Label of the VM instances to check (for example, env=production). Provide this or VM_INSTANCE_NAMES. | Conditional | - |
| GCP_PROJECT_ID | GCP project ID where the VM is located (for example, my-project-123456). | Yes | - |
| ZONES | Comma-separated list of GCP zones where the VM is deployed (for example, us-central1-a,us-central1-b). | Yes | - |

Instance selection

Provide at least one of VM_INSTANCE_NAMES or INSTANCE_LABEL.
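The input rules above (two required variables, plus at least one instance selector) can be checked before an experiment runs. The sketch below is one way to do that, assuming the comma-separated values tolerate surrounding whitespace; `ProbeInputs`, `split_csv`, and `load_inputs` are hypothetical names and are not part of the probe image.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class ProbeInputs:
    project_id: str
    zones: List[str]
    instance_names: List[str]
    instance_label: Optional[str]


def split_csv(raw: str) -> List[str]:
    """Split a comma-separated value and drop empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def load_inputs(env: Mapping[str, str]) -> ProbeInputs:
    """Validate the probe's environment variables (hypothetical helper)."""
    project_id = env.get("GCP_PROJECT_ID", "").strip()
    zones = split_csv(env.get("ZONES", ""))
    names = split_csv(env.get("VM_INSTANCE_NAMES", ""))
    label = env.get("INSTANCE_LABEL", "").strip() or None

    if not project_id:
        raise ValueError("GCP_PROJECT_ID is required")
    if not zones:
        raise ValueError("ZONES is required")
    if not names and label is None:
        raise ValueError("Provide VM_INSTANCE_NAMES or INSTANCE_LABEL")

    return ProbeInputs(project_id, zones, names, label)


# Example values taken from the table above.
inputs = load_inputs({
    "GCP_PROJECT_ID": "my-project-123456",
    "ZONES": "us-central1-a,us-central1-b",
    "INSTANCE_LABEL": "env=production",
})
print(inputs)
```

Passing `os.environ` instead of a literal dictionary would validate the variables of the current process.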


Run properties

| Property | Description | Type | Default |
| --- | --- | --- | --- |
| timeout | Maximum time to wait for the probe to complete (for example, 30s, 1m, 5m). | String | 300s |
| interval | Time between probe executions (for example, 5s, 30s, 1m). | String | 10s |
| attempt | Number of retry attempts before the probe is marked as failed. | Integer | 1 |
| pollingInterval | Time between retry attempts (for example, 1s, 5s, 10s). | String | - |
| initialDelay | Initial delay before the probe starts (for example, 0s, 10s, 30s). | String | - |
| stopOnFailure | Stop the experiment if the probe fails. | Boolean | false |
| verbosity | Log verbosity level (info, debug, trace). | String | - |
| retry | Number of times to retry the probe on failure. | Integer | - |
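When comparing run properties, it can help to convert the duration strings to seconds. The sketch below handles only the single-unit forms that appear in the examples above (a whole number followed by `s` or `m`). The helper `duration_seconds` is hypothetical, and the platform may accept other formats that this sketch rejects.

```python
import re

# Only the units shown in the examples on this page are handled.
_UNIT_SECONDS = {"s": 1, "m": 60}


def duration_seconds(value: str) -> int:
    """Convert a string such as '30s' or '5m' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([sm])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


# The default timeout (300s) is the same length of time as 5m.
print(duration_seconds("300s") == duration_seconds("5m"))
```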

Troubleshooting

GCP VM Instance Status Check probe fails with a permission denied error

The service account available to the chaos infrastructure does not have the required Compute Engine permissions. Confirm that the service account has compute.instances.get and compute.instances.list (for example, through the Compute Viewer role) on the project named in GCP_PROJECT_ID.

GCP VM Instance Status Check probe reports the instance was not found

The instance name or label did not resolve in the supplied project and zones. Verify that the values in VM_INSTANCE_NAMES or INSTANCE_LABEL are correct, that GCP_PROJECT_ID is the project that owns the instances, and that ZONES lists every zone where the instances run.

GCP VM Instance Status Check probe times out before the instance reaches RUNNING

The instance did not return to the RUNNING state within the probe timeout. Increase the timeout run property and the number of retry attempts, and confirm that the fault recovery step restores the instance. Inspect the chaos pod logs to see the last observed instance state.