GCP SQL Instance Status Check

GCP SQL Instance Status Check is a built-in Command Probe template that validates whether a Google Cloud SQL instance is in the running state during a chaos experiment. Use it to assert that a managed database stays available and recovers cleanly while a fault disrupts the instance or its dependencies.

The probe runs the healthchecks utility bundled in the chaos probe image, queries the Cloud SQL Admin API, and prints [Pass] when the instance is in the RUNNABLE state. The comparator marks the probe as passed when the output contains [Pass].

Built-in probe template

This is a built-in Command Probe template that runs on Kubernetes chaos infrastructure. Add it to an experiment from the probe library and customize its inputs. Go to Built-in probe templates to browse the full library, or go to Command probe to understand how command probes work.


Use cases

Use this probe template to:

  • Verify that Cloud SQL instances stay available during chaos experiments.
  • Validate database failover behavior and recovery.
  • Monitor database health during network disruptions.
  • Confirm database availability during infrastructure changes.

How the probe works

The template configures a Command Probe that runs healthchecks -name gcp-sql-instance. The utility resolves the instance named in SQL_INSTANCE_NAME in the supplied GCP_PROJECT_ID, calls the Cloud SQL Admin API, and prints [Pass] when the instance is in the RUNNABLE state. The comparator passes the probe when the output contains [Pass], and fails it otherwise.
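The flow above can be sketched in Python. This is only an illustration under stated assumptions, not the source of the healthchecks utility: the function names `instance_url` and `render_result` are hypothetical, and the wording of the non-passing message is invented for the example. The only external facts relied on are the Cloud SQL Admin API v1 `instances.get` URL shape and the `state` field in its response.

```python
from urllib.parse import quote

# Hypothetical helpers; the real healthchecks utility may be implemented differently.


def instance_url(project_id: str, instance_name: str) -> str:
    """Build the Cloud SQL Admin API v1 instances.get URL for one instance."""
    return (
        "https://sqladmin.googleapis.com/v1/projects/"
        f"{quote(project_id, safe='')}/instances/{quote(instance_name, safe='')}"
    )


def render_result(instance: dict) -> str:
    """Turn an instances.get response body into a single probe output line."""
    name = instance.get("name", "?")
    state = instance.get("state", "SQL_INSTANCE_STATE_UNSPECIFIED")
    if state == "RUNNABLE":
        # "[Pass]" is the marker the comparator looks for.
        return f"[Pass] instance {name} is RUNNABLE"
    # Assumed wording: any line without "[Pass]" makes the comparator fail.
    return f"instance {name} is in state {state}, expected RUNNABLE"
```

Keeping the state-to-output mapping separate from the HTTP call makes it straightforward to reason about which instance states (for example MAINTENANCE or SUSPENDED) cause the probe to fail.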


Prerequisites

  • Chaos infrastructure: A Kubernetes chaos infrastructure with network access to the Cloud SQL Admin API endpoints.
  • GCP credentials: Cloud credentials available to the chaos infrastructure, with the permissions listed below.
  • Target instance exists: The instance named in SQL_INSTANCE_NAME exists in GCP_PROJECT_ID.

Permissions required

The service account used by the probe needs the following Cloud SQL permissions:

  • cloudsql.instances.get
  • cloudsql.instances.list

The probe uses the GCP credentials available to your chaos infrastructure. Go to GCP IAM integration to grant access, or go to Prepare a secret for GCP to provide service account credentials as a secret.


Probe properties

Command

healthchecks -name gcp-sql-instance

Comparator

| Type | Criteria | Value |
|------|----------|-------|
| string | contains | [Pass] |

The probe passes when the command output contains [Pass], which indicates that the Cloud SQL instance is in the RUNNABLE state.
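The comparator rule can be illustrated with a few lines of Python. This is a minimal sketch, assuming a case-sensitive substring match; `comparator_passes` is a hypothetical name and the sample output strings are made up for the example.

```python
def comparator_passes(output: str, criteria: str, value: str) -> bool:
    """Evaluate a string comparator against captured command output."""
    if criteria == "contains":
        # Assumption: a plain, case-sensitive substring match.
        return value in output
    raise ValueError(f"unsupported criteria in this sketch: {criteria}")


print(comparator_passes("[Pass] instance is RUNNABLE", "contains", "[Pass]"))       # True
print(comparator_passes("instance is in state MAINTENANCE", "contains", "[Pass]"))  # False
```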

Environment variables

| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| SQL_INSTANCE_NAME | Name of the Cloud SQL instance to check (for example, my-sql-instance). | Yes | - |
| GCP_PROJECT_ID | GCP project ID where the instance is located (for example, my-project-123456). | Yes | - |
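Because both variables are required and have no default, a check like the following can catch a missing input before the probe runs. It is a hedged sketch: `load_probe_inputs` is a hypothetical helper, and treating blank values as missing is an assumption rather than documented behavior.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("SQL_INSTANCE_NAME", "GCP_PROJECT_ID")


def load_probe_inputs(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required inputs, or raise if any is missing or blank."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise ValueError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name].strip() for name in REQUIRED_VARS}
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise, for example `load_probe_inputs({"SQL_INSTANCE_NAME": "my-sql-instance", "GCP_PROJECT_ID": "my-project-123456"})`.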

Run properties

| Property | Description | Type | Default |
|----------|-------------|------|---------|
| timeout | Maximum time to wait for the probe to complete (for example, 30s, 1m, 5m). | String | 300s |
| interval | Time between probe executions (for example, 5s, 30s, 1m). | String | 10s |
| attempt | Number of retry attempts before the probe is marked as failed. | Integer | 1 |
| pollingInterval | Time between retry attempts (for example, 1s, 5s, 10s). | String | - |
| initialDelay | Initial delay before the probe starts (for example, 0s, 10s, 30s). | String | - |
| stopOnFailure | Stop the experiment if the probe fails. | Boolean | false |
| verbosity | Log verbosity level (info, debug, trace). | String | - |
| retry | Number of times to retry the probe on failure. | Integer | - |
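To show how duration strings and the attempt and pollingInterval properties might interact, here is a small Python sketch. It is an assumption-laden illustration, not the chaos runner's scheduling logic: `parse_duration` and `run_with_attempts` are hypothetical names, only single-unit values such as 30s or 5m are handled, and timeout enforcement is left out.

```python
import re
import time
from typing import Callable

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a value such as '30s', '1m', or '5m' into whole seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration in this sketch: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def run_with_attempts(
    check: Callable[[], bool],
    attempt: int,
    polling_interval: str,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check up to `attempt` times, pausing between tries.

    Returns True on the first successful check, False if every try fails.
    """
    pause = parse_duration(polling_interval)
    for tried in range(1, attempt + 1):
        if check():
            return True
        if tried < attempt:
            # Assumed behavior: wait pollingInterval before the next try.
            sleep(pause)
    return False
```

With `attempt` set to 3 and `pollingInterval` set to 5s, a check that succeeds on its third call sleeps twice for five seconds each before passing. The `sleep` parameter is injectable so the loop can be exercised without real delays.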

Troubleshooting

GCP SQL Instance Status Check probe fails with a permission denied error

The service account available to the chaos infrastructure does not have the required Cloud SQL permissions. Confirm that the service account has cloudsql.instances.get and cloudsql.instances.list (for example, through the Cloud SQL Viewer role) on the project named in GCP_PROJECT_ID.

GCP SQL Instance Status Check probe reports the instance was not found

The instance did not resolve in the supplied project. Verify that SQL_INSTANCE_NAME matches the Cloud SQL instance name exactly and that GCP_PROJECT_ID is the project that owns the instance.

GCP SQL Instance Status Check probe times out before the instance reaches RUNNABLE

The instance did not return to the RUNNABLE state within the probe timeout. Cloud SQL maintenance, failover, or restart operations can leave the instance in a transient state. Increase the timeout value and the attempt or retry values in the run properties, and confirm that the fault recovery step restores the instance.