GCP SQL Instance Failover

GCP SQL Instance Failover disrupts the state of a GCP SQL instance, identified by its name and project ID, by triggering a failover on that instance.

Use cases

GCP SQL instance failover fault:

Determines the resilience of the GKE infrastructure.
Determines how quickly an SQL Instance can recover when a failover on one of the replicas is triggered.

Prerequisites

Kubernetes > 1.16
Service account should have editor access (or owner access) to the GCP project.
High Availability should be enabled on the target GCP SQL Instance.
A Kubernetes secret in the default namespace should contain the GCP service account credentials. For more information, refer to the docs on how to generate the necessary credentials in order to authenticate your identity with the Google Cloud Platform (GCP).

apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  type:
  project_id:
  private_key_id:
  private_key:
  client_email:
  client_id:
  auth_uri:
  token_uri:
  auth_provider_x509_cert_url:
  client_x509_cert_url:

Mandatory tunables

Tunable	Description	Notes
GCP_PROJECT_ID	ID of the GCP project containing the SQL Instance.	The target SQL Instance should belong to this GCP project. For more information, go to GCP project ID.
SQL_INSTANCE_NAME	Name of the target GCP SQL Instance.	For more information, go to SQL INSTANCE NAME.

Optional tunables

Tunable	Description	Notes
TOTAL_CHAOS_DURATION	Duration for which chaos is injected into the target resource (in seconds).	Defaults to 30s. For more information, go to duration of the chaos.
CHAOS_INTERVAL	Time interval between two successive chaos iterations (in seconds).	Defaults to 30s. For more information, go to chaos interval.
SEQUENCE	Sequence of chaos execution for multiple target instances.	Defaults to parallel. It supports serial sequence as well. For more information, go to sequence of chaos execution.
RAMP_TIME	Period to wait before and after injecting chaos (in seconds).	For example, 30s. For more information, go to ramp time.
DEFAULT_HEALTH_CHECK	Determines whether the default health check built into the fault is run.	Defaults to 'true'. For more information, go to default health check.
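
As a rough illustration of how the optional tunables could be set, the fragment below lists them as entries of the `env` list in a ChaosEngine manifest, in the same way the ChaosEngine example on this page sets TOTAL_CHAOS_DURATION. The values are illustrative assumptions, not recommendations. They are written as plain numbers of seconds in quoted strings, following the style of that example.

```yaml
# Fragment only: these entries belong in the `env` list of the experiment
# in a ChaosEngine manifest. All values are illustrative assumptions.
- name: TOTAL_CHAOS_DURATION
  value: "60"        # inject chaos for 60 seconds in total
- name: CHAOS_INTERVAL
  value: "30"        # wait 30 seconds between successive iterations
- name: SEQUENCE
  value: "serial"    # the default is "parallel"
- name: RAMP_TIME
  value: "30"        # wait 30 seconds before and after injecting chaos
- name: DEFAULT_HEALTH_CHECK
  value: "true"      # keep the default health check enabled
```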

IAM permissions

Listed below are the IAM permissions leveraged by the fault:

cloudsql.instances.failover
cloudsql.instances.list
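
If you prefer to grant only these two permissions rather than editor or owner access, one option is a custom IAM role. The following is a minimal sketch of a role definition file in the YAML format accepted by `gcloud iam roles create` through its `--file` flag. The title and description are hypothetical, and whether the fault runs with this reduced role alone is an assumption that should be verified in your environment, since the prerequisites above call for editor or owner access.

```yaml
# Custom role definition file (hypothetical title and description).
# Contains only the permissions listed for this fault.
title: Chaos SQL Failover
description: Permissions used by the GCP SQL instance failover fault
stage: GA
includedPermissions:
  - cloudsql.instances.failover
  - cloudsql.instances.list
```

A file like this could be passed to `gcloud iam roles create` together with a role ID and the `--project` flag, and the resulting role bound to the service account whose credentials are stored in the Kubernetes secret.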

Failover SQL Instance by name

Name of the SQL Instance subject to failover. The fault triggers a failover on the SQL instance whose name is provided in SQL_INSTANCE_NAME within the GCP_PROJECT_ID project. It then waits for the failover to complete and for the target instance to return to the RUNNING state in a different zone.

GCP project ID: The unique identifier of the GCP project. Tune it by using the GCP_PROJECT_ID environment variable.

The following YAML snippet illustrates the use of these environment variables:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: gcp-sql-instance-failover
      image: docker.io/harness/chaos-go-runner:main-latest
      imagePullPolicy: Always
      args:
        - -c
        - ./experiments -name gcp-sql-instance-failover
      command:
        - /bin/bash
      components:
        env:
          - name: TOTAL_CHAOS_DURATION
            value: "30"
          - name: SQL_INSTANCE_NAME
            value: "test-sql-instance"
          - name: GCP_PROJECT_ID
            value: "sample-project-id"
          - name: DEFAULT_HEALTH_CHECK
            value: "false"
