GCP SQL Instance Failover
GCP SQL instance failover disrupts the state of a GCP SQL instance, identified by its name and project ID, by triggering a failover on that instance.
Use cases
GCP SQL instance failover fault:
- Determines the resilience of the GKE infrastructure.
- Determines how quickly an SQL Instance can recover when a failover on one of the replicas is triggered.
Prerequisites
- Kubernetes > 1.16
- The service account should have editor (or owner) access to the GCP project.
- High availability should be enabled on the target GCP SQL instance.
- A Kubernetes secret containing the GCP service account credentials should exist in the default namespace. For more information, go to generate the necessary credentials in order to authenticate your identity with the Google Cloud Platform (GCP). The secret has the following structure:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  type:
  project_id:
  private_key_id:
  private_key:
  client_email:
  client_id:
  auth_uri:
  token_uri:
  auth_provider_x509_cert_url:
  client_x509_cert_url:
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
GCP_PROJECT_ID | ID of the GCP project that contains the SQL instance. | The target SQL instance should belong to this GCP project. For more information, go to GCP project ID. |
SQL_INSTANCE_NAME | Name of the target GCP SQL Instance. | For more information, go to SQL INSTANCE NAME. |
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. For more information, go to duration of the chaos. |
CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Defaults to 30s. For more information, go to chaos interval. |
SEQUENCE | Sequence of chaos execution for multiple targets. | Defaults to parallel. It supports serial sequence as well. For more information, go to sequence of chaos execution. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. For more information, go to ramp time. |
DEFAULT_HEALTH_CHECK | Determines whether the default health check built into the fault is run. | Defaults to 'true'. For more information, go to default health check. |
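As a rough illustration of how the optional tunables above might be set, the fragment below sketches an env list of the kind that sits under components in a ChaosEngine. The values are illustrative assumptions rather than recommendations, and any tunable left out falls back to its default.

```yaml
# Illustrative env entries for the optional tunables.
# All values here are assumptions chosen for the example.
env:
  - name: TOTAL_CHAOS_DURATION
    value: "60"        # total chaos window, in seconds
  - name: CHAOS_INTERVAL
    value: "30"        # gap between successive iterations, in seconds
  - name: SEQUENCE
    value: "serial"    # "parallel" is the default
  - name: RAMP_TIME
    value: "30"        # wait before and after chaos, in seconds
  - name: DEFAULT_HEALTH_CHECK
    value: "true"      # keep the built-in health check enabled
```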
IAM permissions
Listed below are the IAM permissions leveraged by the fault:
cloudsql.instances.failover
cloudsql.instances.list
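Where editor or owner access is broader than desired, the two permissions above could in principle be grouped into a custom IAM role. The sketch below uses the role-definition YAML format accepted by gcloud iam roles create with the --file flag. The title and description are hypothetical, and whether these two permissions alone cover every step of the fault is an assumption that this document does not confirm.

```yaml
# Hypothetical custom role definition; title and description are placeholders.
# Assumes the two permissions listed above are sufficient for the fault.
title: "Chaos SQL Failover"
description: "Permissions used by the GCP SQL instance failover fault"
stage: "GA"
includedPermissions:
  - cloudsql.instances.failover
  - cloudsql.instances.list
```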
Failover SQL Instance by name
The name of the SQL instance subjected to failover. The fault triggers a failover on the SQL instance whose name is provided in SQL_INSTANCE_NAME within the GCP_PROJECT_ID project. It then waits for the failover to complete and for the target instance to return to the RUNNING state in a different zone.
GCP project ID: The project ID, which is the unique identifier of a GCP project. Tune it by using the GCP_PROJECT_ID environment variable.
The following YAML snippet illustrates the use of this environment variable:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: gcp-sql-instance-failover
      image: docker.io/harness/chaos-go-runner:main-latest
      imagePullPolicy: Always
      args:
        - -c
        - ./experiments -name gcp-sql-instance-failover
      command:
        - /bin/bash
      components:
        env:
          - name: TOTAL_CHAOS_DURATION
            value: "30"
          - name: SQL_INSTANCE_NAME
            value: "test-sql-instance"
          - name: GCP_PROJECT_ID
            value: "sample-project-id"
          - name: DEFAULT_HEALTH_CHECK
            value: "false"