DynamoDB replication pause
The DynamoDB replication pause fault pauses data replication between the replicas of a DynamoDB global table across regions for the chaos duration.
- While the chaos experiment is running, changes to the DynamoDB table are not replicated to the other regions, so the replicas become inconsistent with each other.
- You can execute this fault only on a global DynamoDB table, that is, a table that has more than one replica.
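The global-table requirement above can be checked before running the fault. The following is a minimal sketch, assuming the response shape of the DynamoDB DescribeTable API, in which a global table reports its replicas under `Table.Replicas`. The helper names `replica_regions` and `is_global_table` are illustrative and are not part of the fault.

```python
from typing import Any, Dict, List


def replica_regions(describe_table_response: Dict[str, Any]) -> List[str]:
    """Return the regions listed under Table.Replicas (empty if the key is absent)."""
    table = describe_table_response.get("Table", {})
    return [r["RegionName"] for r in table.get("Replicas", []) if "RegionName" in r]


def is_global_table(describe_table_response: Dict[str, Any]) -> bool:
    """Assumption: a table counts as global when DescribeTable reports at least one replica."""
    return len(replica_regions(describe_table_response)) > 0


# Hypothetical usage with boto3 (needs AWS credentials, so it is not run here):
#   import boto3
#   resp = boto3.client("dynamodb", region_name="us-east-1").describe_table(TableName="name-1")
#   print(is_global_table(resp))
sample = {"Table": {"TableName": "name-1", "Replicas": [{"RegionName": "us-west-2"}]}}
print(is_global_table(sample))  # True
```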
Use cases
DynamoDB replication pause determines how resilient an application is when the replication of constantly updated database data is disrupted.
Prerequisites
- Kubernetes >= 1.17
- DynamoDB should be up and running.
- The target table should be a global table.
- The Kubernetes secret should have the AWS access configuration (key) in the CHAOS_NAMESPACE. A sample secret file looks like:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXX
HCE recommends that you use the same secret name, that is, cloud-secret. Otherwise, you need to update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template, and you won't be able to use the default health check probes.
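Because the credentials inside cloud_config.yml use the INI-style layout of an AWS shared credentials file, their completeness can be checked locally before creating the secret. This is a sketch only, using Python's standard `configparser`. The function name `missing_credential_keys` and the sample text are illustrative assumptions.

```python
import configparser
from typing import List

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(config_text: str, profile: str = "default") -> List[str]:
    """Return the required keys that are absent or empty in the given profile.

    Raises configparser.Error if the text is not valid INI (for example, no section header).
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [k for k in REQUIRED_KEYS if not parser.get(profile, k, fallback="").strip()]


# Same layout as the sample secret; the values are placeholders.
SAMPLE = "\n".join([
    "# Add the cloud AWS credentials respectively",
    "[default]",
    "aws_access_key_id = XXXXXXXXXXXXXXXXXXX",
    "aws_secret_access_key = XXXXXXXXXXXXXXX",
])
print(missing_credential_keys(SAMPLE))  # []
```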
Below is an example AWS policy to execute the fault.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "fis:CreateExperimentTemplate",
                "fis:StartExperiment",
                "fis:StopExperiment",
                "fis:GetExperiment",
                "fis:ListExperiments"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeTable"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms"
            ],
            "Resource": "*"
        }
    ]
}
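To compare a custom policy against the example above, the allowed actions can be collected from the policy JSON and checked against the actions the example lists. This is a rough sketch: it matches action names literally and does not expand wildcards such as `fis:*`, and the names `REQUIRED_ACTIONS`, `allowed_actions`, and `missing_actions` are illustrative.

```python
import json
from typing import Any, Dict, Set

# Actions taken from the example policy in this document.
REQUIRED_ACTIONS = {
    "fis:CreateExperimentTemplate",
    "fis:StartExperiment",
    "fis:StopExperiment",
    "fis:GetExperiment",
    "fis:ListExperiments",
    "dynamodb:DescribeTable",
    "cloudwatch:DescribeAlarms",
}


def allowed_actions(policy: Dict[str, Any]) -> Set[str]:
    """Collect every action named in an Allow statement (no wildcard expansion)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    actions: Set[str] = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        action = stmt.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy_json: str) -> Set[str]:
    """Return the actions from the example policy that the given policy does not allow."""
    return REQUIRED_ACTIONS - allowed_actions(json.loads(policy_json))


partial = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "dynamodb:DescribeTable", "Resource": "*"}],
})
print(sorted(missing_actions(partial)))
```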
- Go to AWS named profile for chaos to use a different profile for AWS faults, and to superset permission or policy to execute all AWS faults.
- Go to the common tunables and AWS-specific tunables to tune the tunables that are common to all faults and those that are specific to AWS.
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
TABLE_NAMES | Name of the DynamoDB table (or comma-separated names of the tables) on which you apply chaos. | For example, "table-name1,table-name-2,..". |
COUNT | Number of iterations of the chaos experiment to run. | For example, 2. |
STOP_CONDITION_CLOUDWATCH_ALARMS | Comma-separated ARNs (Amazon Resource Names) of the CloudWatch alarms that the fault uses as its stop condition. | For example, "arn-1,arn-2,..". |
FIS_ACCOUNT_ID | Amazon FIS account used by DynamoDB. | |
FIS_ROLE_ARN | Provide the role ARN that you want to update. | For example, "arn:aws:iam::234567901244:role/CustomFISRole". |
REGION | Region name of the target DynamoDB table. | For example, us-east-1. |
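Since every tunable in the table above is mandatory, a quick pre-flight check can flag any that are unset before the experiment is submitted. The sketch below assumes the tunables are available as a plain name-to-value mapping. The function name `missing_tunables` is hypothetical.

```python
from typing import Dict, List

# Names copied from the mandatory tunables table.
MANDATORY_TUNABLES = (
    "TABLE_NAMES",
    "COUNT",
    "STOP_CONDITION_CLOUDWATCH_ALARMS",
    "FIS_ACCOUNT_ID",
    "FIS_ROLE_ARN",
    "REGION",
)


def missing_tunables(env: Dict[str, str]) -> List[str]:
    """Return the mandatory tunables that are absent or blank in env."""
    return [name for name in MANDATORY_TUNABLES if not env.get(name, "").strip()]


env = {"TABLE_NAMES": "name-1,name-2", "REGION": "us-east-1", "COUNT": "2"}
print(missing_tunables(env))
# ['STOP_CONDITION_CLOUDWATCH_ALARMS', 'FIS_ACCOUNT_ID', 'FIS_ROLE_ARN']
```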
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration of chaos injection (in seconds). | Default: 30 s. For more information, go to duration of the chaos. |
CHAOS_INTERVAL | Time interval between successive chaos iterations (in seconds). | Default: 30 s. For more information, go to chaos interval. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials. | Default: /tmp/cloud_config.yml. |
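The defaults in the table above can be pictured as a base mapping that user-supplied values override. This is only an illustration of that fallback behaviour, not the fault's actual implementation. `DEFAULTS` and `with_defaults` are hypothetical names, and RAMP_TIME is left out because the table gives an example rather than a default for it.

```python
from typing import Dict

# Defaults as stated in the optional tunables table.
DEFAULTS = {
    "TOTAL_CHAOS_DURATION": "30",
    "CHAOS_INTERVAL": "30",
    "AWS_SHARED_CREDENTIALS_FILE": "/tmp/cloud_config.yml",
}


def with_defaults(env: Dict[str, str]) -> Dict[str, str]:
    """Overlay non-empty user values on top of the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in env.items() if v != ""})
    return merged


resolved = with_defaults({"TOTAL_CHAOS_DURATION": "60", "CHAOS_INTERVAL": ""})
print(resolved["TOTAL_CHAOS_DURATION"], resolved["CHAOS_INTERVAL"])  # 60 30
```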
Table names
Name of the DynamoDB global table on which you apply chaos. You can provide multiple table names as comma-separated values. Tune it using the TABLE_NAMES environment variable.
The following YAML snippet illustrates the use of this environment variable:
kind: Workflow
apiVersion: argoproj.io/v1alpha1
metadata:
  name: fis-dynamodb-replication-pause
  namespace: hce
spec:
  - name: fis-dynamodb-replication-pause
    components:
      env:
        - name: TOTAL_CHAOS_DURATION
          value: "60"
        - name: TABLE_NAMES
          value: "name-1,name-2"
        - name: REGION
          value: "us-east-1"
        - name: AWS_SHARED_CREDENTIALS_FILE
          value: /tmp/cloud_config.yml
      secrets:
        - name: cloud-secret
          mountPath: /tmp/
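A comma-separated TABLE_NAMES value can be split and sanity-checked before use. The sketch below is an assumption about how such a value might be parsed, not the fault's own code. It applies the DynamoDB naming rule (3 to 255 characters from letters, digits, underscore, hyphen, and dot), and `parse_table_names` is a hypothetical helper.

```python
import re
from typing import List

# DynamoDB table names: 3-255 characters from a-z, A-Z, 0-9, '_', '-', '.'.
TABLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def parse_table_names(raw: str) -> List[str]:
    """Split a comma-separated value, dropping blanks and rejecting invalid names."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    for name in names:
        if TABLE_NAME_PATTERN.fullmatch(name) is None:
            raise ValueError(f"invalid DynamoDB table name: {name!r}")
    return names


print(parse_table_names("name-1, name-2"))  # ['name-1', 'name-2']
```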
Count
Number of iterations of the chaos experiment to execute. DynamoDB replication is paused this many times. Tune it using the COUNT environment variable.
The following YAML snippet illustrates the use of this environment variable:
kind: Workflow
apiVersion: argoproj.io/v1alpha1
metadata:
  name: fis-dynamodb-replication-pause
  namespace: hce
spec:
  - name: fis-dynamodb-replication-pause
    components:
      env:
        - name: TOTAL_CHAOS_DURATION
          value: "60"
        - name: COUNT
          value: "3"
        - name: AWS_SHARED_CREDENTIALS_FILE
          value: /tmp/cloud_config.yml
      secrets:
        - name: cloud-secret
          mountPath: /tmp/
Stop condition CloudWatch alarms
Comma-separated ARNs (Amazon Resource Names) of the CloudWatch alarms that the fault uses as its stop condition. Tune it by using the STOP_CONDITION_CLOUDWATCH_ALARMS environment variable.
The following YAML snippet illustrates the use of this environment variable:
kind: Workflow
apiVersion: argoproj.io/v1alpha1
metadata:
  name: fis-dynamodb-replication-pause
  namespace: hce
spec:
  - name: fis-dynamodb-replication-pause
    components:
      env:
        - name: TOTAL_CHAOS_DURATION
          value: "60"
        - name: STOP_CONDITION_CLOUDWATCH_ALARMS
          value: "arn-1,arn-2,.."
        - name: AWS_SHARED_CREDENTIALS_FILE
          value: /tmp/cloud_config.yml
      secrets:
        - name: cloud-secret
          mountPath: /tmp/
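The placeholders arn-1 and arn-2 stand for full CloudWatch alarm ARNs, which have the form arn:aws:cloudwatch:&lt;region&gt;:&lt;account-id&gt;:alarm:&lt;alarm-name&gt;. As a hedged sketch, the value can be split and each ARN checked for that shape. `parse_alarm_arns` is a hypothetical helper, and it assumes alarm names contain no commas.

```python
from typing import Dict, List


def parse_alarm_arns(raw: str) -> List[Dict[str, str]]:
    """Split a comma-separated list of CloudWatch alarm ARNs into their fields."""
    alarms: List[Dict[str, str]] = []
    for arn in (part.strip() for part in raw.split(",")):
        if not arn:
            continue
        # maxsplit=6 keeps any colons inside the alarm name intact.
        parts = arn.split(":", 6)
        if (len(parts) != 7 or parts[0] != "arn" or parts[2] != "cloudwatch"
                or parts[5] != "alarm" or not parts[6]):
            raise ValueError(f"not a CloudWatch alarm ARN: {arn!r}")
        alarms.append({"region": parts[3], "account_id": parts[4], "name": parts[6]})
    return alarms


value = "arn:aws:cloudwatch:us-east-1:123456789012:alarm:high-latency"
print(parse_alarm_arns(value))
```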
FIS details
- FIS account ID: Amazon FIS account used by DynamoDB. Tune it by using the FIS_ACCOUNT_ID environment variable.
- FIS role ARN: Role ARN that you want to update. Tune it by using the FIS_ROLE_ARN environment variable.
The following YAML snippet illustrates the use of these environment variables:
kind: Workflow
apiVersion: argoproj.io/v1alpha1
metadata:
  name: fis-dynamodb-replication-pause
  namespace: hce
spec:
  - name: fis-dynamodb-replication-pause
    components:
      env:
        - name: TOTAL_CHAOS_DURATION
          value: "60"
        - name: FIS_ACCOUNT_ID
          value: "234567901244"
        - name: FIS_ROLE_ARN
          value: "arn:aws:iam::234567901244:role/CustomFISRole"
        - name: AWS_SHARED_CREDENTIALS_FILE
          value: /tmp/cloud_config.yml
      secrets:
        - name: cloud-secret
          mountPath: /tmp/
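IAM role ARNs take the form arn:aws:iam::&lt;account-id&gt;:role/&lt;role-name&gt;, with an empty region field and a 12-digit account ID. The sketch below checks that shape and compares the embedded account ID with FIS_ACCOUNT_ID. Treating the two as necessarily belonging to the same account is an assumption made for illustration, and `role_matches_account` is a hypothetical helper.

```python
import re

# arn:<partition>:iam::<12-digit account id>:role/<role name or path>
ROLE_ARN_PATTERN = re.compile(r"^arn:(aws[a-z-]*):iam::(\d{12}):role/(.+)$")


def role_matches_account(role_arn: str, account_id: str) -> bool:
    """Return True when the role ARN is well formed and names the given account."""
    match = ROLE_ARN_PATTERN.match(role_arn)
    if match is None:
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    # Assumption: the FIS role is expected to live in the FIS account.
    return match.group(2) == account_id


print(role_matches_account("arn:aws:iam::234567901244:role/CustomFISRole", "234567901244"))  # True
```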