The NLB (Network Load Balancer) AZ (Availability Zone) down fault makes an AZ unavailable on a target network load balancer, which can disrupt service delivery. The fault restricts access to the specified availability zones by blocking traffic at the subnet ACL (Access Control List) for a defined duration. By simulating this scenario, you can assess the resilience and performance of your system when an AZ becomes inaccessible.
- With this experiment, you can evaluate the application's behavior and assess its ability to handle and recover from a scenario where traffic from a particular AZ is blocked.
- The fault tests the application by deliberately blocking traffic originating from a specific AZ on the network load balancer. It prevents both incoming and outgoing traffic in the designated AZ from reaching the application through the load balancer.
- Kubernetes >= 1.17
- An ECS cluster running the desired tasks and containers, and familiarity with ECS service update and deployment concepts.
- Create a Kubernetes secret in the CHAOS_NAMESPACE that contains the AWS access configuration (keys). Below is a sample secret file:
# Add your AWS credentials under the default profile
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXX
We recommend naming the secret cloud-secret. If you use a different name, you must update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template, and you may be unable to use the default health check probes.
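The credential lines in the sample are only the body of the secret. The sketch below wraps them into a complete Kubernetes Secret manifest under several assumptions. The key name `cloud_config.yml` is hypothetical and must match the file that AWS_SHARED_CREDENTIALS_FILE references in your fault template. The namespace `litmus` is a hypothetical stand-in for your CHAOS_NAMESPACE. The `[default]` line is the profile header that the AWS shared credentials file format expects.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
  namespace: litmus   # hypothetical: replace with your CHAOS_NAMESPACE
type: Opaque
stringData:
  # Hypothetical key name; it must match the file name that
  # AWS_SHARED_CREDENTIALS_FILE points to in the fault template
  cloud_config.yml: |-
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXX
```

If you save this as a file (for example, `cloud-secret.yaml`, a hypothetical name), you can create the secret with `kubectl apply -f cloud-secret.yaml`.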
Here is an example AWS policy to execute the fault.
|Variable|Description|Notes|
|---|---|---|
|LOAD_BALANCER_ARN|ARN of the target load balancer whose AZ should be detached|For example, |
|ZONES|Target zones to detach from the NLB|For example, |
|REGION|Region of the target load balancer|For example, |
|TOTAL_CHAOS_DURATION|Duration of chaos injection (in seconds)|Default: 30 s. For more information, go to duration of the chaos.|
|CHAOS_INTERVAL|Interval between successive chaos iterations (in seconds)|Default: 30 s. For more information, go to chaos interval.|
|SEQUENCE|Sequence of chaos execution for multiple target zones|Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution.|
|RAMP_TIME|Duration to wait before and after injecting chaos (in seconds)|For example, 30 s. For more information, go to ramp time.|
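The tunables in the table are set as environment variables on the fault. The following fragment is a hedged illustration, not a recommended configuration. It assumes the entries sit under `spec.experiments[].spec.components.env` of a ChaosEngine, and all values are hypothetical. Values are quoted because Kubernetes environment variable values are strings.

```yaml
# Hypothetical values; these entries go under
# spec.experiments[].spec.components.env of the ChaosEngine
env:
# run the fault for 60 seconds in total
- name: TOTAL_CHAOS_DURATION
  value: '60'
# wait 30 seconds between successive chaos iterations
- name: CHAOS_INTERVAL
  value: '30'
# affect the target zones one after another instead of all at once
- name: SEQUENCE
  value: 'serial'
# wait 10 seconds before and after injecting chaos
- name: RAMP_TIME
  value: '10'
```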
The ZONES environment variable takes a comma-separated list of target zones.
The following YAML snippet illustrates the use of this environment variable:
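# contains nlb az down for given zones
experiments:
- name: nlb-az-down
  spec:
    components:
      env:
      # load balancer arn for chaos
      - name: LOAD_BALANCER_ARN
        value: '<load-balancer-arn>'
      # target zones for the chaos
      - name: ZONES
        value: '<zone-1>,<zone-2>'
      # region for chaos
      - name: REGION
        value: '<region>'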
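The env entries in the snippet are part of a larger ChaosEngine resource. The sketch below shows one way they might fit into a complete manifest that targets two zones. It assumes the LitmusChaos-style ChaosEngine schema (`litmuschaos.io/v1alpha1`). The engine name, service account, account ID, ARN suffix, region, and zones are all hypothetical placeholders.

```yaml
# Hypothetical ChaosEngine: the engine name, service account,
# ARN, region, and zones below are placeholders
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nlb-az-down-engine
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: nlb-az-down
    spec:
      components:
        env:
        # NLB whose zones are made unavailable
        - name: LOAD_BALANCER_ARN
          value: 'arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/50dc6c495c0c9188'
        # two target zones, comma-separated
        - name: ZONES
          value: 'us-east-1a,us-east-1b'
        # region that hosts the NLB
        - name: REGION
          value: 'us-east-1'
```

With the default parallel SEQUENCE, both listed zones would be targeted at the same time. Setting SEQUENCE to serial targets them one after another.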