Chaos Faults for AWS
Introduction
AWS faults disrupt the resources running on different AWS services from the EKS cluster. To perform such AWS chaos experiments, you will need to authenticate HCE with the AWS platform. This can be done in two ways.
- Using secrets: You can use secrets to authenticate HCE with AWS regardless of whether the Kubernetes cluster is used for the deployment. This is the Kubernetes-native way of authenticating HCE with AWS.
- IAM integration: You can authenticate HCE with AWS using IAM when you have deployed chaos on an EKS cluster. You can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the experiment pods that use it.
Why should I use IAM integration for AWS authentication?
IAM roles for service accounts provide the following benefits.
- Least privilege: With IAM roles for service accounts, you don't need to extend permissions to all pods on a node (for example, by widening the node IAM role so that every pod can make AWS API calls). You can scope IAM permissions to a service account, and only the pods that use that service account have access to those permissions.
- Credential isolation: The experiment can only retrieve credentials for the IAM role associated with a particular service account. This experiment would not have access to credentials for other experiments belonging to other pods.
Below are the steps to enable service accounts to access AWS resources.
Step 1: Create an IAM OpenID Connect (OIDC) provider for your cluster
You must create an IAM OpenID Connect (OIDC) identity provider for your cluster with eksctl. This step is performed once per cluster. For more information, go to the AWS documentation to set up an OIDC provider.
Below is the command to check whether your cluster has an existing IAM OIDC provider. The cluster name in this example is litmus-demo and the region is us-west-1. Replace these values based on your environment.
aws eks describe-cluster --name <litmus-demo> --query "cluster.identity.oidc.issuer" --output text
Output:
https://oidc.eks.us-west-1.amazonaws.com/id/D054E55B6947B1A7B3F200297789662C
To list the IAM OIDC providers in your account, execute the following command.
aws iam list-open-id-connect-providers | grep <D054E55B6947B1A7B3F200297789662C>
Replace <D054E55B6947B1A7B3F200297789662C> (including <>) with the value returned from the output of the previous command.
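The provider ID you grep for is simply the final path segment of the issuer URL returned by `aws eks describe-cluster`. A small sketch of extracting it, using the example value from the output above:

```python
# Extract the OIDC provider ID from the issuer URL returned by
# `aws eks describe-cluster`. The ID is the last path segment of the URL.
from urllib.parse import urlparse

issuer = "https://oidc.eks.us-west-1.amazonaws.com/id/D054E55B6947B1A7B3F200297789662C"
oidc_id = urlparse(issuer).path.rsplit("/", 1)[-1]
print(oidc_id)  # D054E55B6947B1A7B3F200297789662C
```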
If no IAM OIDC identity provider is available for your account, create one for your cluster using the following command.
Replace <litmus-demo> (including <>) with values of your choice.
eksctl utils associate-iam-oidc-provider --cluster litmus-demo --approve
2021-09-07 14:54:01 [ℹ] eksctl version 0.52.0
2021-09-07 14:54:01 [ℹ] using region us-west-1
2021-09-07 14:54:04 [ℹ] will create IAM Open ID Connect provider for cluster "litmus-demo" in "us-west-1"
2021-09-07 14:54:05 [✔] created IAM Open ID Connect provider for cluster "litmus-demo" in "us-west-1"
Step 2: Create an IAM role and policy for your service account
Create an IAM policy with the permissions that you would like the experiment to have. There are several ways to create a new IAM permission policy; go to the AWS documentation on creating IAM policies to learn more. Then use the eksctl command to create a service account with the IAM policy attached.
eksctl create iamserviceaccount \
--name <service_account_name> \
--namespace <service_account_namespace> \
--cluster <cluster_name> \
--attach-policy-arn <IAM_policy_ARN> \
--approve \
--override-existing-serviceaccounts
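The `<IAM_policy_ARN>` above refers to a policy you create beforehand. As an illustration only, a minimal policy for EC2 stop/start faults might look like the following; scope the action list to the faults you actually run:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    }
  ]
}
```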
Step 3: Associate an IAM role with a service account
Define the IAM role for every Kubernetes service account in your cluster that requires access to AWS resources by adding the following annotation to the service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>
You can also annotate the experiment service account using the command:
kubectl annotate serviceaccount -n <SERVICE_ACCOUNT_NAMESPACE> <SERVICE_ACCOUNT_NAME> \
eks.amazonaws.com/role-arn=arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>
- Annotating the litmus-admin service account in the HCE namespace works for most experiments.
- For the cluster autoscaler experiment, annotate the service account in the kube-system namespace.
Step 4: Verify that the experiment service account is associated with the IAM role
If you run an experiment and exec into one of its pods, you can verify whether the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN environment variables exist.
kubectl exec -n litmus <ec2-terminate-by-id-z4zdf> -- env | grep AWS
Output:
AWS_VPC_K8S_CNI_LOGLEVEL=DEBUG
AWS_ROLE_ARN=arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
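As a quick programmatic sanity check, you can assert that both variables are present in the captured `env` output. A sketch over the sample output above:

```python
# Verify that the IAM-for-service-account variables are present in the
# `env` output captured from the experiment pod (sample shown above).
env_output = """\
AWS_VPC_K8S_CNI_LOGLEVEL=DEBUG
AWS_ROLE_ARN=arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
"""

# Parse KEY=VALUE lines into a dict, splitting only on the first '='.
env = dict(line.split("=", 1) for line in env_output.splitlines())
assert "AWS_ROLE_ARN" in env and "AWS_WEB_IDENTITY_TOKEN_FILE" in env
print("IAM role association looks good")
```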
Step 5: Configure the experiment CR
Since you have already configured IAM for the experiment service account, you don't have to create a secret and mount it in the experiment custom resource (CR), which is otherwise done by default. To remove the secret mount, remove the following lines from the experiment YAML.
secrets:
- name: cloud-secret
mountPath: /tmp/
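If you generate experiment manifests programmatically, dropping the secret mount amounts to deleting the `secrets` key. A sketch over a plain dict standing in for the parsed experiment YAML (use a YAML library for real manifests):

```python
# Remove the `secrets` mount from an experiment definition represented
# as a plain dict (stand-in for the parsed experiment YAML).
experiment = {
    "name": "ec2-terminate-by-id",
    "secrets": [{"name": "cloud-secret", "mountPath": "/tmp/"}],
}

experiment.pop("secrets", None)  # no-op if the key is already absent
print(experiment)  # {'name': 'ec2-terminate-by-id'}
```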
Now, you can run chaos experiments with IAM integration.
Here are AWS faults that you can execute and validate.
ALB AZ down
ALB AZ down takes down an availability zone (AZ) on a target application load balancer (ALB) for a specific duration.
CLB AZ down
CLB AZ down takes down an availability zone (AZ) on a target classic load balancer (CLB) for a specific duration.
EBS loss by ID
EBS loss by ID disrupts the state of an EBS volume by detaching it from the node (or EC2 instance) using the volume ID, for a certain duration.
EBS loss by tag
EBS loss by tag disrupts the state of an EBS volume by detaching it from the node (or EC2 instance) using the volume tag, for a certain duration.
EC2 CPU hog
EC2 CPU hog induces CPU stress on the target AWS EC2 instance using the Amazon SSM Run command, which is carried out using an SSM doc that is in-built into the fault.
EC2 DNS chaos
EC2 DNS chaos causes DNS errors on the specified EC2 instance for a specific duration.
ALB AZ down
ALB AZ down takes down an availability zone (AZ) on a target application load balancer (ALB) for a specific duration. This fault:
- Restricts access to certain availability zones for a specific duration.
- Tests the sanity, availability, and recovery workflows of the application pods attached to the load balancer.
- Breaks the connectivity of the ALB with the given zones, impacting their delivery.
- Disrupts the application's performance by detaching the AZ from the application load balancer.
CLB AZ down
CLB AZ down takes down the AZ (Availability Zones) on a target CLB for a specific duration. This fault:
- Restricts access to certain availability zones for a specific duration.
- Tests the application sanity, availability, and recovery workflows of the application pod attached to the load balancer.
View fault usage
- CLB AZ down fault breaks the connectivity of a CLB with the given zones and impacts their delivery.
- Detaching the AZ from the classic load balancer disrupts the dependent application's performance.
EBS loss by ID
EBS loss by ID disrupts the state of an EBS volume by detaching it from the node (or EC2 instance) using the volume ID, for a certain duration.
- In the case of EBS persistent volumes, the volumes can self-attach, and the re-attachment step can be skipped.
- It tests the deployment sanity (replica availability and uninterrupted service) and recovery workflows of the application pod.
EBS loss by tag
EBS loss by tag disrupts the state of an EBS volume by detaching it from the node (or EC2 instance) using the volume tag, for a certain duration.
- In the case of EBS persistent volumes, the volumes can self-attach, and the re-attachment step can be skipped.
- It tests the deployment sanity (replica availability and uninterrupted service) and recovery workflows of the application pod.
EC2 CPU hog
EC2 CPU hog induces CPU stress on the target AWS EC2 instance using the Amazon SSM Run command, which is carried out using an SSM doc that is in-built into the fault.
- It causes CPU stress on the target EC2 instance(s) for a specific duration.
EC2 DNS chaos
EC2 DNS chaos causes DNS errors on the specified EC2 instance for a specific duration.
- It determines the performance of the application (or process) running on the EC2 instance(s).
EC2 HTTP latency
EC2 HTTP latency disrupts the state of infrastructure resources. This fault induces HTTP chaos on an AWS EC2 instance using the Amazon SSM Run command, carried out using an SSM doc that is in-built into the fault.
- It injects HTTP response latency into the service whose port is specified using the TARGET_SERVICE_PORT environment variable, by starting a proxy server and redirecting the traffic through it.
- It introduces HTTP latency chaos on the EC2 instance using an SSM doc for a certain chaos duration.
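In the experiment definition, these tunables are passed as environment variables. A hypothetical fragment of the fault's `env` section — `TARGET_SERVICE_PORT` comes from the description above, while `LATENCY` and `TOTAL_CHAOS_DURATION` are assumed names; check them against the fault reference:

```yaml
# Illustrative only: LATENCY and TOTAL_CHAOS_DURATION are assumed
# tunable names; TARGET_SERVICE_PORT is named in the fault description.
env:
  - name: TARGET_SERVICE_PORT
    value: "80"      # port of the service to proxy
  - name: LATENCY
    value: "2000"    # injected response latency in milliseconds
  - name: TOTAL_CHAOS_DURATION
    value: "60"      # chaos duration in seconds
```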
EC2 HTTP modify body
EC2 HTTP modify body injects HTTP chaos that modifies the body of the request (or response) by starting a proxy server and redirecting the traffic through it.
- It tests the application's resilience to erroneous (or incorrect) HTTP response bodies.
EC2 HTTP modify header
EC2 HTTP modify header injects HTTP chaos that modifies the headers of the request (or response) by starting a proxy server and redirecting the traffic through it.
- It modifies the headers of the requests and responses of the service.
- This can be used to test the application's resilience to incorrect (or incomplete) headers.
EC2 HTTP reset peer
EC2 HTTP reset peer injects an HTTP reset on the service whose port is specified using the TARGET_SERVICE_PORT environment variable.
- It stops the outgoing HTTP requests by resetting the TCP connection for the requests.
- It determines the application's resilience to a lossy (or flaky) HTTP connection.
EC2 HTTP status code
EC2 HTTP status code injects HTTP chaos that modifies the status code of the response by starting a proxy server and redirecting the traffic through it.
- It tests the application's resilience to erroneous HTTP response codes from the application server.
EC2 IO stress
EC2 IO stress disrupts the state of infrastructure resources.
- The fault induces stress on the AWS EC2 instance using the Amazon SSM Run command, carried out using an SSM doc that is in-built into the fault.
- It causes I/O stress on the EC2 instance for a certain duration.
EC2 memory hog
EC2 memory hog disrupts the state of infrastructure resources.
- The fault induces stress on the AWS EC2 instance using the Amazon SSM Run command, carried out using an SSM doc that is in-built into the fault.
- It causes memory exhaustion on the EC2 instance for a specific duration.
EC2 network latency
EC2 network latency causes flaky access to the application (or services) by injecting network packet latency to EC2 instance(s).
- It determines the performance of the application (or process) running on the EC2 instances.
EC2 network loss
EC2 network loss causes flaky access to the application (or services) by injecting network packet loss to EC2 instance(s).
- It checks the performance of the application (or process) running on the EC2 instances.
EC2 process kill
EC2 process kill fault kills the target processes running on an EC2 instance.
- It checks the performance of the application/process running on the EC2 instance(s).
EC2 stop by ID
EC2 stop by ID stops an EC2 instance using the provided instance ID or list of instance IDs.
- It brings back the instance after a specific duration.
- It checks the performance of the application (or process) running on the EC2 instance.
- When the MANAGED_NODEGROUP environment variable is enabled, the fault does not try to start the instance after chaos. Instead, it checks for the addition of a new node instance to the cluster.
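As a sketch, the tunables above might appear in the fault spec as follows — `MANAGED_NODEGROUP` is named in the description, while `EC2_INSTANCE_ID` and `REGION` are assumed names; confirm them against the fault reference:

```yaml
env:
  - name: EC2_INSTANCE_ID    # assumed name; the target instance ID(s)
    value: "i-0123456789abcdef0"
  - name: REGION             # assumed name; region of the instance
    value: "us-west-1"
  - name: MANAGED_NODEGROUP  # named in the description above
    value: "enable"
```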
EC2 stop by tag
EC2 stop by tag stops an EC2 instance using the provided tag.
- It brings back the instance after a specific duration.
- It checks the performance of the application (or process) running on the EC2 instance.
- When the MANAGED_NODEGROUP environment variable is enabled, the fault does not try to start the instance after chaos. Instead, it checks for the addition of a new node instance to the cluster.
ECS agent stop
ECS agent stop disrupts the state of infrastructure resources.
- The fault induces agent-stop chaos on AWS ECS using the Amazon SSM Run command, carried out using an SSM doc that is in-built into the fault for the given chaos scenario.
- It stops the agent container on the ECS cluster specified by the CLUSTER_NAME environment variable, using an SSM doc, for a specific duration.
ECS container CPU hog
ECS container CPU hog disrupts the state of infrastructure resources. It induces stress on the AWS ECS container using the Amazon SSM Run command, which is carried out using an SSM doc that is in-built into the fault.
- It causes CPU chaos on the containers of the ECS task in the cluster specified by the CLUSTER_NAME environment variable, for a specific duration.
- To select the task under chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service are selected as chaos targets.
- It tests the ECS task sanity (service availability) and the recovery of the task containers subjected to CPU stress.
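A sketch of how the cluster and service selectors might be supplied in the fault's `env` section — `CLUSTER_NAME` is named in the description, while `SERVICE_NAME` and both values are assumptions:

```yaml
env:
  - name: CLUSTER_NAME   # named in the description above
    value: "litmus-demo"
  - name: SERVICE_NAME   # assumed name; selects the tasks under chaos
    value: "my-app-service"
```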
ECS container IO stress
ECS container IO stress disrupts the state of infrastructure resources. It induces stress on the AWS ECS container using the Amazon SSM Run command, which is carried out using an SSM doc that is in-built into the fault.
- It causes I/O stress on the containers of the ECS task in the cluster specified by the CLUSTER_NAME environment variable, for a specific duration.
- To select the task under chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service are selected as chaos targets.
- It tests the ECS task sanity (service availability) and the recovery of the task containers subjected to I/O stress.
ECS container memory hog
ECS container memory hog disrupts the state of infrastructure resources. It induces stress on the AWS ECS container using the Amazon SSM Run command, which is carried out using an SSM doc that is in-built into the fault.
- It causes memory stress on the containers of the ECS task in the cluster specified by the CLUSTER_NAME environment variable, for a specific duration.
- To select the task under chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service are selected as chaos targets.
- It tests the ECS task sanity (service availability) and the recovery of the task containers subjected to memory stress.
ECS container network latency
ECS container network latency disrupts the state of infrastructure resources. It induces network latency on the AWS ECS container using the Amazon SSM Run command, which is carried out using an SSM doc that is in-built into the fault.
- It causes network latency on the containers of the ECS task in the cluster specified by the CLUSTER_NAME environment variable, for a specific duration.
- To select the task under chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service are selected as chaos targets.
- It tests the ECS task sanity (service availability) and the recovery of the task containers subjected to network chaos.
ECS container network loss
ECS container network loss disrupts the state of infrastructure resources.
- The fault induces chaos on the AWS ECS container using the Amazon SSM Run command, carried out using an SSM doc that is in-built into the fault.
- It causes network packet loss on the containers of the ECS task in the given cluster.
- To select the task under chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service are selected as chaos targets.
- It tests the ECS task sanity (service availability) and the recovery of the task containers subjected to network chaos.
ECS instance stop
ECS instance stop induces stress on an AWS ECS cluster. It derives the instance under chaos from the ECS cluster.
- It causes an EC2 instance to stop and be removed from the ECS cluster for a specific duration.
ECS task stop
ECS task stop is an AWS fault that injects chaos to stop the ECS tasks based on the services or task replica ID and checks the task availability.
- This fault results in the unavailability of the application running on the tasks.
Lambda delete event source mapping
Lambda delete event source mapping removes the event source mapping from an AWS Lambda function for a specific duration.
- It checks the performance of the application (or service) without the event source mapping which may cause missing entries in a database.
Lambda toggle event mapping state
Lambda toggle event mapping state toggles (or sets) the event source mapping state to disabled for a Lambda function during a specific duration.
- It checks the performance of the running application (or service) when the event source mapping is not enabled which may cause missing entries in a database.
Lambda update function memory
Lambda update function memory causes the memory of a Lambda function to be updated to a specified value for a certain duration.
- It checks the performance of the application (or service) running with a new memory limit.
- It helps determine a safe overall memory limit value for the function.
- The smaller the memory limit, the longer the Lambda function takes to run under load.
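As an illustration, the fault's tunables might be supplied like this — both env names and values below are assumptions; check the fault reference for the exact tunable names:

```yaml
env:
  - name: FUNCTION_NAME         # assumed name; the target Lambda function
    value: "my-function"
  - name: MEMORY_IN_MEGABYTES   # assumed name; temporary memory limit
    value: "128"
```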
Lambda update function timeout
Lambda update function timeout causes the timeout of a Lambda function to be updated to a specified value for a certain duration.
- It checks the performance of the application (or service) running with a new timeout.
- It also helps determine a safe overall timeout value for the function.
Lambda update role permission
Lambda update role permission is an AWS fault that modifies the role policies associated with a Lambda function.
- It verifies the handling mechanism for function failures.
- It can also be used to update the role attached to a Lambda function.
- It checks the performance of the running Lambda application when it does not have enough permissions.
Lambda delete function concurrency
Lambda delete function concurrency is an AWS fault that deletes the Lambda function's reserved concurrency, thereby ensuring that the function has adequate unreserved concurrency to run.
- It examines the performance of the running Lambda application when the function lacks sufficient concurrency.
RDS instance delete
RDS instance delete removes an instance from an AWS RDS cluster.
- This makes the cluster unavailable for a specific duration.
- It determines how quickly an application can recover from an unexpected cluster deletion.
RDS instance reboot
RDS instance reboot induces a reboot on an RDS instance in an AWS RDS cluster. It derives the instance under chaos from the RDS cluster.
Windows EC2 blackhole chaos
Windows EC2 blackhole chaos results in access loss to the given target hosts or IPs by injecting firewall rules.
Windows EC2 CPU hog
Windows EC2 CPU hog induces CPU stress on the target AWS Windows EC2 instance using the Amazon SSM Run command.
Use cases
Windows EC2 CPU hog:
- Simulates the situation of a lack of CPU for processes running on the instance, which degrades their performance.
- Simulates slow application traffic or exhaustion of the resources, leading to degradation in the performance of processes on the instance.
Windows EC2 memory hog
Windows EC2 memory hog induces memory stress on the target AWS Windows EC2 instance using Amazon SSM Run command.
Use cases
Windows EC2 memory hog:
- Causes memory stress on the target AWS EC2 instance(s).
- Simulates the situation of memory leaks in the deployment of microservices.
- Simulates application slowness due to memory starvation, and noisy neighbour problems due to hogging.