EC2 IO stress

EC2 IO stress disrupts the state of infrastructure resources. This fault:

Induces stress on an AWS EC2 instance using the Amazon SSM Run command. The SSM Run command is executed using an SSM document that is built into the fault.
Causes IO stress on the EC2 instance for a specific duration.

Use cases

EC2 IO stress:

Simulates slower disk operations by the application.
Simulates noisy neighbour problems by hogging the disk bandwidth.
Verifies the disk performance on increasing IO threads and varying IO block sizes.
Checks how the application functions under high disk latency, when IO traffic is heavy and includes large IO blocks, and when other services monopolize the disks.

Prerequisites

Kubernetes >= 1.17
The EC2 instance should be in a healthy state.
SSM agent should be installed and running on the target EC2 instance.

A Kubernetes secret that contains the AWS access key ID and secret access key should exist in the CHAOS_NAMESPACE. Below is a sample secret file:

apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

tip

HCE recommends that you use the same secret name, that is, cloud-secret. Otherwise, you will need to update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template with the new secret name and you won't be able to use the default health check probes.

Below is an example AWS policy to execute the fault.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetDocument",
                "ssm:DescribeDocument",
                "ssm:GetParameter",
                "ssm:GetParameters",
                "ssm:SendCommand",
                "ssm:CancelCommand",
                "ssm:CreateDocument",
                "ssm:DeleteDocument",
                "ssm:GetCommandInvocation",
                "ssm:UpdateInstanceInformation",
                "ssm:DescribeInstanceInformation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2messages:AcknowledgeMessage",
                "ec2messages:DeleteMessage",
                "ec2messages:FailMessage",
                "ec2messages:GetEndpoint",
                "ec2messages:GetMessages",
                "ec2messages:SendReply"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstanceStatus",
                "ec2:DescribeInstances"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

note

To use a different profile for AWS faults, go to AWS named profile for chaos. For a single policy that covers all AWS faults, go to superset permission/policy to execute all AWS faults.

Mandatory tunables

Tunable	Description	Notes
EC2_INSTANCE_ID	ID of the target EC2 instance.	For example, `i-044d3cb4b03b8af1f`. For more information, go to EC2 instance ID.
REGION	The AWS region ID where the EC2 instance has been created.	For example, `us-east-1`.
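
As a rough illustration, the mandatory tunables can be passed as environment variables in a chaos engine manifest. The sketch below assumes a Litmus-style `ChaosEngine` resource. The engine name, the service account, and the experiment name `ec2-io-stress` are assumptions and may differ in your setup. The instance ID and region are the sample values from the table above.

```yaml
# Minimal sketch: supply the mandatory tunables as environment variables.
# The engine name, service account, and experiment name are assumptions.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: ec2-io-stress-engine          # hypothetical name
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin   # assumed service account
  experiments:
  - name: ec2-io-stress               # assumed experiment name
    spec:
      components:
        env:
        # ID of the target EC2 instance (sample value from the table)
        - name: EC2_INSTANCE_ID
          value: 'i-044d3cb4b03b8af1f'
        # AWS region in which the instance was created
        - name: REGION
          value: 'us-east-1'
```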

Optional tunables

Tunable	Description	Notes
TOTAL_CHAOS_DURATION	Duration to insert chaos (in seconds).	Default: 30 s. For more information, go to duration of the chaos.
CHAOS_INTERVAL	Time interval between two successive chaos iterations (in seconds).	Default: 60 s. For more information, go to chaos interval.
AWS_SHARED_CREDENTIALS_FILE	Path to the AWS secret credentials.	Default: `/tmp/cloud_config.yml`.
INSTALL_DEPENDENCIES	Specifies whether to install the dependencies used to run IO chaos. It can be 'True' or 'False'.	Default: True. If the dependencies are already installed, you can set it to False.
FILESYSTEM_UTILIZATION_PERCENTAGE	Specify the size as a percentage of the free space on the file system.	Default: 0 %, which results in 1 GB of utilization. For more information, go to filesystem utilization in percentage.
FILESYSTEM_UTILIZATION_BYTES	Specify the size in gigabytes (GB). `FILESYSTEM_UTILIZATION_PERCENTAGE` and `FILESYSTEM_UTILIZATION_BYTES` are mutually exclusive. If both are provided, `FILESYSTEM_UTILIZATION_PERCENTAGE` takes priority.	Default: 0 GB, which results in 1 GB of utilization. For more information, go to filesystem utilization in MB.
NUMBER_OF_WORKERS	Number of IO workers involved in IO stress.	Default: 4. For more information, go to workers.
VOLUME_MOUNT_PATH	Path of the volume mount to fill.	Default: the user's HOME directory. For more information, go to volume mount path.
SEQUENCE	Sequence of chaos execution for multiple instances.	Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution.
RAMP_TIME	Period to wait before and after injecting chaos (in seconds).	For example, 30 s. For more information, go to ramp time.
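
The optional tunables are set the same way as the mandatory ones. The following sketch, again assuming a Litmus-style `ChaosEngine` with a hypothetical engine name, service account, and experiment name, overrides a few defaults from the table above. All values are illustrative, not recommendations.

```yaml
# Sketch: override selected optional tunables alongside the mandatory ones.
# The engine name, service account, and experiment name are assumptions.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: ec2-io-stress-engine          # hypothetical name
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin   # assumed service account
  experiments:
  - name: ec2-io-stress               # assumed experiment name
    spec:
      components:
        env:
        - name: EC2_INSTANCE_ID
          value: 'i-044d3cb4b03b8af1f'
        - name: REGION
          value: 'us-east-1'
        # Inject chaos for 120 seconds instead of the default 30
        - name: TOTAL_CHAOS_DURATION
          value: '120'
        # Use 8 IO workers instead of the default 4
        - name: NUMBER_OF_WORKERS
          value: '8'
        # Stress 10 percent of the free space on the file system
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '10'
        # Wait 30 seconds before and after injecting chaos
        - name: RAMP_TIME
          value: '30'
```

Only `FILESYSTEM_UTILIZATION_PERCENTAGE` is set here because it is mutually exclusive with `FILESYSTEM_UTILIZATION_BYTES`.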

File system utilization in megabytes

Amount of file system space to utilize on the EC2 instance (in megabytes). Tune it by using the `FILESYSTEM_UTILIZATION_BYTES` environment variable.
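
A minimal sketch of setting this tunable follows, assuming a Litmus-style `ChaosEngine` with a hypothetical engine name, service account, and experiment name. The value `1` is only a placeholder: confirm the unit the fault expects for `FILESYSTEM_UTILIZATION_BYTES` before choosing a real value.

```yaml
# Sketch: set the amount of file system space to utilize.
# The engine name, service account, and experiment name are assumptions.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: ec2-io-stress-engine          # hypothetical name
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin   # assumed service account
  experiments:
  - name: ec2-io-stress               # assumed experiment name
    spec:
      components:
        env:
        # Placeholder size; verify the expected unit before use
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1'
        - name: EC2_INSTANCE_ID
          value: 'i-044d3cb4b03b8af1f'
        - name: REGION
          value: 'us-east-1'
```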