
EC2 IO stress

EC2 IO stress disrupts the state of infrastructure resources.

  • The fault induces stress on an AWS EC2 instance using the Amazon SSM Run Command, which executes the SSM documents that are built into the fault.
  • It causes IO stress on the EC2 instance for a specified duration.


Usage

Failures in file system reads and writes impact service delivery; this is also known as the "noisy neighbour" problem. The fault simulates slow disk operations by the application, and noisy neighbour problems by hogging the disk bandwidth. It also verifies disk performance with an increasing number of I/O threads and varying I/O block sizes. It checks whether the application keeps functioning under high disk latency, when I/O traffic is very high and includes large I/O blocks, and when other services monopolize the I/O disks. Injecting a rogue process into an EC2 instance may starve the main processes or applications (typically PID 1) of the resources allocated to them. This may slow down application traffic or exhaust resources, degrading the performance of the application. This fault determines the resilience of an application that undergoes such stress.

Prerequisites

  • Kubernetes >= 1.17
  • Ensure that the SSM agent is installed and running on the target EC2 instance.
  • Create a Kubernetes secret containing the AWS access key ID and secret access key credentials in the CHAOS_NAMESPACE. A sample secret file is shown below:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • If you change the secret name, update the corresponding environment variable in experiment.yml so that the fault derives its data from the renamed secret. Also account for the path at which this secret is mounted as a file, which is set in the AWS_SHARED_CREDENTIALS_FILE environment variable of the manifest.
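As a sketch, assuming the secret above is mounted so that its cloud_config.yml key appears as the file /tmp/cloud_config.yml (the default listed for AWS_SHARED_CREDENTIALS_FILE), the credentials path can be set explicitly in the ChaosEngine env, following the same pattern as the fault examples on this page. The engine name engine-nginx and the instance ID instance-1 are placeholders.

```yaml
# explicitly point the fault at the mounted AWS credentials file
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # assumed mount location of the secret's cloud_config.yml key
        - name: AWS_SHARED_CREDENTIALS_FILE
          value: '/tmp/cloud_config.yml'
        # ID of the EC2 instance (placeholder)
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
```

A secret key mounted as a volume becomes a file of the same name, so if you rename the cloud_config.yml key in the secret, change the file name in this path to match.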

Permissions required

Here is an example AWS policy to execute the fault.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Refer to the superset permission/policy to execute all AWS faults.

Default validations

The EC2 instance should be in a healthy state.

Fault tunables


Mandatory Fields

Variable | Description | Notes
EC2_INSTANCE_ID | ID of the target EC2 instance. | For example, i-044d3cb4b03b8af1f.
REGION | The AWS region ID where the EC2 instance has been created. | For example, us-east-1.
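A minimal ChaosEngine that sets only the two mandatory fields might look like the following sketch; all optional fields then fall back to their defaults. The instance ID and region are the example values from the notes above, and the engine name is a placeholder.

```yaml
# minimal ChaosEngine: mandatory fields only
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # ID of the target EC2 instance (example value)
        - name: EC2_INSTANCE_ID
          value: 'i-044d3cb4b03b8af1f'
        # region where the instance was created
        - name: REGION
          value: 'us-east-1'
```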

Optional Fields

Variable | Description | Notes
TOTAL_CHAOS_DURATION | Duration (in seconds) for which chaos is injected into the target resource. | Defaults to 30s.
CHAOS_INTERVAL | Time interval (in seconds) between two successive chaos injections. | Defaults to 60s.
AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials file. | Defaults to /tmp/cloud_config.yml.
INSTALL_DEPENDENCIES | Whether to install the dependencies used to run the IO chaos. Either True or False. | If the dependencies already exist, you can set it to False. Defaults to True.
FILESYSTEM_UTILIZATION_PERCENTAGE | Size to utilize, as a percentage of free space on the file system. | Defaults to 0%, which results in 1 GB utilization.
FILESYSTEM_UTILIZATION_BYTES | Size to utilize, in gigabytes (GB). | FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes priority. Defaults to 0GB, which results in 1 GB utilization.
NUMBER_OF_WORKERS | Number of IO workers involved in the IO disk stress. | Defaults to 4.
VOLUME_MOUNT_PATH | Volume mount path to fill. | Defaults to the user's HOME directory.
SEQUENCE | Sequence of chaos execution for multiple instances. | Defaults to parallel. Supports serial sequence as well.
RAMP_TIME | Period (in seconds) to wait before and after chaos injection. | For example, 30s.
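The timing-related optional fields can be combined in one engine. The sketch below uses illustrative values only (they are not recommendations), and the engine name and instance ID are placeholders.

```yaml
# chaos timing tunables (illustrative values)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # inject chaos for 60 seconds in total
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # interval between successive chaos injections, in seconds
        - name: CHAOS_INTERVAL
          value: '30'
        # wait 10 seconds before and after chaos injection
        - name: RAMP_TIME
          value: '10'
        # skip dependency installation when the tools already exist on the instance
        - name: INSTALL_DEPENDENCIES
          value: 'False'
        # ID of the EC2 instance (placeholder)
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
```

All env values are quoted because Kubernetes expects environment variable values to be strings.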

Fault examples

Fault tunables

Refer to the common attributes to tune the common tunables for all the faults.

Filesystem utilization in megabytes

It defines the amount of filesystem space to utilize on the EC2 instance, in megabytes. You can tune it using the FILESYSTEM_UTILIZATION_BYTES environment variable, as shown in the following example:

# filesystem bytes to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'

Filesystem utilization in percentage

It defines the percentage of the filesystem to utilize on the EC2 instance. You can tune it using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable, as shown in the following example:

# filesystem percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'

Multiple workers

It defines the number of IO workers (threads) to run to spike filesystem utilization; more workers increase the rate at which the filesystem is consumed. You can tune it using the NUMBER_OF_WORKERS environment variable, as shown in the following example:

# multiple workers to utilize resources
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: NUMBER_OF_WORKERS
          value: '3'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'

Volume mount path

It defines the mount path of the target volume attached to the EC2 instance. You can tune it using the VOLUME_MOUNT_PATH environment variable, as shown in the following example:

# volume path to be used for io stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        - name: VOLUME_MOUNT_PATH
          value: '/tmp'
        # ID of the EC2 instance
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'

Multiple EC2 instances

Multiple EC2 instances can be targeted in one chaos run by providing their IDs as a comma-separated list in the EC2_INSTANCE_ID environment variable, as shown in the following example:

# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # ids of the EC2 instances
        - name: EC2_INSTANCE_ID
          value: 'instance-1,instance-2'
        # region for the EC2 instance
        - name: REGION
          value: 'us-east-1'
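By default, chaos runs on all listed instances in parallel. Based on the SEQUENCE field described in the tunables table, the following sketch stresses the same placeholder instances one after another instead; the instance IDs and engine name are placeholders.

```yaml
# multiple instance targets, stressed serially
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-io-stress
    spec:
      components:
        env:
        # ids of the EC2 instances (placeholders)
        - name: EC2_INSTANCE_ID
          value: 'instance-1,instance-2'
        # run chaos on one instance at a time instead of the default parallel mode
        - name: SEQUENCE
          value: 'serial'
        # region for the EC2 instances
        - name: REGION
          value: 'us-east-1'
```

Serial execution limits how many instances are under IO stress at the same moment, which can help when the targets serve the same application.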