
EC2 CPU hog

EC2 CPU hog disrupts the state of infrastructure resources. It induces CPU stress on the target AWS EC2 instance using the Amazon SSM Run Command, which is carried out using an SSM document that is built into the fault.

  • It causes CPU chaos on the EC2 instance(s) specified by the EC2_INSTANCE_ID environment variable for a specific duration.


Usage

The fault causes CPU stress on the target AWS EC2 instance(s). It simulates a lack of CPU for the processes running on the instance, which degrades their performance. It helps to:

  • Verify metrics-based horizontal pod autoscaling as well as vertical autoscaling, that is, demand-based CPU addition.
  • Verify the scalability of nodes when growth exceeds the budgeted pods.
  • Verify the autopilot functionality of (cloud) managed clusters.

Injecting a rogue process into the target EC2 instance starves the main processes or applications (typically PID 1) of the resources allocated to them. This slows down application traffic or exhausts the resources, which degrades the performance of processes on the instance. Running such faults helps build resilience to these stress cases.

Prerequisites

  • Kubernetes >= 1.17
  • SSM agent is installed and running on the target EC2 instance.
  • Create a Kubernetes secret that has the AWS Access Key ID and Secret Access Key credentials in the CHAOS_NAMESPACE. A sample secret file looks like:
    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-secret
    type: Opaque
    stringData:
      cloud_config.yml: |-
        # Add the cloud AWS credentials respectively
        [default]
        aws_access_key_id = XXXXXXXXXXXXXXXXXXX
        aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • If you change the secret key name (from cloud_config.yml), ensure that you update the AWS_SHARED_CREDENTIALS_FILE environment variable in the chaos experiment so that its path ends with the new name.
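
To illustrate the last prerequisite, here is a minimal sketch of a ChaosEngine that points the fault at a renamed credentials file. The key name aws_creds.yml is hypothetical, and the /tmp mount directory is an assumption inferred from the default path /tmp/cloud_config.yml; adjust both to match your setup.

```yaml
# Hypothetical: the secret key was renamed from cloud_config.yml to aws_creds.yml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: ec2-cpu-hog
      spec:
        components:
          env:
            # assumed mount directory (/tmp), inferred from the default path
            - name: AWS_SHARED_CREDENTIALS_FILE
              value: '/tmp/aws_creds.yml'
            # ID of the EC2 instance
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            # region for the EC2 instance
            - name: REGION
              value: 'us-east-1'
```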

Permissions required

Here is an example AWS policy to execute the fault.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetDocument",
                "ssm:DescribeDocument",
                "ssm:GetParameter",
                "ssm:GetParameters",
                "ssm:SendCommand",
                "ssm:CancelCommand",
                "ssm:CreateDocument",
                "ssm:DeleteDocument",
                "ssm:GetCommandInvocation",
                "ssm:UpdateInstanceInformation",
                "ssm:DescribeInstanceInformation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2messages:AcknowledgeMessage",
                "ec2messages:DeleteMessage",
                "ec2messages:FailMessage",
                "ec2messages:GetEndpoint",
                "ec2messages:GetMessages",
                "ec2messages:SendReply"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstanceStatus",
                "ec2:DescribeInstances"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Refer to the superset permission/policy to execute all AWS faults.

Default validations

The EC2 instance should be in a healthy state.

Fault tunables

Mandatory fields

| Variables | Description | Notes |
| --- | --- | --- |
| EC2_INSTANCE_ID | ID of the target EC2 instance | For example: i-044d3cb4b03b8af1f |
| REGION | The AWS region ID where the EC2 instance has been created | For example: us-east-1 |

Optional fields

| Variables | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | The total duration of chaos injection (in seconds) | Defaults to 30s |
| CHAOS_INTERVAL | The interval between successive chaos injections (in seconds) | Defaults to 60s |
| AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials | Defaults to /tmp/cloud_config.yml |
| INSTALL_DEPENDENCIES | Whether to install the dependencies used to run the CPU chaos. It can be either True or False | Defaults to True |
| CPU_CORE | The number of CPU cores to consume | Defaults to 0 |
| CPU_LOAD | The percentage of a single CPU core to be consumed | Defaults to 100 |
| SEQUENCE | The sequence of chaos execution for multiple instances | Defaults to parallel. Supported: serial, parallel |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds) | For example, 30s |
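
The timing tunables listed above can be set together in one ChaosEngine. The following is a minimal sketch; the values are illustrative, and it assumes they are given as plain numbers of seconds, in the same quoted-string style as the other tunables.

```yaml
# chaos duration, interval, and ramp time (illustrative values, in seconds)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: ec2-cpu-hog
      spec:
        components:
          env:
            # total time for which CPU stress is injected
            - name: TOTAL_CHAOS_DURATION
              value: '60'
            # interval between successive chaos injections
            - name: CHAOS_INTERVAL
              value: '30'
            # wait period before and after injecting chaos
            - name: RAMP_TIME
              value: '30'
            # ID of the EC2 instance
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            # region for the EC2 instance
            - name: REGION
              value: 'us-east-1'
```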

Fault examples

Fault tunables

Refer to the common attributes to tune the common tunables for all the faults.

CPU core

It defines the number of CPU cores to be utilised on the EC2 instance. You can tune it using the CPU_CORE environment variable.

Use the following example to tune it:

# CPU cores to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: ec2-cpu-hog
      spec:
        components:
          env:
            # number of CPU cores to consume
            - name: CPU_CORE
              value: '2'
            # ID of the EC2 instance
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            # region for the EC2 instance
            - name: REGION
              value: 'us-east-1'

CPU percentage

It defines the CPU percentage value to be utilised on the EC2 instance. You can tune it using the CPU_LOAD environment variable.

Use the following example to tune it:

# CPU percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: ec2-cpu-hog
      spec:
        components:
          env:
            # percentage of a single CPU core to consume
            - name: CPU_LOAD
              value: '50'
            # ID of the EC2 instance
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            # region for the EC2 instance
            - name: REGION
              value: 'us-east-1'

Multiple EC2 instances

Multiple EC2 instances can be targeted in one chaos run. To do so, provide a comma-separated list of instance IDs in the EC2_INSTANCE_ID environment variable.

Use the following example to tune it:

# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: ec2-cpu-hog
      spec:
        components:
          env:
            # comma-separated IDs of the EC2 instances
            - name: EC2_INSTANCE_ID
              value: 'instance-1,instance-2,instance-3'
            # region for the EC2 instances
            - name: REGION
              value: 'us-east-1'
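
When several instances are targeted, the SEQUENCE tunable from the optional fields determines whether they are stressed in parallel (the default) or in serial. A minimal sketch, reusing the placeholder instance IDs from the example above:

```yaml
# multiple instance targets, stressed in the order defined by SEQUENCE
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: ec2-cpu-hog
      spec:
        components:
          env:
            # supported values: serial, parallel (default: parallel)
            - name: SEQUENCE
              value: 'serial'
            # comma-separated IDs of the EC2 instances
            - name: EC2_INSTANCE_ID
              value: 'instance-1,instance-2,instance-3'
            # region for the EC2 instances
            - name: REGION
              value: 'us-east-1'
```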

CPU core with percentage consumption

It defines how many CPU cores to utilise and the percentage of utilisation on the EC2 instance. You can tune these using the CPU_CORE and CPU_LOAD environment variables, respectively.

Use the following example to tune it:

# CPU core with percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: ec2-cpu-hog
      spec:
        components:
          env:
            # number of CPU cores to consume
            - name: CPU_CORE
              value: '2'
            # percentage of a single CPU core to consume
            - name: CPU_LOAD
              value: '50'
            # ID of the EC2 instance
            - name: EC2_INSTANCE_ID
              value: 'instance-1'
            # region for the EC2 instance
            - name: REGION
              value: 'us-east-1'