Azure instance CPU hog
Azure instance CPU hog disrupts the state of infrastructure resources.
- It induces stress on the Azure instance using the Azure Run command. The Azure Run command is executed using the in-built bash scripts within the fault.
- It utilizes excess amounts of CPU on the Azure instance using the bash script for a specific duration.
Use cases
Azure instance CPU hog:
- Determines the resilience of an Azure instance and the application deployed on the instance during unexpected excessive utilization of the CPU resources.
- Determines how Azure scales the CPU resources to maintain the application when it is under stress.
- Causes CPU stress on the Azure instance(s).
- Simulates a shortage of CPU for the processes running on the instance, which degrades their performance.
- Verifies metrics-based horizontal pod autoscaling.
- Verifies vertical autoscaling, that is, demand-based CPU addition.
- Facilitates the scalability of nodes based on growth beyond budgeted pods.
- Verifies the autopilot functionality of cloud managed clusters.
- Verifies multi-tenant load issues. When the load on one container increases, the fault checks for any downtime in other containers.
Prerequisites
- Kubernetes >= 1.17
- Azure Run Command agent should be installed and running in the target Azure instance.
- Azure disk should be in a healthy state.
- Use Azure file-based authentication to connect to the instance using the Azure Go SDK. To generate the auth file, run the following Azure CLI command:
  az ad sp create-for-rbac --sdk-auth > azure.auth
- A Kubernetes secret in the CHAOS_NAMESPACE should contain the auth file created in the previous step. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }
If you change the secret key name from azure.auth to a new name, ensure that you update the AZURE_AUTH_LOCATION environment variable in the chaos experiment with the new name.
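As an illustration of the note above, the following is a minimal sketch of a ChaosEngine that points the fault at a renamed secret key. The key name custom-azure.auth is hypothetical, and the engine layout is assumed to match the other examples on this page; the value must equal whatever key name the secret actually uses.

```yaml
# point the fault at a renamed secret key (hypothetical key name)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # must match the key under stringData in the secret;
        # 'custom-azure.auth' is an assumed name for illustration
        - name: AZURE_AUTH_LOCATION
          value: 'custom-azure.auth'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```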
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
AZURE_INSTANCE_NAMES | Names of the target Azure instances. | Multiple values can be provided as comma-separated strings. For example, instance-1,instance-2. For more information, go to stop instances by name. |
RESOURCE_GROUP | Name of the Azure resource group that contains the target instances. | All the instances must be from the same resource group. For more information, go to resource group field in the YAML file. |
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. For more information, go to duration of the chaos. |
CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Defaults to 60s. For more information, go to chaos interval. |
AZURE_AUTH_LOCATION | Name of the secret key that holds the Azure credential file. | Defaults to azure.auth. |
SCALE_SET | Specifies whether the instance is part of a scale set. | Defaults to disable. Also supports enable. For more information, go to scale set instances. |
INSTALL_DEPENDENCIES | Specifies whether to install the dependencies required to run the chaos. | Defaults to true. Also supports false. |
CPU_CORE | Number of CPU cores that will be subject to stress. | Defaults to 0. For more information, go to CPU core. |
CPU_LOAD | Percentage load exerted on a single CPU core. | Defaults to 100. For more information, go to CPU percentage. |
DEFAULT_HEALTH_CHECK | Determines whether to run the default health check that is built into the fault. | Defaults to true. For more information, go to default health check. |
SEQUENCE | Sequence of chaos execution for multiple target instances. | Defaults to parallel. Also supports serial sequence. For more information, go to sequence of chaos execution. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. For more information, go to ramp time. |
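The timing tunables in the table above can be set together in one ChaosEngine. The following is a minimal sketch, assuming the same engine layout as the other examples on this page. The values are illustrative rather than recommended, and it is an assumption here that durations are written as plain numbers of seconds.

```yaml
# illustrative timing values (not the defaults)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # total duration of the chaos (assumed: plain seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # interval between successive chaos iterations
        - name: CHAOS_INTERVAL
          value: '30'
        # period to wait before and after injecting chaos
        - name: RAMP_TIME
          value: '10'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```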
CPU core
It specifies the number of CPU cores utilised on the Azure instance. Tune it by using the CPU_CORE environment variable.
Use the following example to tune it:
# CPU cores to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # number of CPU cores to stress
        - name: CPU_CORE
          value: '2'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
CPU percentage
It specifies the amount of CPU utilised (in percentage) on the Azure instance. Tune it by using the CPU_LOAD environment variable.
Use the following example to tune it:
# CPU percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # percentage load exerted on a single CPU core
        - name: CPU_LOAD
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
Multiple Azure instances
It specifies comma-separated Azure instance names that are subject to chaos in a single run. Tune it by using the AZURE_INSTANCE_NAMES environment variable.
Use the following example to tune it:
# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
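When more than one instance is targeted, the SEQUENCE tunable from the optional tunables table controls the order in which the instances are stressed. The following is a minimal sketch that switches from the default parallel sequence to serial, assuming the same engine layout as the example above; the instance and resource group names are placeholders.

```yaml
# multiple instance targets, stressed one after another
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # 'serial' or 'parallel' (parallel is the documented default)
        - name: SEQUENCE
          value: 'serial'
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```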
CPU core with percentage consumption
It specifies the number of CPU cores to stress on the Azure instance and the percentage load exerted on each of those cores. Tune it by using the CPU_CORE and CPU_LOAD environment variables, respectively.
Use the following example to tune it:
# CPU core with percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # number of CPU cores to stress
        - name: CPU_CORE
          value: '2'
        # percentage load exerted on each core
        - name: CPU_LOAD
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'