
Azure instance CPU hog

Azure instance CPU hog disrupts the state of infrastructure resources.

  • It induces CPU stress on the target Azure instance through Azure Run Command, which executes the bash scripts built into the fault.
  • It consumes an excess amount of CPU on the Azure instance for a specified duration.
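The fault's built-in bash script is not reproduced in this document. As a rough mental model only, the following hypothetical Python sketch shows one common way to hold a single core at a target load for a fixed time. It uses a duty cycle that spins for part of each short period and sleeps for the rest. The function names and the 0.1-second period are assumptions, not the fault's implementation.

```python
import time


def duty_cycle(load_percent: int, period: float = 0.1) -> tuple[float, float]:
    """Split one period into (busy, idle) seconds for the requested load."""
    if not 0 <= load_percent <= 100:
        raise ValueError("load_percent must be between 0 and 100")
    busy = period * load_percent / 100
    return busy, period - busy


def burn_one_core(load_percent: int, duration: float, period: float = 0.1) -> None:
    """Keep the calling process roughly load_percent busy for `duration` seconds.

    Hypothetical illustration: each period is split into a busy-spin phase
    and a sleep phase, which averages out to the requested load on one core.
    """
    busy, idle = duty_cycle(load_percent, period)
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # Busy phase: spin until this period's busy slice (or the deadline) is reached.
        spin_until = min(time.monotonic() + busy, deadline)
        while time.monotonic() < spin_until:
            pass
        # Idle phase: sleep for the rest of the period, never past the deadline.
        remaining = deadline - time.monotonic()
        if idle > 0 and remaining > 0:
            time.sleep(min(idle, remaining))
```

Stressing several cores would mean running one such worker per core in separate processes, for example with Python's multiprocessing module. The real fault may do this differently.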


Use cases

Azure instance CPU hog:

  • Determines the resilience of an Azure instance, and of the application deployed on it, during unexpected, excessive CPU utilization.
  • Determines how Azure scales the CPU resources to keep the application running when it is under stress.
  • Causes CPU stress on the Azure instance(s).
  • Simulates a shortage of CPU for the application's processes, which degrades their performance.
  • Verifies metrics-based horizontal pod autoscaling.
  • Verifies vertical autoscaling, that is, demand-based CPU addition.
  • Verifies that nodes scale when workloads grow beyond the budgeted pods.
  • Verifies the autopilot functionality of cloud-managed clusters.
  • Verifies multi-tenant load issues: when the load on one container increases, the fault checks for downtime in other containers.

Prerequisites

  • Kubernetes >= 1.17
  • Azure Run Command agent should be installed and running in the target Azure instance.
  • Azure disk should be in a healthy state.
  • Use Azure file-based authentication to connect to the instance through the Azure Go SDK. To generate the auth file, run the Azure CLI command az ad sp create-for-rbac --sdk-auth > azure.auth.
  • A Kubernetes secret containing the auth file created in the previous step must exist in the CHAOS_NAMESPACE. Below is a sample secret file:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }
```
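You can also generate the secret from the auth file instead of pasting the file into YAML by hand. The following is a minimal Python sketch using only the standard library. The helper name and the choice of required keys (clientId, clientSecret, subscriptionId, tenantId) are assumptions. Any other fields in the file, such as the endpoint URLs in the sample, are carried through unchanged.

```python
import json

# Assumption: these service-principal fields are the minimum worth checking;
# the fault's actual requirements are not spelled out in this document.
REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def build_auth_secret(auth: dict, name: str = "cloud-secret", key: str = "azure.auth") -> dict:
    """Wrap the parsed contents of an auth file in a Kubernetes Secret manifest (hypothetical helper)."""
    missing = [k for k in REQUIRED_KEYS if not auth.get(k)]
    if missing:
        raise ValueError(f"auth file is missing: {', '.join(missing)}")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # stringData values must be strings, so the JSON document is serialized.
        "stringData": {key: json.dumps(auth, indent=2)},
    }
```

For example, load azure.auth with json.load, pass the result to build_auth_secret, write json.dumps(...) of the returned manifest to a file, and apply that file with kubectl apply -n <CHAOS_NAMESPACE> -f <file>, since kubectl accepts JSON manifests. Without a script, kubectl create secret generic cloud-secret --from-file=azure.auth -n <CHAOS_NAMESPACE> produces a secret with the same key.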
Tip

If you change the secret key name from azure.auth to a new name, ensure that you update the AZURE_AUTH_LOCATION environment variable in the chaos experiment with the new name.
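The following hypothetical check makes the tip concrete. It looks up AZURE_AUTH_LOCATION in the engine's env list, falling back to the documented default azure.auth, and confirms that a key with that name exists in the secret. It is a sketch for catching the mismatch before a run, not part of the fault.

```python
def check_auth_location(secret: dict, env: list[dict]) -> str:
    """Return the secret key the fault is expected to read, or raise if the secret lacks it.

    Hypothetical helper: AZURE_AUTH_LOCATION defaults to "azure.auth" when unset.
    """
    location = next(
        (item["value"] for item in env if item.get("name") == "AZURE_AUTH_LOCATION"),
        "azure.auth",
    )
    # A secret may define keys under stringData, data, or both.
    keys = set(secret.get("stringData", {})) | set(secret.get("data", {}))
    if location not in keys:
        raise KeyError(f"AZURE_AUTH_LOCATION={location!r} not found in secret keys {sorted(keys)}")
    return location
```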

Mandatory tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| AZURE_INSTANCE_NAMES | Names of the target Azure instances. | Multiple values can be provided as comma-separated strings. For example, instance-1,instance-2. For more information, go to stop instances by name. |
| RESOURCE_GROUP | Name of the Azure resource group that contains the target instances. | All the instances must be from the same resource group. For more information, go to resource group field in the YAML file. |

Optional tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. For more information, go to duration of the chaos. |
| CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Defaults to 60s. For more information, go to chaos interval. |
| AZURE_AUTH_LOCATION | Name of the Azure secret credential file. | Defaults to azure.auth. |
| SCALE_SET | Specifies whether the instance is part of a Scale Set. | Defaults to disable. Also supports enable. For more information, go to scale set instances. |
| INSTALL_DEPENDENCIES | Specifies whether to install the dependencies required to run the chaos. | Defaults to true. Also supports false. |
| CPU_CORE | Number of CPU cores that will be subject to stress. | Defaults to 0. For more information, go to CPU core. |
| CPU_LOAD | Percentage load exerted on a single CPU core. | Defaults to 100. For more information, go to CPU percentage. |
| DEFAULT_HEALTH_CHECK | Determines whether to run the default health check present inside the fault. | Defaults to true. For more information, go to default health check. |
| SEQUENCE | Sequence of chaos execution for multiple target instances. | Defaults to parallel. Also supports serial. For more information, go to sequence of chaos execution. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. For more information, go to ramp time. |
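The two tables can be read as one configuration contract. The following is a minimal sketch, assuming the engine's env is collected into a plain dict of strings. It fails on missing mandatory tunables, fills in the documented defaults, and rejects values outside the documented choices. The constant and function names are hypothetical, and the sketch uses the CPU_CORE spelling from the examples below. The fault's own validation may be stricter or looser.

```python
# Defaults as listed in the optional tunables table (RAMP_TIME has no documented default).
DEFAULTS = {
    "TOTAL_CHAOS_DURATION": "30",
    "CHAOS_INTERVAL": "60",
    "AZURE_AUTH_LOCATION": "azure.auth",
    "SCALE_SET": "disable",
    "INSTALL_DEPENDENCIES": "true",
    "CPU_CORE": "0",
    "CPU_LOAD": "100",
    "DEFAULT_HEALTH_CHECK": "true",
    "SEQUENCE": "parallel",
}
MANDATORY = ("AZURE_INSTANCE_NAMES", "RESOURCE_GROUP")
ALLOWED = {
    "SCALE_SET": {"enable", "disable"},
    "INSTALL_DEPENDENCIES": {"true", "false"},
    "DEFAULT_HEALTH_CHECK": {"true", "false"},
    "SEQUENCE": {"parallel", "serial"},
}


def resolve_tunables(env: dict[str, str]) -> dict[str, str]:
    """Merge user-supplied env with documented defaults and check documented constraints."""
    missing = [k for k in MANDATORY if not env.get(k, "").strip()]
    if missing:
        raise ValueError(f"missing mandatory tunables: {', '.join(missing)}")
    merged = {**DEFAULTS, **env}
    for key, allowed in ALLOWED.items():
        if merged[key] not in allowed:
            raise ValueError(f"{key}={merged[key]!r} must be one of {sorted(allowed)}")
    if not 0 <= int(merged["CPU_LOAD"]) <= 100:
        raise ValueError("CPU_LOAD must be between 0 and 100")
    if int(merged["CPU_CORE"]) < 0:
        raise ValueError("CPU_CORE must not be negative")
    return merged
```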

CPU core

It specifies the number of CPU cores to stress on the Azure instance. Tune it by using the CPU_CORE environment variable.

Use the following example to tune it:

```yaml
# CPU cores to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # number of CPU cores to stress
        - name: CPU_CORE
          value: '2'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```
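If you generate engines from a script instead of writing YAML by hand, a sketch like the following produces the same structure as the example above. It is a hypothetical helper that uses only the Python standard library. Kubernetes env entries use the lowercase value field and string values, so the helper converts every value with str().

```python
import json


def chaos_engine(env: dict, name: str = "engine-nginx") -> dict:
    """Build an azure-instance-cpu-hog ChaosEngine manifest like the YAML example (hypothetical helper)."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name},
        "spec": {
            "engineState": "active",
            "chaosServiceAccount": "litmus-admin",
            "experiments": [
                {
                    "name": "azure-instance-cpu-hog",
                    "spec": {
                        "components": {
                            # Env values must be strings, under the lowercase "value" key.
                            "env": [{"name": k, "value": str(v)} for k, v in env.items()]
                        }
                    },
                }
            ],
        },
    }
```

For example, print(json.dumps(chaos_engine({"CPU_CORE": 2, "AZURE_INSTANCE_NAMES": "instance-1", "RESOURCE_GROUP": "rg-azure"}), indent=2)) emits a JSON manifest. You can pipe that manifest to kubectl apply -f -, because kubectl accepts JSON as well as YAML.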

CPU percentage

It specifies the amount of CPU utilized (in percentage) on each stressed core of the Azure instance. Tune it by using the CPU_LOAD environment variable.

Use the following example to tune it:

```yaml
# CPU percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # load (in percentage) on each stressed core
        - name: CPU_LOAD
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```
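You may want to confirm from inside a Linux instance that the stress actually reached the CPU_LOAD target. One common approach, not specific to this fault, is to sample the cpu line of /proc/stat twice and measure how much of the elapsed CPU time was idle. The sketch below assumes a Linux guest and uses hypothetical helper names.

```python
def cpu_utilization(before: str, after: str) -> float:
    """Percent of CPU time that was non-idle between two samples of a /proc/stat cpu line."""

    def parse(line: str) -> tuple[int, int]:
        fields = [int(x) for x in line.split()[1:]]
        # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
        # guest time is already included in user/nice, so only the first 8 are summed.
        idle = fields[3] + fields[4]  # idle + iowait
        total = sum(fields[:8])
        return idle, total

    idle0, total0 = parse(before)
    idle1, total1 = parse(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("the second sample must be taken after the first")
    return 100.0 * (1 - (idle1 - idle0) / delta_total)


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate cpu line (the first line of /proc/stat on Linux)."""
    with open(path) as f:
        return f.readline()
```

The aggregate cpu line averages across all cores. When only some cores are stressed, compare the per-core cpuN lines instead.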

Multiple Azure instances

It specifies comma-separated Azure instance names that are subject to chaos in a single run. Tune it by using the AZURE_INSTANCE_NAMES environment variable.

Use the following example to tune it:

```yaml
# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```
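The following hypothetical helper shows how a comma-separated AZURE_INSTANCE_NAMES value relates to the SEQUENCE tunable. With parallel (the default), all instances are stressed together; with serial, they are stressed one after another. It sketches the semantics only and is not the fault's code. It trims spaces around names, but this document does not say whether the fault does, so it is safest to omit spaces in the value.

```python
def plan_batches(instance_names: str, sequence: str = "parallel") -> list[list[str]]:
    """Group target instances into the batches that would be stressed together."""
    names = [n.strip() for n in instance_names.split(",") if n.strip()]
    if not names:
        raise ValueError("AZURE_INSTANCE_NAMES is empty")
    if sequence == "parallel":
        return [names]  # one batch: every instance at the same time
    if sequence == "serial":
        return [[n] for n in names]  # one instance per batch, in the listed order
    raise ValueError("SEQUENCE must be 'parallel' or 'serial'")
```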

CPU core with percentage consumption

It specifies the number of CPU cores to stress on the Azure instance and the load (in percentage) applied to each of those cores. Tune it by using the CPU_CORE and CPU_LOAD environment variables, respectively.

Use the following example to tune it:

```yaml
# CPU core with percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # number of CPU cores to stress
        - name: CPU_CORE
          value: '2'
        # load (in percentage) on each stressed core
        - name: CPU_LOAD
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```
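As a back-of-the-envelope check for the example above, stressing CPU_CORE cores at CPU_LOAD percent each adds roughly CPU_CORE × CPU_LOAD / vCPUs percent to the instance's overall CPU utilization, on top of any existing load. For instance, 2 cores at 50% on a 4-vCPU VM add about 25%. The hypothetical helper below sketches that arithmetic. It assumes the stress reaches its target exactly and caps the stressed cores at the VM's vCPU count. It does not model CPU_CORE=0, whose behavior is not described here.

```python
def expected_added_utilization(cpu_core: int, cpu_load: int, vcpus: int) -> float:
    """Approximate extra whole-VM CPU utilization (percent) from the stress settings."""
    if cpu_core < 1 or vcpus < 1:
        raise ValueError("cpu_core and vcpus must be at least 1")
    if not 0 <= cpu_load <= 100:
        raise ValueError("cpu_load must be between 0 and 100")
    # Assumption: requesting more cores than the VM has cannot exceed 100% overall.
    stressed = min(cpu_core, vcpus)
    return stressed * cpu_load / vcpus
```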