Azure instance CPU hog

Last updated on Jul 9, 2025

Azure instance CPU hog disrupts the state of infrastructure resources.

It induces stress on the Azure instance through the Azure Run Command, which executes the bash scripts built into the fault.
The bash script consumes excess CPU on the Azure instance for a specified duration.

Use cases

Azure instance CPU hog:

Determines the resilience of an Azure instance and the application deployed on the instance during unexpected excessive utilization of the CPU resources.
Determines how Azure scales the CPU resources to maintain the application when it is under stress.
Causes CPU stress on the Azure instance(s).
Simulates the situation of lack of CPU for processes running on the application, which degrades their performance.
Verifies metrics-based horizontal pod autoscaling.
Verifies vertical autoscale, that is, demand based CPU addition.
Facilitates the scalability of nodes based on growth beyond budgeted pods.
Verifies the autopilot functionality of cloud managed clusters.
Verifies multi-tenant load issues. When the load on one container increases, the fault checks for any downtime in other containers.

Prerequisites

Kubernetes >= 1.17
Azure Run Command agent should be installed and running in the target Azure instance.
Azure disk should be in a healthy state.
Use Azure file-based authentication to connect to the instance using the Azure Go SDK. To generate the auth file, run the Azure CLI command `az ad sp create-for-rbac --sdk-auth > azure.auth`.
A Kubernetes secret in the CHAOS_NAMESPACE should contain the auth file created in the previous step. Below is a sample secret file:

apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }

tip

If you change the secret key name from azure.auth to a new name, ensure that you update the AZURE_AUTH_LOCATION environment variable in the chaos experiment with the new name.
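
As an illustration of the tip above, the following sketch assumes the secret key was renamed to `azure-creds.auth` (a hypothetical name) and passes that name to the fault through AZURE_AUTH_LOCATION. The engine name, instance name, and resource group are placeholders reused from the examples later in this document.

```yaml
# AZURE_AUTH_LOCATION must match the key under stringData in the secret
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # hypothetical key name; replace it with the key used in your secret
        - name: AZURE_AUTH_LOCATION
          value: 'azure-creds.auth'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```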

Mandatory tunables

Tunable	Description	Notes
AZURE_INSTANCE_NAMES	Names of the target Azure instances.	Multiple values can be provided as comma-separated strings. For example, `instance-1,instance-2`. For more information, go to multiple Azure instances.
RESOURCE_GROUP	Name of the Azure resource group that contains the target instances.	All the instances must be from the same resource group. For more information, go to resource group field in the YAML file.

Optional tunables

Tunable	Description	Notes
TOTAL_CHAOS_DURATION	Total duration for which chaos is injected into the target resource (in seconds).	Defaults to 30s. For more information, go to duration of the chaos.
CHAOS_INTERVAL	Time interval between two successive chaos iterations (in seconds).	Defaults to 60s. For more information, go to chaos interval.
AZURE_AUTH_LOCATION	Name of the Azure secret credential files.	Defaults to `azure.auth`.
SCALE_SET	Check if the instance is a part of Scale Set.	Defaults to `disable`. Also supports `enable`. For more information, go to scale set instances.
INSTALL_DEPENDENCIES	Install dependencies to run the chaos.	Defaults to `true`. Also supports `false`.
CPU_CORE	Number of CPU cores that will be subject to stress.	Defaults to 0. For more information, go to CPU core.
CPU_LOAD	Percentage load exerted on a single CPU core.	Defaults to 100. For more information, go to CPU percentage.
DEFAULT_HEALTH_CHECK	Determines whether to run the default health check that is built into the fault.	Defaults to `true`. For more information, go to default health check.
SEQUENCE	Sequence of chaos execution for multiple target instances.	Defaults to `parallel`. Also supports `serial` sequence. For more information, go to sequence of chaos execution.
RAMP_TIME	Period to wait before and after injecting chaos (in seconds).	For example, 30s. For more information, go to ramp time.
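
A minimal sketch of how the timing-related optional tunables above might be combined in one ChaosEngine. The values are illustrative assumptions, not recommendations, and they are written as plain numbers of seconds on the assumption that the fault expects the unit stated in the table.

```yaml
# timing tunables for the CPU hog fault (illustrative values)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # total time for which chaos is injected (assumed to be in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # gap between successive chaos iterations (assumed to be in seconds)
        - name: CHAOS_INTERVAL
          value: '30'
        # wait before and after injecting chaos (assumed to be in seconds)
        - name: RAMP_TIME
          value: '30'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```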

CPU core

It specifies the number of CPU cores utilised on the Azure instance. Tune it by using the CPU_CORE environment variable.

Use the following example to tune it:

# CPU cores to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        - name: CPU_CORE
          value: '2'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'

CPU percentage

It specifies the amount of CPU utilised (in percentage) on the Azure instance. Tune it by using the CPU_LOAD environment variable.

Use the following example to tune it:

# CPU percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        - name: CPU_LOAD
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'

Multiple Azure instances

It specifies comma-separated Azure instance names that are subject to chaos in a single run. Tune it by using the AZURE_INSTANCE_NAMES environment variable.

Use the following example to tune it:

# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
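
When several instances are targeted, the SEQUENCE tunable listed under the optional tunables controls whether they are stressed in parallel (the default) or one after another. The following sketch assumes serial execution is wanted; the instance names and resource group are the same placeholders used above.

```yaml
# multiple instance targets, stressed one after another
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        # run chaos on the targets serially instead of the default parallel
        - name: SEQUENCE
          value: 'serial'
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```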

CPU core with percentage consumption

It specifies the number of CPU cores to stress and the percentage load exerted on each of those cores on the Azure instance. Tune it by using the CPU_CORE and CPU_LOAD environment variables, respectively.

Use the following example to tune it:

# CPU core with percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-cpu-hog
    spec:
      components:
        env:
        - name: CPU_CORE
          value: '2'
        - name: CPU_LOAD
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'