Azure instance IO stress

Azure instance I/O stress disrupts the state of infra resources.

  • This fault induces I/O stress on the Azure instance by using the Azure Run Command, which executes bash scripts that are built into the fault.
  • The I/O stress is applied to the Azure instance for a specified duration.

Use cases

Azure instance I/O stress:

  • Determines the resilience of an Azure instance when unexpected stress is applied on the I/O sources.
  • Determines how Azure scales the resources to maintain the application under stress.
  • Simulates slower disk operations by the application.
  • Simulates noisy neighbour problems by hogging the disk bandwidth.
  • Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
  • Checks whether or not the application functions under high disk latency conditions.
  • Checks whether or not the application functions under high I/O traffic, and large I/O blocks.
  • Checks if other services monopolize the I/O disks during stress.

Prerequisites

  • Kubernetes >= 1.17
  • Azure Run Command agent is installed and running in the target Azure instance.
  • Azure instance should be in a healthy state.
  • Use Azure file-based authentication to connect to the instance using the Azure Go SDK. To generate the auth file, run the Azure CLI command az ad sp create-for-rbac --sdk-auth > azure.auth.
  • A Kubernetes secret in the CHAOS_NAMESPACE should contain the auth file created in the previous step. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }
Tip: If you change the secret key name from azure.auth to a new name, ensure that you update the AZURE_AUTH_LOCATION environment variable in the chaos experiment with the new name.
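Before creating the secret, it can help to confirm that the generated auth file is complete. The following is a minimal sketch in Python, assuming the file is JSON with the same keys as the sample secret. The helper name missing_auth_keys is hypothetical, and treating every key in the sample as required is an assumption.

```python
import json

# Keys copied from the sample secret; treating all of them as required is an assumption.
EXPECTED_KEYS = (
    "clientId",
    "clientSecret",
    "subscriptionId",
    "tenantId",
    "activeDirectoryEndpointUrl",
    "resourceManagerEndpointUrl",
    "activeDirectoryGraphResourceId",
    "sqlManagementEndpointUrl",
    "galleryEndpointUrl",
    "managementEndpointUrl",
)


def missing_auth_keys(auth_text):
    """Return the expected keys that are absent or empty in the auth JSON text."""
    data = json.loads(auth_text)
    return [key for key in EXPECTED_KEYS if not data.get(key)]
```

A possible use is to read azure.auth, pass its contents to missing_auth_keys, and create the secret only when the returned list is empty.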

Mandatory tunables

Tunable | Description | Notes
AZURE_INSTANCE_NAMES | Names of the target Azure instances. | Multiple values can be provided as a comma-separated string. For example, instance-1,instance-2. For more information, go to stop instance by name.
RESOURCE_GROUP | Name of the Azure resource group that contains the target instances. | All the instances must be from the same resource group. For more information, go to resource group field in the YAML file.

Optional tunables

Tunable | Description | Notes
TOTAL_CHAOS_DURATION | Duration through which chaos is injected into the target resource (in seconds). | Defaults to 30s. For more information, go to duration of the chaos.
CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Defaults to 60s. For more information, go to chaos interval.
AZURE_AUTH_LOCATION | Name of the Azure secret credentials file. | Defaults to azure.auth.
SCALE_SET | Specifies whether the instance is part of a Scale Set. | Defaults to disable. Also supports enable. For more information, go to scale set instances.
INSTALL_DEPENDENCIES | Specifies whether to install the dependencies required to run I/O stress. | Defaults to true. Also supports false.
FILESYSTEM_UTILIZATION_PERCENTAGE | Size as a percentage of free space on the file system. | Defaults to 0 %, which results in 1 GB utilization. For more information, go to file system utilization in percentage.
FILESYSTEM_UTILIZATION_BYTES | Size of the files used per worker (in GB). FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are specified, FILESYSTEM_UTILIZATION_PERCENTAGE takes precedence. | Defaults to 0 GB, which results in 1 GB utilization. For more information, go to file system utilization in gigabytes.
NUMBER_OF_WORKERS | Number of I/O workers involved in I/O disk stress. | Defaults to 4. For more information, go to multiple workers.
VOLUME_MOUNT_PATH | Location that points to the volume mount path used in I/O stress. | Defaults to the user HOME directory. For more information, go to volume mount path.
DEFAULT_HEALTH_CHECK | Determines whether to run the default health check that is built into the fault. | Defaults to true. For more information, go to default health check.
SEQUENCE | Sequence of chaos execution for multiple target instances. | Defaults to parallel. Also supports serial. For more information, go to sequence of chaos execution.
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. For more information, go to ramp time.
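The interaction between FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES can be summarized in a few lines of Python. This is only an illustrative sketch of the precedence rule and the 1 GB fallback as described in the table. The function name resolve_fill_size and its return shape are hypothetical and do not come from the fault itself.

```python
def resolve_fill_size(percentage=0, size_gb=0):
    """Return (unit, value) for the amount of disk the stress would use.

    percentage: value of FILESYSTEM_UTILIZATION_PERCENTAGE (0 means unset).
    size_gb:    value of FILESYSTEM_UTILIZATION_BYTES in GB (0 means unset).
    """
    if percentage > 0:
        # The percentage takes precedence when both tunables are set.
        return ("percent", percentage)
    if size_gb > 0:
        return ("gb", size_gb)
    # Both left at their default of 0: the table states 1 GB is used.
    return ("gb", 1)
```

For example, resolve_fill_size(50, 10) selects the percentage, while resolve_fill_size() falls back to 1 GB.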

File system utilization in gigabytes

It specifies the size of the files that the stress workers use on the Azure instance (in gigabytes). Tune it by using the FILESYSTEM_UTILIZATION_BYTES environment variable.

Use the following example to tune it:

# filesystem bytes to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'

File system utilization in percentage

It specifies the size of the files used on the Azure instance as a percentage of the free space on the file system. Tune it by using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable.

Use the following example to tune it:

# filesystem percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'

Multiple workers

It specifies the number of I/O workers that run to spike the file system utilisation. Increasing the number of workers increases file system consumption. Tune it by using the NUMBER_OF_WORKERS environment variable.

Use the following example to tune it:

# multiple workers to utilize resources
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: NUMBER_OF_WORKERS
          value: '3'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'

Volume mount path

It specifies the path on the Azure instance where the volume used for I/O stress is mounted. Tune it by using the VOLUME_MOUNT_PATH environment variable.

Use the following example to tune it:

# volume path to be used for io stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: VOLUME_MOUNT_PATH
          value: '/tmp'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
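The bash script that the fault runs on the instance is not shown on this page. As a rough sketch of how the worker count, file size, mount path, and duration tunables could map onto a single disk stress invocation, the Python helper below assembles a stress-ng command line. The use of stress-ng, the helper name build_stress_command, and the default argument values are assumptions made for illustration and are not the fault's confirmed implementation.

```python
import shlex


def build_stress_command(workers=4, size="1g", mount_path="/tmp", duration_s=30):
    """Build an illustrative stress-ng command line for disk I/O stress.

    workers:    number of hdd workers (compare NUMBER_OF_WORKERS).
    size:       bytes written per worker, e.g. "1g" or "50%" of free space.
    mount_path: directory where stress-ng creates its temporary files.
    duration_s: how long the stress runs, in seconds.
    """
    args = [
        "stress-ng",
        "--hdd", str(workers),         # start N workers that write, read, and remove temp files
        "--hdd-bytes", size,           # amount written by each worker
        "--temp-path", mount_path,     # where the temporary files are created
        "--timeout", f"{duration_s}s",
    ]
    # shlex.join quotes arguments only when the shell would need it.
    return shlex.join(args)
```

Calling build_stress_command(workers=3, size="50%", mount_path="/tmp", duration_s=60) would produce a command that fills half of the free space on /tmp with three workers for one minute.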

Multiple Azure instances

It specifies comma-separated Azure instance names that are subject to chaos in a single run. Tune it by using the AZURE_INSTANCE_NAMES environment variable.

Use the following example to tune it:

# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
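To illustrate how a comma-separated AZURE_INSTANCE_NAMES value resolves to individual targets, here is a small Python sketch. The function name parse_instance_names is hypothetical, and trimming whitespace and skipping empty entries are assumptions; the fault's own parsing may be stricter.

```python
def parse_instance_names(raw):
    """Split a comma-separated string into a list of instance names."""
    names = []
    for part in raw.split(","):
        name = part.strip()  # assumption: surrounding whitespace is ignored
        if name:             # assumption: empty entries are skipped
            names.append(name)
    return names
```

With the value from the example above, parse_instance_names('instance-1,instance-2') yields two targets, both of which must belong to the same resource group.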