Azure instance IO stress
Azure instance I/O stress disrupts the state of infra resources.
- This fault induces stress on the Azure instance using the Azure Run command. The Azure Run command is executed using the in-built bash scripts within the fault.
- It causes I/O stress on the Azure instance using the bash script for a specific duration.
Use cases
Azure instance I/O stress:
- Determines the resilience of an Azure instance when unexpected stress is applied on the I/O sources.
- Determines how Azure scales the resources to maintain the application under stress.
- Simulates slower disk operations by the application.
- Simulates noisy neighbour problems by hogging the disk bandwidth.
- Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
- Checks whether or not the application functions under high disk latency conditions.
- Checks whether or not the application functions under high I/O traffic, and large I/O blocks.
- Checks if other services monopolize the I/O disks during stress.
Prerequisites
- Kubernetes >= 1.17
- Azure Run Command agent is installed and running in the target Azure instance.
- Azure instance should be in a healthy state.
- Use Azure file-based authentication to connect to the instance using the Azure Go SDK. To generate the auth file, run the following Azure CLI command:
  az ad sp create-for-rbac --sdk-auth > azure.auth
- The Kubernetes secret should contain the auth file created in the previous step in the CHAOS_NAMESPACE. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }
If you change the secret key name from azure.auth to a new name, ensure that you update the AZURE_AUTH_LOCATION environment variable in the chaos experiment with the new name.
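To illustrate the note above, here is a minimal sketch of a ChaosEngine that points the fault at a renamed secret key. The key name custom-azure.auth is hypothetical, and the sketch assumes AZURE_AUTH_LOCATION is overridden through the env list in the same way as the other tunables in this document; verify both against your own setup.

```yaml
# sketch: use a renamed secret key for Azure authentication
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # hypothetical key name; it must match the key under stringData in the secret
        - name: AZURE_AUTH_LOCATION
          value: 'custom-azure.auth'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```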
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
AZURE_INSTANCE_NAMES | Names of the target Azure instances. | Multiple values can be provided as a comma-separated string. For example, instance-1,instance-2 . For more information, go to stop instance by name. |
RESOURCE_GROUP | Name of the Azure Resource Group that contains the target instances. | All the instances must be from the same resource group. For more information, go to resource group field in the YAML file. |
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. For more information, go to duration of the chaos. |
CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Defaults to 60s. For more information, go to chaos interval. |
AZURE_AUTH_LOCATION | Name of the Azure secret credentials file. | Defaults to azure.auth . |
SCALE_SET | Specifies whether the instance is part of a Scale Set. | Defaults to disable . Also supports enable . For more information, go to scale set instances. |
INSTALL_DEPENDENCIES | Install dependencies to run I/O stress. | Defaults to true . Also supports false . |
FILESYSTEM_UTILIZATION_PERCENTAGE | Specify the size as a percentage of free space on the file system. | Defaults to 0 %, which results in 1 GB utilization. For more information, go to file system utilization in percentage. |
FILESYSTEM_UTILIZATION_BYTES | Specify the size of the files used per worker (in GB). FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are specified, FILESYSTEM_UTILIZATION_PERCENTAGE takes precedence. | Defaults to 0 GB, which results in 1 GB utilization. For more information, go to file system utilization in gigabytes. |
NUMBER_OF_WORKERS | Number of I/O workers involved in I/O disk stress. | Defaults to 4. For more information, go to multiple workers. |
VOLUME_MOUNT_PATH | Location that points to the volume mount path used in I/O stress. | Defaults to the user HOME directory. For more information, go to volume mount path. |
DEFAULT_HEALTH_CHECK | Determines whether to run the default health check that is built into the fault. | Defaults to true . For more information, go to default health check. |
SEQUENCE | Sequence of chaos execution for multiple target instances. | Defaults to parallel . Also supports serial sequence. For more information, go to sequence of chaos execution. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. For more information, go to ramp time. |
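As a sketch of how the timing tunables from the table above might be combined, the following ChaosEngine sets TOTAL_CHAOS_DURATION, CHAOS_INTERVAL, and RAMP_TIME together. The values are illustrative only, and the sketch assumes they are passed as plain numbers of seconds in string form.

```yaml
# sketch: timing tunables (all values in seconds, illustrative only)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # total duration of chaos injection
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # interval between successive chaos iterations
        - name: CHAOS_INTERVAL
          value: '30'
        # wait period before and after injecting chaos
        - name: RAMP_TIME
          value: '30'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```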
File system utilization in gigabytes
It specifies the size of the files utilised by the Azure instance (in gigabytes). Tune it by using the FILESYSTEM_UTILIZATION_BYTES environment variable.
Use the following example to tune it:
# filesystem bytes to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
File system utilization in percentage
It specifies the size of the files utilised on the Azure instance, as a percentage of the free space on the file system. Tune it by using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable.
Use the following example to tune it:
# filesystem percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
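The tunables table states that FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive and that the percentage takes precedence when both are set. The sketch below sets both only to show that behaviour as described; in practice, setting just one of them is likely clearer. The values are illustrative.

```yaml
# sketch: both utilization tunables set; per the tunables table,
# the percentage value is expected to take precedence
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # expected to take effect
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # expected to be ignored because the percentage is also set
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```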
Multiple workers
It specifies the number of I/O workers that run to spike the file system utilisation. A higher worker count increases file system consumption. Tune it by using the NUMBER_OF_WORKERS environment variable.
Use the following example to tune it:
# multiple workers to utilize resources
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: NUMBER_OF_WORKERS
          value: '3'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
Volume mount path
It specifies the location that points to the volume mount path used in I/O stress with respect to the Azure instance. Tune it by using the VOLUME_MOUNT_PATH environment variable.
Use the following example to tune it:
# volume path to be used for io stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: VOLUME_MOUNT_PATH
          value: '/tmp'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
Multiple Azure instances
It specifies comma-separated Azure instance names that are subject to chaos in a single run. Tune it by using the AZURE_INSTANCE_NAMES environment variable.
Use the following example to tune it:
# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
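When several instances are targeted, the SEQUENCE tunable from the optional tunables table controls whether chaos runs on them in parallel (the default) or one after another. The following is a minimal sketch, assuming the same ChaosEngine layout as the examples above, that applies I/O stress to two instances serially.

```yaml
# sketch: stress multiple instances one after another
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # supported values per the tunables table: parallel (default), serial
        - name: SEQUENCE
          value: 'serial'
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```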