Azure Instance IO Stress
Introduction
- Azure Instance IO Stress disrupts the state of infrastructure resources by inducing IO stress on an Azure instance through Azure Run Command. The stress is generated by bash scripts that are built into the fault for this chaos scenario.
- It causes IO stress on the Azure instance using a bash script for a specified chaos duration.
Uses
Uses of the fault
- Heavy filesystem reads and writes are a common scenario for processes/applications and can impact their delivery. These problems are generally referred to as "Noisy Neighbour" problems.
- Injecting a rogue process into a target Azure instance starves the main processes/applications (typically pid 1) of the resources allocated to them (where limits are defined), which slows application traffic. In other cases, unrestrained use can exhaust the instance's resources, degrading the performance of the processes/applications running on the instance. This category of chaos fault helps build the application's immunity to such stress scenarios.
Prerequisites
Verify the prerequisites
- Ensure that the Kubernetes version is >= 1.17.
Azure Access Requirement:
- Ensure that the Azure Run Command agent is installed and running in the target Azure instance.
- We will use Azure file-based authentication to connect with the instance using the Azure Go SDK in the fault. To generate the auth file, run the following Azure CLI command:
az ad sp create-for-rbac --sdk-auth > azure.auth
- Create a Kubernetes secret containing the auth file created in the previous step in the CHAOS_NAMESPACE. A sample secret file looks like:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }
- If you change the secret key name (from azure.auth), please also update the AZURE_AUTH_LOCATION ENV value in the ChaosExperiment CR with the same name.
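To illustrate the renaming rule above, suppose the key inside the secret were changed to `my-azure.auth` (a hypothetical name chosen only for this sketch). The matching env entry would then need the same value. Only the single env entry is shown; where it sits in your ChaosExperiment CR depends on your manifest.

```yaml
# Fragment of the env list in the ChaosExperiment CR.
# 'my-azure.auth' is a hypothetical key name; it must match the
# key used under stringData in the Kubernetes secret.
- name: AZURE_AUTH_LOCATION
  value: 'my-azure.auth'
```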
Default Validations
- The Azure instance should be in a healthy state.
Fault Tunables
Check the Fault Tunables
Mandatory Fields
Variables | Description | Notes |
---|---|---|
AZURE_INSTANCE_NAMES | Names of the target Azure instances | Multiple values can be provided as comma-separated string. Eg: instance-1,instance-2 |
RESOURCE_GROUP | The Azure Resource Group name where the instances have been created | All the instances must be from the same resource group |
Optional Fields
Variables | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | The total time duration for chaos injection (sec) | Defaults to 30s |
CHAOS_INTERVAL | The interval (in sec) between successive chaos injection | Defaults to 60s |
AZURE_AUTH_LOCATION | Name of the Azure secret credentials file | Defaults to azure.auth |
SCALE_SET | Whether the instances are part of a ScaleSet. It can be either disable or enable | Defaults to disable |
INSTALL_DEPENDENCIES | Whether to install the dependencies used to run the IO chaos. It can be either True or False | Defaults to True |
FILESYSTEM_UTILIZATION_PERCENTAGE | Specify the size as a percentage of free space on the file system | Defaults to 0%, which results in 1 GB utilization |
FILESYSTEM_UTILIZATION_BYTES | Specify the size in GigaBytes (GB). FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are provided, FILESYSTEM_UTILIZATION_PERCENTAGE is prioritized. | Defaults to 0 GB, which results in 1 GB utilization |
NUMBER_OF_WORKERS | The number of IO workers involved in IO disk stress | Defaults to 4 |
VOLUME_MOUNT_PATH | Fill the given volume mount path | Defaults to the user HOME directory |
SEQUENCE | Defines the sequence of chaos execution for multiple instances | Defaults to parallel. Supported: serial, parallel |
RAMP_TIME | Period to wait before and after injection of chaos in sec | Eg: 30 |
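As a rough illustration of how the timing-related optional fields combine with the mandatory ones, the following sketch sets a longer chaos duration, a chaos interval, and a ramp time. The numeric values are arbitrary choices for the example, and the engine name, service account, instance name, and resource group are placeholders in the style of the other examples on this page.

```yaml
# sketch: timing tunables alongside the mandatory fields
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # total chaos duration in seconds (example value)
        - name: TOTAL_CHAOS_DURATION
          value: '120'
        # interval in seconds between successive injections (example value)
        - name: CHAOS_INTERVAL
          value: '60'
        # wait in seconds before and after injection (example value)
        - name: RAMP_TIME
          value: '30'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```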
Fault Examples
Common Fault Tunables
Refer to the common attributes to tune the common tunables for all the faults.
FILESYSTEM UTILIZATION IN MEGABYTES
It defines the filesystem size to be utilised, in megabytes, on the Azure instance. It can be tuned via the FILESYSTEM_UTILIZATION_BYTES ENV.
Use the following example to tune this:
# filesystem bytes to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
FILESYSTEM UTILIZATION IN PERCENTAGE
It defines the percentage of the filesystem to be utilised on the Azure instance. It can be tuned via the FILESYSTEM_UTILIZATION_PERCENTAGE ENV.
Use the following example to tune this:
# filesystem percentage to utilize
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
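The tunables table notes that FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive and that the percentage is prioritized when both are provided. The sketch below sets both to make that precedence concrete. Going by the table, only the percentage setting would be expected to take effect, so in practice it is clearer to set just one of the two. The values are illustrative.

```yaml
# sketch: both size tunables set; per the tunables table,
# the percentage is prioritized over the bytes value
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # prioritized when both size tunables are present
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: '50'
        # expected to be ignored because the percentage is also set
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: '1024'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```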
MULTIPLE WORKERS
It defines the number of IO worker threads run to spike filesystem utilisation. More workers increase the rate at which the filesystem is consumed. It can be tuned via the NUMBER_OF_WORKERS ENV.
Use the following example to tune this:
# multiple workers to utilize resources
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: NUMBER_OF_WORKERS
          value: '3'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
VOLUME MOUNT PATH
It defines the mount path of the volume, attached to the Azure instance, that the fault targets. It can be tuned via the VOLUME_MOUNT_PATH ENV.
Use the following example to tune this:
# volume path to be used for io stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        - name: VOLUME_MOUNT_PATH
          value: '/tmp'
        # name of the Azure instance
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1'
        # resource group for the Azure instance
        - name: RESOURCE_GROUP
          value: 'rg-azure'
MULTIPLE AZURE INSTANCES
Multiple Azure instances can be targeted in one chaos run. It can be tuned via the AZURE_INSTANCE_NAMES ENV.
Use the following example to tune this:
# multiple instance targets
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
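When several instances are targeted, the SEQUENCE tunable listed in the optional fields selects between parallel (the default) and serial execution. The following sketch, which reuses the placeholder names from the example above, shows how serial execution might be requested for two instances.

```yaml
# sketch: multiple instance targets with serial chaos execution
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-instance-io-stress
    spec:
      components:
        env:
        # supported values: serial, parallel (default: parallel)
        - name: SEQUENCE
          value: 'serial'
        # names of the Azure instances
        - name: AZURE_INSTANCE_NAMES
          value: 'instance-1,instance-2'
        # resource group for the Azure instances
        - name: RESOURCE_GROUP
          value: 'rg-azure'
```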