Azure AKS node down

Last updated on Mar 25, 2026

The Azure AKS node down fault deallocates nodes in an Azure Kubernetes Service (AKS) cluster for a specified chaos duration.

It helps to check the resilience of your applications when AKS nodes become unavailable.
It targets VMSS (Virtual Machine Scale Set) instances in the AKS node pools and temporarily deallocates them.
You can filter target nodes by node pool name, availability zone, and percentage of nodes to affect.

Use cases

Azure AKS node down:

Determines the resilience of applications when AKS cluster nodes become unavailable.
Validates that workloads are properly distributed across nodes and can handle node failures gracefully.
Tests the behavior of Kubernetes scheduling and auto-scaling when nodes are deallocated.
Simulates availability zone (AZ) failures by targeting nodes in specific zones.
Verifies that critical applications have proper pod disruption budgets and replica counts.
Validates that monitoring and alerting systems detect node failures.
Ensures that stateful applications handle node loss without data corruption.
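
To make the workload distribution and replica count checks above concrete, here is a minimal sketch of a workload that could serve as the application under test. It is not part of the fault itself. The Deployment name, the `app: nginx` label, the image tag, and the replica count are all assumptions chosen for illustration. The anti-affinity rule asks the scheduler to prefer placing replicas on different nodes, so that losing one node is less likely to remove every replica.

```yaml
# Hypothetical workload under test; name, labels, image, and replica count are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  # more than one replica, so a single node loss need not take the app down
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # soft rule: prefer to place replicas on different nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx:1.25
```

Because the rule is only a preference, the scheduler can still co-locate replicas when too few nodes remain, which is one of the behaviors this fault can help you observe.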

Prerequisites

Kubernetes >= 1.17
Azure authentication configured for chaos faults. Refer to Azure authentication methods for setup instructions.
The target AKS cluster should be in a running state before chaos injection.

Required Azure permissions

The service principal needs the following permissions:

Reader role on the AKS cluster's resource group
Virtual Machine Contributor role on the AKS cluster's node resource group (auto-generated resource group containing VMSS instances)
Alternatively, a custom role with the following permissions:
- Microsoft.ContainerService/managedClusters/read (on AKS cluster resource group)
- Microsoft.Compute/virtualMachineScaleSets/read (on node resource group)
- Microsoft.Compute/virtualMachineScaleSets/virtualMachines/read (on node resource group)
- Microsoft.Compute/virtualMachineScaleSets/virtualMachines/deallocate/action (on node resource group)
- Microsoft.Compute/virtualMachineScaleSets/virtualMachines/powerOff/action (on node resource group, for VMs with ephemeral OS disks)
- Microsoft.Compute/virtualMachineScaleSets/virtualMachines/start/action (on node resource group)

Mandatory tunables

Tunable	Description	Notes
AKS_CLUSTER_NAME	Name of the Azure Kubernetes Service (AKS) cluster.	For example, `my-aks-cluster`. For more information, go to AKS cluster name.
AKS_RESOURCE_GROUP	Resource group of the AKS cluster.	For example, `rg-aks-cluster`. For more information, go to resource group field in the YAML file.

Optional tunables

Tunable	Description	Notes
TOTAL_CHAOS_DURATION	Total duration for which chaos is injected into the target resource (in seconds).	Defaults to 30s. For more information, go to duration of the chaos.
CHAOS_INTERVAL	Time interval between successive chaos iterations (in seconds).	Defaults to 30s. For more information, go to chaos interval.
TARGET_NODE_POOL_NAMES	Comma-separated list of node pool names to target.	Empty means all node pools. For example, `nodepool1,nodepool2`. For more information, go to target node pools.
TARGET_ZONES	Comma-separated list of availability zones to target.	Empty means all zones. For example, `1,2,3`. For more information, go to target zones.
NODE_AFFECTED_PERCENTAGE	Percentage of nodes to affect.	Defaults to 0 (corresponds to 1 instance). For more information, go to node affected percentage.
SEQUENCE	Sequence of chaos execution for multiple nodes.	Defaults to `parallel`. Also supports `serial` sequence. For more information, go to sequence of chaos execution.
RAMP_TIME	Period to wait before and after injecting chaos (in seconds).	For example, 30s. For more information, go to ramp time.
DEFAULT_HEALTH_CHECK	Determines whether to run the default health check built into the fault.	Defaults to `false`. For more information, go to default health check.
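
The sections below have no example for the timing tunables in this table, so here is a minimal sketch that sets CHAOS_INTERVAL, RAMP_TIME, and DEFAULT_HEALTH_CHECK together. The values are illustrative assumptions, and the cluster and resource group names are the same placeholders used elsewhere on this page. This page does not describe how CHAOS_INTERVAL and TOTAL_CHAOS_DURATION interact, so verify the resulting timing in your own environment.

```yaml
# sketch: timing-related optional tunables (all values are illustrative)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-aks-node-down
    spec:
      components:
        env:
        # name of the AKS cluster
        - name: AKS_CLUSTER_NAME
          value: 'my-aks-cluster'
        # resource group of the AKS cluster
        - name: AKS_RESOURCE_GROUP
          value: 'rg-aks-cluster'
        # total chaos duration (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '120'
        # interval between successive chaos iterations (in seconds)
        - name: CHAOS_INTERVAL
          value: '30'
        # wait period before and after injecting chaos (in seconds)
        - name: RAMP_TIME
          value: '30'
        # whether to run the fault's default health check
        - name: DEFAULT_HEALTH_CHECK
          value: 'false'
```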

Deallocate AKS nodes

It deallocates AKS cluster nodes for a specific chaos duration. Tune it by using the AKS_CLUSTER_NAME, AKS_RESOURCE_GROUP, and NODE_AFFECTED_PERCENTAGE environment variables.

Use the following example to tune it:

# deallocate AKS nodes for a certain chaos duration
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-aks-node-down
    spec:
      components:
        env:
        # name of the AKS cluster
        - name: AKS_CLUSTER_NAME
          value: 'my-aks-cluster'
        # resource group of the AKS cluster
        - name: AKS_RESOURCE_GROUP
          value: 'rg-aks-cluster'
        # percentage of nodes to affect
        - name: NODE_AFFECTED_PERCENTAGE
          value: '100'
        # total chaos duration (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'

Target specific node pools

It targets nodes from specific node pools in the AKS cluster. Tune it by using the TARGET_NODE_POOL_NAMES environment variable.

Use the following example to tune it:

# target specific node pools in AKS cluster
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-aks-node-down
    spec:
      components:
        env:
        # name of the AKS cluster
        - name: AKS_CLUSTER_NAME
          value: 'my-aks-cluster'
        # resource group of the AKS cluster
        - name: AKS_RESOURCE_GROUP
          value: 'rg-aks-cluster'
        # comma-separated list of node pool names to target
        - name: TARGET_NODE_POOL_NAMES
          value: 'nodepool1,nodepool2'
        # percentage of nodes to affect
        - name: NODE_AFFECTED_PERCENTAGE
          value: '50'
        # total chaos duration (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'

Target nodes by availability zone

It targets nodes from specific availability zones in the AKS cluster. Tune it by using the TARGET_ZONES environment variable.

Use the following example to tune it:

# target nodes in specific availability zones
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-aks-node-down
    spec:
      components:
        env:
        # name of the AKS cluster
        - name: AKS_CLUSTER_NAME
          value: 'my-aks-cluster'
        # resource group of the AKS cluster
        - name: AKS_RESOURCE_GROUP
          value: 'rg-aks-cluster'
        # comma-separated list of availability zones to target
        - name: TARGET_ZONES
          value: '1,2'
        # percentage of nodes to affect
        - name: NODE_AFFECTED_PERCENTAGE
          value: '50'
        # total chaos duration (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
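
The node pool and zone filters can also be set in the same ChaosEngine. The sketch below assumes that the two filters combine, so that only nodes in `nodepool1` that are also in zone `1` are candidates. This page does not state that behavior, so treat it as an assumption and confirm which instances are selected before relying on it. The pool name, zone, and percentage are illustrative values.

```yaml
# sketch: combine node pool and zone filters (combined semantics are an assumption)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-aks-node-down
    spec:
      components:
        env:
        # name of the AKS cluster
        - name: AKS_CLUSTER_NAME
          value: 'my-aks-cluster'
        # resource group of the AKS cluster
        - name: AKS_RESOURCE_GROUP
          value: 'rg-aks-cluster'
        # node pool to target
        - name: TARGET_NODE_POOL_NAMES
          value: 'nodepool1'
        # availability zone to target
        - name: TARGET_ZONES
          value: '1'
        # percentage of the filtered nodes to affect
        - name: NODE_AFFECTED_PERCENTAGE
          value: '100'
        # total chaos duration (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```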

Node affected percentage

It specifies the percentage of nodes to be affected in the target AKS cluster. Tune it by using the NODE_AFFECTED_PERCENTAGE environment variable.

Use the following example to tune it:

# affect a specific percentage of nodes
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-aks-node-down
    spec:
      components:
        env:
        # name of the AKS cluster
        - name: AKS_CLUSTER_NAME
          value: 'my-aks-cluster'
        # resource group of the AKS cluster
        - name: AKS_RESOURCE_GROUP
          value: 'rg-aks-cluster'
        # percentage of nodes to affect (0-100), where 0 means 1 instance
        - name: NODE_AFFECTED_PERCENTAGE
          value: '30'
        # sequence of chaos execution
        - name: SEQUENCE
          value: 'parallel'
        # total chaos duration (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
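
The SEQUENCE tunable also supports `serial`. The following minimal sketch is a variant of the example above that switches to serial execution. The percentage and duration are illustrative values. Presumably the selected nodes are then targeted one after another instead of all at once, but this page does not spell out the exact timing, so confirm it in a test run.

```yaml
# sketch: affect a percentage of nodes in serial sequence (values are illustrative)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: azure-aks-node-down
    spec:
      components:
        env:
        # name of the AKS cluster
        - name: AKS_CLUSTER_NAME
          value: 'my-aks-cluster'
        # resource group of the AKS cluster
        - name: AKS_RESOURCE_GROUP
          value: 'rg-aks-cluster'
        # percentage of nodes to affect (0-100), where 0 means 1 instance
        - name: NODE_AFFECTED_PERCENTAGE
          value: '50'
        # sequence of chaos execution: 'serial' instead of the default 'parallel'
        - name: SEQUENCE
          value: 'serial'
        # total chaos duration (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
```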
