Skip to main content

Pod DNS error

Pod DNS error is a Kubernetes pod-level chaos fault that injects chaos to disrupt DNS resolution in pods. It removes access to services by blocking the DNS resolution of host names or domains.

Pod DNS Error

Use cases

Pod DNS error:

  • Determines the resilience of an application to DNS errors.
  • Determines how quickly an application can resolve the host names and recover from the failure.
  • Simulates unavailability of DNS server (loss of access to any external domain from a given microservice).
  • Simulates malfunctioning of DNS server (loss of access to specific domains from a given microservice).
  • Simulates access to the cloud provider dependencies, and access to specific third party services.

Permissions required

Below is a sample Kubernetes role that defines the permissions required to execute the fault.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: hce
name: pod-dns-error
spec:
definition:
scope: Cluster # Supports "Namespaced" mode too
permissions:
- apiGroups: [""]
resources: ["pods"]
verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "get", "list", "patch", "update"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["deployments, statefulsets"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["replicasets, daemonsets"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["chaosEngines", "chaosExperiments", "chaosResults"]
verbs: ["create", "delete", "get", "list", "patch", "update"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["create", "delete", "get", "list", "deletecollection"]

Prerequisites

  • Kubernetes > 1.16
  • The application pods should be in the running state before and after injecting chaos.

Optional tunables

Tunable Description Notes
TARGET_CONTAINER Name of the container subject to DNS error. None. For more information, go to target specific container
NODE_LABEL Node label used to filter the target node if TARGET_NODE environment variable is not set. It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label.
TOTAL_CHAOS_DURATION Duration for which to insert chaos (in seconds). Default: 60 s. For more information, go to duration of the chaos
TARGET_PODS Comma-separated list of application pod names subject to chaos. If it is not provided, it selects target pods based on provided appLabels. For more information, go to target pods
TARGET_HOSTNAMES List of the target host names or keywords. For example, '["litmuschaos","chaosnative.com"]'. If this is not provided, all host names/domains become the target. For more information, go to target host names
MATCH_SCHEME Determines whether the DNS query exactly matches one of the targets or uses any of the targets as a substring. It can be exact or substring. If this is not provided, it is set to exact. For more information, go to match scheme
TRANSACTION_PERCENTAGE Percentage of the DNS queries to be affected. It supports values in range (0,100]. It targets all requests if not provided. For more information, go to transaction percentage .
PODS_AFFECTED_PERC Percentage of total pods to target. Provide numeric values. Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage
CONTAINER_RUNTIME Container runtime interface for the cluster. Default: containerd. Supports docker, containerd and crio. For more information, go to container runtime
SOCKET_PATH Path of the docker socket file. Default: /run/containerd/containerd.sock. For more information, go to socket path
RAMP_TIME Period to wait before and after injecting chaos (in seconds). For example, 30 s. For more information, go to ramp time
LIB_IMAGE Image used to inject chaos. Default: harness/chaos-go-runner:main-latest. For more information, go to image used by the helper pod.
SEQUENCE Sequence of chaos execution for multiple target pods. Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution

Target pods

Comma-separated names of the pods that are subject to chaos. Tune it by using the TARGET_PODS environment variable.

## contains comma-separated target pod names
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-dns-error
spec:
components:
env:
## comma-separated target pod names
- name: TARGET_PODS
value: 'pod1,pod2'

Target host names

Comma-separated names of the target hosts. Tune it by using the TARGET_HOSTNAMES environment variable. If TARGET_HOSTNAMES is not provided, all host names or domains will be targeted.

The following YAML snippet illustrates the use of this environment variable:

# contains the target host names for the dns error
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-dns-error
spec:
components:
env:
## comma separated list of host names
## if not provided, all hostnames/domains will be targeted
- name: TARGET_HOSTNAMES
value: '["litmuschaos","chaosnative.com"]'
- name: TOTAL_CHAOS_DURATION
value: '60'

Match scheme

Specifies whether the DNS query should exactly match one of the targets or can have any of the targets as a substring. It supports exact or substring values. Tune it by using the MATCH_SCHEME environment variable.

The following YAML snippet illustrates the use of this environment variable:

# contains match scheme for the dns error
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-dns-error
spec:
components:
env:
## it supports 'exact' and 'substring' values
- name: MATCH_SCHEME
value: 'exact'
- name: TOTAL_CHAOS_DURATION
value: '60'

Transaction Percentage

The percentage of DNS queries that should be affected, with supported values in the range (0, 100]. If not specified, the fault targets all queries. Tune it by using the TRANSACTION_PERCENTAGE environment variable.

The following YAML snippet illustrates the use of this environment variable:

# contains transaction percentage for the dns error
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-dns-error
spec:
components:
env:
# provide the transaction percentage within (0,100] range
# for example, it will affect 50% of the total dns queries
- name: TRANSACTION_PERCENTAGE
value: '50'
- name: TOTAL_CHAOS_DURATION
value: '60'

Container runtime and socket path

The CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and socket file path, respectively.

  • CONTAINER_RUNTIME: It supports docker, containerd, and crio runtimes. The default value is containerd.
  • SOCKET_PATH: It contains path of containerd socket file by default(/run/containerd/containerd.sock). For docker, specify path as /var/run/docker.sock. For crio, specify path as /var/run/crio/crio.sock.

The following YAML snippet illustrates the use of these environment variables:

## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: engine-nginx
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: litmus-admin
experiments:
- name: pod-dns-error
spec:
components:
env:
# runtime for the container
# supports docker
- name: CONTAINER_RUNTIME
value: 'containerd'
# path of the socket file
- name: SOCKET_PATH
value: '/run/containerd/containerd.sock'
- name: TOTAL_CHAOS_DURATION
VALUE: '60'