Node network latency
Node network latency is a Kubernetes node-level chaos fault that induces packet latency across the entire node. Similar to pod network latency, this fault uses traffic control (tc) along with netem rules to inject network latency.
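To make the tc and netem mechanism concrete, here is a minimal Python sketch. It only builds, and does not run, the kind of tc commands that add a fixed delay to all egress traffic on one interface and then remove it again. The function name `netem_latency_commands` is hypothetical, and this is not the fault's actual implementation. Running the printed commands would require root on the node, or a privileged pod in the node's network namespace.

```python
from typing import List, Tuple


def netem_latency_commands(interface: str = "eth0", latency_ms: int = 2000) -> Tuple[List[str], List[str]]:
    """Build (not run) tc commands that delay all egress traffic on `interface`.

    Returns a pair: the commands that inject the delay and the commands that revert it.
    """
    if latency_ms < 0:
        raise ValueError("latency_ms must be non-negative")
    # A netem qdisc at the root of the interface delays every outgoing packet.
    inject = [f"tc qdisc replace dev {interface} root netem delay {latency_ms}ms"]
    # Deleting the root qdisc restores the interface's default queueing.
    revert = [f"tc qdisc delete dev {interface} root"]
    return inject, revert


if __name__ == "__main__":
    inject, revert = netem_latency_commands("eth0", 2000)
    for cmd in inject + revert:
        print(cmd)
```

A root netem qdisc shapes only outgoing packets. This matches the "egress traffic" comments in the manifests on this page.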
Use cases
Node network latency:
- Simulates a degraded network at the node level, causing potential disruptions to all pods running on the affected node.
- Tests the resilience of the node, and of inter-node communication, against packet latency.
- Simulates scenarios where specific nodes might experience network problems due to issues like faulty NICs or network misconfigurations.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
# Nodes are cluster-scoped, so the permissions are granted through a ClusterRole rather than a namespaced Role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-network-latency
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "get", "list", "patch", "update"]
# the chaos custom resources belong to the litmuschaos.io API group
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines", "chaosexperiments", "chaosresults"]
  verbs: ["create", "delete", "get", "list", "patch", "update"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["get", "list", "create"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create", "delete", "get", "list", "deletecollection"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list"]
Prerequisites
- Kubernetes > 1.16
- Nodes should be in a healthy state before and after injecting chaos.
Optional tunables
| Tunable | Description | Notes |
|---|---|---|
| NODE_LABEL | Label of the nodes on which to induce network latency. | If not provided, the chaos operator selects nodes based on other criteria. For more information, go to target nodes with labels. |
| TARGET_NODES | Comma-separated list of nodes subject to chaos. | For example, node-1,node-2. For more information, go to target nodes. |
| NODES_AFFECTED_PERC | Percentage of the total nodes to target. Accepts numeric values only. | Default: 0 (corresponds to 1 node). For more information, go to node affected percentage. |
| SOCKET_PATH | Path of the containerd, crio, or docker socket file. | Default: /run/containerd/containerd.sock. For other runtimes, specify the respective socket path. For more information, go to socket path. |
| CONTAINER_RUNTIME | Container runtime used by the cluster. | Default: containerd. Supports docker, containerd, and crio. For more information, go to container runtime. |
| DESTINATION_HOSTS | DNS or FQDN names of the services and ports whose accessibility is impacted. | If not provided, network chaos is induced for all destinations, or only for DESTINATION_IPS if that is defined. For more information, go to destination hosts. |
| DESTINATION_IPS | Comma-separated IP addresses and ports of services or pods, or CIDR blocks, whose accessibility is impacted. | If not provided, network chaos is induced for all destinations. For more information, go to destination IPs. |
| NETWORK_INTERFACE | Name of the Ethernet interface used to shape the traffic. | Default: eth0. For more information, go to network interface. |
| NETWORK_LATENCY | Latency (in ms) added to network packets on the node. | Default: 2000. For more information, go to network packet latency. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
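The table states that a NODES_AFFECTED_PERC of 0 still targets one node. The following is a minimal sketch of that arithmetic. It assumes the node count is rounded down and never drops below one, which is an assumption this page does not confirm. `nodes_to_target` is a hypothetical helper, not part of the fault.

```python
def nodes_to_target(total_nodes: int, affected_perc: int = 0) -> int:
    """Number of nodes to target for a NODES_AFFECTED_PERC value.

    Assumed rule: floor(total_nodes * perc / 100), with a minimum of one node.
    """
    if total_nodes < 1:
        raise ValueError("the cluster must have at least one node")
    if not 0 <= affected_perc <= 100:
        raise ValueError("NODES_AFFECTED_PERC must be between 0 and 100")
    # Integer division rounds down; max() keeps the minimum at one node.
    return max(1, total_nodes * affected_perc // 100)


if __name__ == "__main__":
    for perc in (0, 50, 100):
        print(perc, nodes_to_target(10, perc))
```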
If DESTINATION_HOSTS and DESTINATION_IPS are both left empty, the fault targets all hosts by default. To limit the impact, specify the IP addresses of the services in DESTINATION_IPS, or the DNS or FQDN names of the services in DESTINATION_HOSTS. Use commas to separate multiple values.
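The defaulting rule above can be sketched as follows. `parse_csv` and `destinations` are hypothetical helpers. The sketch also assumes that whitespace around commas and empty entries are ignored, which this page does not state.

```python
from typing import List, Optional


def parse_csv(value: Optional[str]) -> List[str]:
    """Split a comma-separated environment value, dropping blank entries."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def destinations(destination_ips: Optional[str], destination_hosts: Optional[str]) -> Optional[List[str]]:
    """Return None when chaos should apply to all destinations, else the explicit targets."""
    targets = parse_csv(destination_ips) + parse_csv(destination_hosts)
    # An empty list means neither variable was set, so every destination is affected.
    return targets or None


if __name__ == "__main__":
    print(destinations("8.8.8.8,192.168.5.6", "nginx.default.svc.cluster.local"))
    print(destinations("", ""))
```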
Network packet latency
Latency (in ms) injected into network packets across the entire node. Tune it by using the NETWORK_LATENCY environment variable.
The following YAML snippet illustrates the use of these environment variables:
# it injects network-latency for the egress traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-network-latency
    spec:
      components:
        env:
        # network packet latency (in ms)
        - name: NETWORK_LATENCY
          value: '2000'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
Destination IPs and destination hosts
IPs and hosts affected by the network fault. By default, the fault affects all destinations. Use the DESTINATION_IPS and DESTINATION_HOSTS environment variables to limit it to specific IPs and hosts.
- DESTINATION_IPS: IP addresses of the services or pods, or CIDR blocks (ranges of IPs), whose accessibility is impacted.
- DESTINATION_HOSTS: DNS or FQDN names of the services and ports whose accessibility is impacted.
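Restricting latency to specific destinations is commonly done with a prio qdisc, a netem qdisc on one of its bands, and u32 filters that steer matching packets into that band. The sketch below only builds such commands and is not the fault's actual implementation. `targeted_latency_commands` is a hypothetical name. The sketch handles IPv4 addresses and CIDR blocks only, so DESTINATION_HOSTS entries are assumed to be resolved to IPs beforehand, and ports are not handled.

```python
import ipaddress
from typing import List


def targeted_latency_commands(interface: str, latency_ms: int, destinations: List[str]) -> List[str]:
    """Build (not run) tc commands that delay egress traffic only to the given IPv4 IPs/CIDRs."""
    if not destinations:
        raise ValueError("with no destinations, a root netem qdisc affects all traffic instead")
    cmds = [
        # prio creates three bands (classes 1:1, 1:2, 1:3) under handle 1:
        f"tc qdisc replace dev {interface} root handle 1: prio",
        # attach the delay to the third band only
        f"tc qdisc replace dev {interface} parent 1:3 netem delay {latency_ms}ms",
    ]
    for dest in destinations:
        # accepts a single address ("8.8.8.8" becomes 8.8.8.8/32) or a CIDR block
        net = ipaddress.ip_network(dest, strict=False)
        if net.version != 4:
            raise ValueError(f"only IPv4 is handled in this sketch: {dest}")
        cmds.append(
            f"tc filter add dev {interface} protocol ip parent 1:0 prio 3 "
            f"u32 match ip dst {net.with_prefixlen} flowid 1:3"
        )
    return cmds


if __name__ == "__main__":
    for cmd in targeted_latency_commands("eth0", 2000, ["8.8.8.8", "192.168.5.6"]):
        print(cmd)
```

To revert, delete the root qdisc (`tc qdisc delete dev eth0 root`). This also removes the attached netem qdisc and filters. Per the tc-prio documentation, the default priomap already maps some packet priorities to band 1:3. Unfiltered traffic with those priorities may therefore also be delayed, so a stricter setup would adjust the priomap.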
The following YAML snippet illustrates the use of these environment variables:
# it injects the chaos for the egress traffic for specific ips/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-network-latency
    spec:
      components:
        env:
        # supports comma separated destination ips
        - name: DESTINATION_IPS
          value: '8.8.8.8,192.168.5.6'
        # supports comma separated destination hosts
        - name: DESTINATION_HOSTS
          value: 'nginx.default.svc.cluster.local'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
Network interface
Name of the Ethernet interface used to shape the traffic. Its default value is eth0. Tune it by using the NETWORK_INTERFACE environment variable.
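Interface names vary between node images (for example, not every node has eth0). Checking the name before setting NETWORK_INTERFACE can therefore save a failed run. The following is a minimal sketch using Python's `socket.if_nameindex()` (Unix only). The helper names are hypothetical. The check is assumed to run in the node's network namespace, for example on the node itself, because a regular pod sees only its own interfaces.

```python
import socket
from typing import List


def interface_names() -> List[str]:
    """List the interface names visible in the current network namespace."""
    return [name for _index, name in socket.if_nameindex()]


def check_interface(name: str = "eth0") -> bool:
    """Return True if `name` exists, i.e. it is a usable NETWORK_INTERFACE value here."""
    return name in interface_names()


if __name__ == "__main__":
    print(interface_names())
    print("eth0 present:", check_interface("eth0"))
```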
The following YAML snippet illustrates the use of this environment variable:
# provide the network interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-network-latency
    spec:
      components:
        env:
        # name of the network interface
        - name: NETWORK_INTERFACE
          value: 'eth0'
        - name: TOTAL_CHAOS_DURATION
          value: '60'
Container runtime and socket path
Use the CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and the socket file path, respectively.
- CONTAINER_RUNTIME: Supports the docker, containerd, and crio runtimes. The default value is containerd.
- SOCKET_PATH: Path of the runtime's socket file. Defaults to the containerd socket (/run/containerd/containerd.sock). For docker, specify /var/run/docker.sock. For crio, specify /var/run/crio/crio.sock.
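SOCKET_PATH defaults to the containerd socket even when CONTAINER_RUNTIME is set to docker or crio, which is why the path must be set explicitly for those runtimes. The following hypothetical helper, `resolve_socket_path`, picks the socket path matching the runtime using the paths listed above. It is a sketch for building manifests and not part of the fault.

```python
from typing import Optional

# Socket paths per runtime, as listed in this section.
DEFAULT_SOCKET_PATHS = {
    "containerd": "/run/containerd/containerd.sock",
    "docker": "/var/run/docker.sock",
    "crio": "/var/run/crio/crio.sock",
}


def resolve_socket_path(runtime: str = "containerd", socket_path: Optional[str] = None) -> str:
    """Return an explicit SOCKET_PATH if given, else the path matching CONTAINER_RUNTIME."""
    runtime = runtime.strip().lower()
    if runtime not in DEFAULT_SOCKET_PATHS:
        raise ValueError(f"unsupported CONTAINER_RUNTIME: {runtime!r}")
    # An explicitly provided path always wins over the per-runtime default.
    return socket_path or DEFAULT_SOCKET_PATHS[runtime]


if __name__ == "__main__":
    for rt in ("containerd", "docker", "crio"):
        print(rt, resolve_socket_path(rt))
```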
The following YAML snippet illustrates the use of these environment variables:
# provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-network-latency
    spec:
      components:
        env:
        # runtime for the container
        # supports docker, containerd, crio
        - name: CONTAINER_RUNTIME
          value: 'containerd'
        # path of the socket file
        - name: SOCKET_PATH
          value: '/run/containerd/containerd.sock'
        - name: TOTAL_CHAOS_DURATION
          value: '60'