EC2 network latency

EC2 network latency injects network packet latency into one or more EC2 instances, which makes access to the applications (or services) running on them slow and flaky.

  • It helps determine how the application (or process) running on the EC2 instances performs under a degraded network.

Usage

  • This fault degrades the network without the EC2 instance being marked as unhealthy (or unworthy of traffic). You can mitigate this situation with a middleware that switches traffic based on SLOs (performance parameters).
  • The EC2 instance may stall or get corrupted while waiting endlessly for a packet.
  • You can limit the impact (blast radius) to only the traffic that you wish to test by specifying the destination IP addresses.
  • Running this fault regularly helps improve the resilience of your services over time.
  • It simulates a consistently slow network connection between microservices (for example, cross-region connectivity between active-active peers of a given service or across services, or poor CNI performance in the inter-pod communication network).
  • It also simulates a jittery connection with transient latency spikes between microservices, slow responses from specific third-party (or dependent) components (or services), and a degraded data plane in the service-mesh infrastructure.

Prerequisites

  • Kubernetes > 1.16
  • SSM agent is installed and running on the target EC2 instance.
  • Create a Kubernetes secret that has the AWS Access Key ID and Secret Access Key credentials in the CHAOS_NAMESPACE. A sample secret file looks like:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • If you change the secret name, update the environment variables in experiment.yml so that they read the data from the renamed secret. Also set the AWS_SHARED_CREDENTIALS_FILE environment variable to the path at which the secret is mounted as a file.
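
As a rough sketch of the renaming step above, the fragment below assumes the secret was renamed to the hypothetical name my-aws-secret, and that experiment.yml mounts secrets through a secrets list with name and mountPath fields, as Litmus-style ChaosExperiment manifests commonly do. The layout is an assumption and is not taken from this page, so check the field names against your own experiment.yml.

```yaml
# Fragment of experiment.yml (layout assumed for illustration)
spec:
  definition:
    env:
    # file path = mountPath of the secret + key name inside the secret
    - name: AWS_SHARED_CREDENTIALS_FILE
      value: '/tmp/cloud_config.yml'
    secrets:
    # hypothetical renamed secret; must match metadata.name of the Secret
    - name: my-aws-secret
      mountPath: /tmp/
```

If you also change the mount path or the key name inside the secret, the value of AWS_SHARED_CREDENTIALS_FILE has to change with it.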

Note

You can pass the VM credentials as secrets or as a ChaosEngine environment variable.

Permissions required

Here is an example AWS policy to execute the fault.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Refer to the superset permission/policy to execute all AWS faults.

Default validations

The EC2 instance should be in a healthy state.

Fault tunables

Mandatory fields

| Variables | Description | Notes |
| --- | --- | --- |
| EC2_INSTANCE_ID | ID of the target EC2 instance. | For example, i-044d3cb4b03b8af1f. |
| REGION | The AWS region ID where the EC2 instance has been created. | For example, us-east-1. |

Optional fields

| Variables | Description | Notes |
| --- | --- | --- |
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. |
| CHAOS_INTERVAL | Time interval between two successive latency injections (in seconds). | Defaults to 30s. |
| AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials file. | Defaults to /tmp/cloud_config.yml. |
| INSTALL_DEPENDENCY | Whether to install the dependencies used to run the network chaos. It can be either True or False. | If the dependency already exists, you can turn it off. Defaults to True. |
| NETWORK_LATENCY | The latency/delay in milliseconds. | Defaults to 2000. Provide a numeric value only. |
| JITTER | The network jitter value in milliseconds. | Defaults to 0. Provide a numeric value only. |
| DESTINATION_IPS | IP addresses of the services, or CIDR blocks (ranges of IPs), whose accessibility is impacted. | Comma-separated IPs or CIDRs can be provided. If not provided, network chaos is induced for all IPs/destinations. |
| DESTINATION_HOSTS | DNS names of the services whose accessibility is impacted. | If not provided, network chaos is induced for all IPs/destinations, or for DESTINATION_IPS if already defined. |
| NETWORK_INTERFACE | Name of the ethernet interface considered for shaping traffic. | Defaults to eth0. |
| SEQUENCE | Sequence of chaos execution for multiple instances. | Defaults to parallel. Supports serial sequence as well. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. |
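
The timing-related optional fields above can be combined in a single ChaosEngine. The following is a minimal sketch that reuses the manifest layout from the fault examples on this page; the numeric values are illustrative assumptions, not recommendations.

```yaml
# sketch: sets the timing-related optional tunables together
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-latency
    spec:
      components:
        env:
        # total time for which chaos is injected (in seconds); illustrative value
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # interval between successive chaos iterations (in seconds); illustrative value
        - name: CHAOS_INTERVAL
          value: '30'
        # wait before and after injecting chaos (in seconds); illustrative value
        - name: RAMP_TIME
          value: '10'
        # only relevant when multiple instances are targeted
        - name: SEQUENCE
          value: 'serial'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```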

Fault examples

Common fault tunables

Refer to the common attributes to tune the common tunables for all the faults.

Network packet latency

It defines the network packet latency (in milliseconds) to be injected into the EC2 instances. You can tune it using the NETWORK_LATENCY environment variable. Its default value is 2000.

You can tune it using the following example:

# it injects the chaos into the egress traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-latency
    spec:
      components:
        env:
        # network packet latency
        - name: NETWORK_LATENCY
          value: '2000'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'

Run with jitter

It defines the jitter (in ms), a parameter that allows introducing a network delay variation. You can tune it using the JITTER environment variable. Its default value is 0.

You can tune it using the following example:

# it injects the chaos into the egress traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-latency
    spec:
      components:
        env:
        # value of the network latency jitter (in ms)
        - name: JITTER
          value: '200'
        - name: NETWORK_LATENCY
          value: '2000'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'

Run with destination IPs and destination hosts

By default, the network faults affect traffic to all IPs/hosts. To restrict the fault to specific IPs/hosts, use the DESTINATION_IPS and DESTINATION_HOSTS environment variables.

  • DESTINATION_IPS: It contains the IP addresses of the services, or the CIDR blocks (ranges of IPs), whose accessibility is impacted.
  • DESTINATION_HOSTS: It contains the DNS names of the services whose accessibility is impacted.

You can tune it using the following example:

# it injects the chaos into the egress traffic for specific IPs/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-latency
    spec:
      components:
        env:
        # supports comma-separated destination ips
        - name: DESTINATION_IPS
          value: '8.8.8.8,192.168.5.6'
        # supports comma-separated destination hosts
        - name: DESTINATION_HOSTS
          value: 'google.com'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
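
DESTINATION_IPS is also described as accepting CIDR blocks, which the example above does not exercise. The following sketch mixes a CIDR block with a single IP; the 10.0.0.0/24 range is an illustrative assumption and should be replaced with a range from your own network.

```yaml
# sketch: targets a CIDR block plus a single IP
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-latency
    spec:
      components:
        env:
        # comma-separated list; the first entry is a CIDR block (illustrative range)
        - name: DESTINATION_IPS
          value: '10.0.0.0/24,8.8.8.8'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```

Scoping the fault to a narrow range like this keeps the blast radius limited to the traffic under test.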

Network interface

It defines the name of the ethernet interface on which traffic is shaped. You can tune it using the NETWORK_INTERFACE environment variable. Its default value is eth0.

You can tune it using the following example:

# it injects the chaos into the egress traffic for specific network interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-latency
    spec:
      components:
        env:
        # name of the network interface
        - name: NETWORK_INTERFACE
          value: 'eth0'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
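
The default eth0 does not match every instance. As a hedged sketch, the variant below assumes the primary interface on the target instance is named ens5, which is common on some newer EC2 instance types and AMIs. The name is an assumption; confirm the actual interface name on the instance (for example, with ip link) before using it.

```yaml
# sketch: shapes traffic on a non-default interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-latency
    spec:
      components:
        env:
        # assumed interface name; verify it on the target instance
        - name: NETWORK_INTERFACE
          value: 'ens5'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```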