EC2 network loss
EC2 network loss injects network packet loss into the target EC2 instance(s), which causes flaky access to the application (or services) running on them.
- It checks the performance of the application (or process) running on the EC2 instances under degraded network conditions.
Usage
It simulates a degraded network with varied percentages of dropped packets between microservices, loss of access to specific third-party (or dependent) services or components, a blackhole against traffic to a given availability zone (AZ failure simulation), and network partitions (split-brain) between peer replicas of a stateful application.
Running this fault regularly helps you find and fix such weaknesses, which improves the resilience of your services over time.
Prerequisites
- Kubernetes > 1.16
- SSM agent is installed and running on the target EC2 instance.
- Create a Kubernetes secret that holds the AWS Access Key ID and Secret Access Key credentials in the `CHAOS_NAMESPACE`. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
- If you change the secret name, also update the `experiment.yml` environment variable that derives the respective data from the secret. Also account for the path at which this secret is mounted as a file, which is set in the manifest environment variable `AWS_SHARED_CREDENTIALS_FILE`.
Note
You can pass the VM credentials as secrets or as a ChaosEngine environment variable.
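The relationship between the secret name, its mount path, and `AWS_SHARED_CREDENTIALS_FILE` can be sketched as the fragment below. It assumes that `experiment.yml` follows the Litmus ChaosExperiment layout, where `spec.definition.secrets` lists the secrets to mount; this is a partial fragment for illustration, not a complete manifest, and the exact layout may differ in your installation.

```yaml
# Partial fragment of experiment.yml (assumed Litmus ChaosExperiment layout)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: ec2-network-loss
spec:
  definition:
    env:
    # must point at the file created from the secret key (cloud_config.yml)
    - name: AWS_SHARED_CREDENTIALS_FILE
      value: '/tmp/cloud_config.yml'
    secrets:
    # must match metadata.name of the Kubernetes secret
    - name: cloud-secret
      # the secret key cloud_config.yml appears as a file under this path
      mountPath: /tmp/
```

If you rename the secret or mount it elsewhere, both the `secrets` entry and the `AWS_SHARED_CREDENTIALS_FILE` value need to change together.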
Permissions required
Here is an example AWS policy to execute the fault.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:SendCommand",
        "ssm:CancelCommand",
        "ssm:CreateDocument",
        "ssm:DeleteDocument",
        "ssm:GetCommandInvocation",
        "ssm:UpdateInstanceInformation",
        "ssm:DescribeInstanceInformation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Refer to the superset permission/policy to execute all AWS faults.
Default validations
The EC2 instance should be in a healthy state.
Fault tunables
Mandatory fields
Variables | Description | Notes |
---|---|---|
EC2_INSTANCE_ID | ID of the target EC2 instance. | For example, `i-044d3cb4b03b8af1f`. |
REGION | The AWS region ID where the EC2 instance has been created. | For example, `us-east-1`. |
Optional fields
Variables | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. |
CHAOS_INTERVAL | Time interval between two successive chaos injections (in seconds). | Defaults to 30s. |
AWS_SHARED_CREDENTIALS_FILE | Path to the AWS secret credentials file. | Defaults to `/tmp/cloud_config.yml`. |
INSTALL_DEPENDENCY | Whether to install the dependencies used to run the network chaos. It can be either True or False. | If the dependency already exists, you can turn it off. Defaults to True. |
NETWORK_PACKET_LOSS_PERCENTAGE | The packet loss in percentage. | Defaults to 100 percent. |
DESTINATION_IPS | IP addresses of the services, or CIDR blocks (ranges of IPs), whose accessibility is impacted. | Comma-separated IPs or CIDRs can be provided. If not provided, network chaos is induced for all IPs/destinations. |
DESTINATION_HOSTS | DNS names of the services whose accessibility is impacted. | If not provided, network chaos is induced for all IPs/destinations, or for DESTINATION_IPS if already defined. |
NETWORK_INTERFACE | Name of the ethernet interface considered for shaping traffic. | Defaults to `eth0`. |
SEQUENCE | Sequence of chaos execution for multiple instances. | Defaults to parallel. Supports serial sequence as well. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. |
Fault examples
Fault tunables
Refer to the common attributes to tune the common tunables for all the faults.
Network packet loss
It defines the percentage of network packets to drop on the EC2 instances. You can tune it using the `NETWORK_PACKET_LOSS_PERCENTAGE` environment variable.
The following example sets it to 100 percent:
# it injects the chaos into the egress traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # network packet loss percentage
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '100'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
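A partial loss is often more representative of a degraded network than a full blackhole. The sketch below is a variation of the example above that drops roughly half of the packets for a fixed duration. The values `'50'` and `'60'` are illustrative assumptions, and the duration is written as a plain number of seconds, which is how Litmus-style experiments usually take it; verify the expected format for your version.

```yaml
# sketch: 50% packet loss on egress traffic for 60 seconds (illustrative values)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # drop about half of the packets instead of all of them
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '50'
        # total time for which the loss is injected, in seconds
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```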
Run with destination IPs and destination hosts
By default, the network faults interrupt traffic for all IPs and hosts. To restrict the interruption to specific IPs or hosts, use the `DESTINATION_IPS` and `DESTINATION_HOSTS` environment variables.
- `DESTINATION_IPS`: IP addresses of the services, or CIDR blocks (ranges of IPs), whose accessibility is impacted.
- `DESTINATION_HOSTS`: DNS names of the services whose accessibility is impacted.
You can tune it using the following example:
# it injects the chaos into the egress traffic for specific IPs/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # supports comma-separated destination ips
        - name: DESTINATION_IPS
          value: '8.8.8.8,192.168.5.6'
        # supports comma-separated destination hosts
        - name: DESTINATION_HOSTS
          value: 'google.com'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
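Because `DESTINATION_IPS` also accepts CIDR blocks, the same mechanism can approximate a blackhole against an availability zone by dropping all traffic to that zone's subnet. The sketch below assumes a hypothetical subnet `10.0.1.0/24`; replace it with the CIDR of the subnet you want to isolate.

```yaml
# sketch: blackhole egress traffic to one subnet (CIDR is a hypothetical placeholder)
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # 100 percent loss means that no packet reaches the destination
        - name: NETWORK_PACKET_LOSS_PERCENTAGE
          value: '100'
        # CIDR block of the subnet to cut off (hypothetical value)
        - name: DESTINATION_IPS
          value: '10.0.1.0/24'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```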
Network interface
It defines the name of the ethernet interface that is considered for shaping traffic. You can tune it using the `NETWORK_INTERFACE` environment variable. Its default value is `eth0`.
You can tune it using the following example:
# it injects the chaos into the egress traffic for specific network interface
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # name of the network interface
        - name: NETWORK_INTERFACE
          value: 'eth0'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
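Not every instance names its primary interface `eth0`; some AMIs use predictable interface names such as `ens5`. The sketch below assumes such an instance (check the actual name on the target, for example with `ip link`) and also sets `RAMP_TIME` from the optional tunables. Both values are illustrative assumptions, and the ramp time is written as a plain number of seconds.

```yaml
# sketch: target a non-default interface and wait before and after the fault
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: ec2-network-loss
    spec:
      components:
        env:
        # interface name as reported on the target instance (assumed: ens5)
        - name: NETWORK_INTERFACE
          value: 'ens5'
        # wait 30 seconds before and after injecting chaos
        - name: RAMP_TIME
          value: '30'
        - name: EC2_INSTANCE_ID
          value: 'instance-1'
        - name: REGION
          value: 'us-west-2'
```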