ECS update container timeout
ECS update container timeout modifies the start and stop timeouts for ECS containers in Amazon ECS clusters. The fault allows you to specify the duration for which the containers should be allowed to start or stop before they are considered failed.
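For context, ECS stores per-container start and stop timeouts as the `startTimeout` and `stopTimeout` fields (both in seconds) of a task definition's container definitions. The fragment below is a hypothetical task definition written in the YAML input format that AWS CLI v2 accepts for `aws ecs register-task-definition --cli-input-yaml`. The family, container name, image, memory, and timeout values are illustrative only. It is an assumption that the fault edits exactly these two fields.

```yaml
# Hypothetical task definition; every name and value here is illustrative.
# Assumption: the fault changes startTimeout/stopTimeout on containers like this one.
family: demo-web
containerDefinitions:
- name: web
  image: nginx:latest
  essential: true
  memory: 256          # MiB; EC2 tasks need memory set at container or task level
  startTimeout: 60     # seconds
  stopTimeout: 30      # seconds; ECS force-kills the container after this period
```

Saved as `taskdef.yaml`, it could be registered with `aws ecs register-task-definition --cli-input-yaml file://taskdef.yaml`.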
ECS update container timeout:
- Tests the resilience of ECS tasks and their containers to timeouts during updates or deployments.
- Verifies the behavior of ECS tasks and their containers when the start or stop timeout is exceeded during updates or deployments.
- Tests the recovery mechanisms of the ECS service and container instances in case of timeouts.
- Simulates scenarios where containers take longer than expected to start or stop.
- Evaluates the impact of these scenarios on overall application availability and performance.
- Kubernetes >= 1.17
- An ECS cluster running the desired tasks and containers, and familiarity with ECS service update and deployment concepts.
- Create a Kubernetes secret that contains the AWS access configuration (key) in the CHAOS_NAMESPACE. Below is a sample secret file:
```ini
# Add the cloud AWS credentials respectively
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXX
```
- It is recommended to use the secret name cloud-secret. Otherwise, you will need to update the AWS_SHARED_CREDENTIALS_FILE environment variable in the fault template, and you may be unable to use the default health check probes.
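The sample secret file above shows only the credentials payload. One way to package it as a Kubernetes Secret named `cloud-secret` is sketched below. The data key `cloud_config.yml` is a hypothetical name chosen for illustration; whatever key you use, AWS_SHARED_CREDENTIALS_FILE must point at the path where that key is mounted in the chaos pod.

```yaml
# Sketch only: wraps the sample credentials in a Kubernetes Secret.
# The key name "cloud_config.yml" is hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXX
```

Using `stringData` lets you write the credentials as plain text; Kubernetes base64-encodes them into `data` on creation. Apply it in the chaos namespace with `kubectl apply -f cloud-secret.yaml -n <CHAOS_NAMESPACE>`.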
Here is an example AWS policy to execute the fault.
- Refer to AWS named profile for chaos to use a different profile for AWS faults.
- The ECS containers should be in a healthy state before and after introducing chaos.
- Refer to the common attributes and AWS-specific tunables to tune the tunables that are common to all faults and those specific to AWS faults.
- Refer to the superset permission/policy to execute all AWS faults.
| Tunable | Description | Notes |
|---|---|---|
| CLUSTER_NAME | Name of the target ECS cluster. | For example, |
| SERVICE_NAME | Name of the ECS service under chaos. | For example, |
| REGION | Region name of the target ECS cluster. | For example, |
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. |
| CHAOS_INTERVAL | Interval between successive chaos injections (in seconds). | Defaults to 30s. |
| AWS_SHARED_CREDENTIALS_FILE | Path to the AWS credentials file. | Defaults to |
| START_TIMEOUT | Maximum amount of time that ECS allows for a container to start successfully. If the container fails to start within this period, ECS marks the task as failed and may restart or reschedule it. | Specified in seconds. Defaults to 3,600 seconds if not provided. |
| STOP_TIMEOUT | Maximum amount of time that ECS allows for a container to stop gracefully. If the container does not stop within this period, ECS forcefully kills it. | Specified in seconds. Defaults to 3,600 seconds if not provided. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. |
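As a sketch, the tunables in the table map onto `env` entries such as the following. The cluster, service, and region values are hypothetical; the numeric values are the defaults and example given in the table. Every value is quoted because a Kubernetes `EnvVar` value must be a string. AWS_SHARED_CREDENTIALS_FILE is left out on the assumption that you use the recommended cloud-secret name.

```yaml
# Sketch only: env entries for the fault's tunables.
# Cluster, service, and region values are hypothetical.
env:
- name: CLUSTER_NAME
  value: "demo-cluster"        # hypothetical cluster name
- name: SERVICE_NAME
  value: "demo-service"        # hypothetical service name
- name: REGION
  value: "us-east-1"           # hypothetical region
- name: TOTAL_CHAOS_DURATION
  value: "30"                  # documented default, in seconds
- name: CHAOS_INTERVAL
  value: "30"                  # documented default, in seconds
- name: RAMP_TIME
  value: "30"                  # example value from the table, in seconds
```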
Start and stop timeout
The start and stop timeouts for the task containers. Tune them by using the START_TIMEOUT and STOP_TIMEOUT environment variables, respectively. Each defaults to 3,600 seconds.
The following YAML snippet illustrates the use of this environment variable:
```yaml
# set start and stop timeout for the target container
experiments:
- name: ecs-update-container-timeout
  spec:
    components:
      env:
      # Provide the start and stop timeout for the ecs container
      - name: START_TIMEOUT
      - name: STOP_TIMEOUT
      - name: REGION
      - name: TOTAL_CHAOS_DURATION
```
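For context, the snippet is an excerpt of the experiments list in a chaos manifest. A complete manifest might look like the sketch below, which assumes the LitmusChaos `ChaosEngine` format (`litmuschaos.io/v1alpha1`). The engine name, namespace, service account, cluster, service, region, and every value are hypothetical and should be replaced with your own.

```yaml
# Sketch, assuming a Litmus-style ChaosEngine; all names and values are hypothetical.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: ecs-timeout-engine            # hypothetical engine name
  namespace: litmus                   # hypothetical: use your CHAOS_NAMESPACE
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin   # hypothetical service account
  experiments:
  - name: ecs-update-container-timeout
    spec:
      components:
        env:
        # Provide the start and stop timeout for the ecs container
        - name: START_TIMEOUT
          value: "60"                 # hypothetical, in seconds
        - name: STOP_TIMEOUT
          value: "60"                 # hypothetical, in seconds
        - name: CLUSTER_NAME
          value: "demo-cluster"       # hypothetical cluster name
        - name: SERVICE_NAME
          value: "demo-service"       # hypothetical service name
        - name: REGION
          value: "us-east-1"          # hypothetical region
        - name: TOTAL_CHAOS_DURATION
          value: "60"                 # hypothetical, in seconds
```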