Create experiment
This topic describes how you can create chaos experiments that consist of chaos faults and execute them to build and improve the resilience of your application.
Before you begin
- What is a chaos experiment?
- What are chaos faults?
- What are resilience probes?
- How to create a resilience probe?
Create Environment
Before you create an experiment, create an environment in which you enable a chaos infrastructure.
- To create an environment, go to the Chaos module, and click Environments. Select New Environment.
- Provide a name, and click Create. This creates an environment.
To edit or delete the environment, select the ⋮ icon against the name of the environment.
Create a Chaos Experiment
You can add one or more chaos faults to a chaos experiment and execute it. Follow the step-by-step guide below to create a chaos experiment with one chaos fault, pod delete, which has one resilience probe associated with it.
Step-by-step guide
To add a chaos experiment:
- In Harness, navigate to Chaos > Chaos Experiments. Click + New Experiment.
- In the Experiment Overview, enter the experiment Name and optional Description and Tags. In Select a Chaos Infrastructure, select the infrastructure where the target resources reside, then click Next.
  For more information on infrastructure, see Connect chaos infrastructures.
- This takes you to the Experiment Builder tab, where you can start building your experiment.
- Choose how you want to build the experiment. The options are:
  - Blank Canvas - Lets you build the experiment from scratch, adding the specific faults you want.
  - Templates from ChaosHubs - Lets you preview and select an experiment from pre-curated experiment templates available in ChaosHubs.
  - Upload YAML - Lets you upload an experiment manifest YAML file.
These options are explained below.
Using Blank Canvas
- On the Experiment Builder tab, click Add to add a fault to the experiment.
- Select the fault you want to add to the experiment.
- For each fault, tune the properties. The properties vary depending on the fault.
- To tune each fault:
  - Specify the target application (only for pod-level Kubernetes faults): This allows the corresponding pods of the application to be targeted.
  - Tune fault parameters: Each fault has a set of common parameters, like chaos duration and ramp time, and unique parameters that you can customize as needed.
  - Add chaos probes: (Optional) On the Probes tab, add resilience probes to automate the chaos hypothesis checks for a fault during the experiment execution. Probes are declarative checks that validate specific criteria and help determine whether an experiment passed.
  - Tune Fault Weightage: Set the weight for the fault, which determines its importance relative to other faults in the experiment. This weight is used to calculate the experiment's resilience score.
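To make the idea of a probe as a declarative check more concrete, here is a minimal Python sketch. It is not the Harness probe format: the probe names, the observation fields (`http_status`, `latency_ms`), and the thresholds are all hypothetical, and the sketch assumes that the probe success percentage is the share of probes that passed.

```python
from typing import Callable, Dict

# A probe is modeled here as a named predicate over one observation.
# Hypothetical model for illustration only.
Probe = Callable[[Dict[str, float]], bool]


def probe_success_percentage(probes: Dict[str, Probe],
                             observation: Dict[str, float]) -> float:
    """Return the percentage of probes whose criteria hold (assumed definition)."""
    if not probes:
        raise ValueError("at least one probe is required")
    passed = sum(1 for check in probes.values() if check(observation))
    return 100.0 * passed / len(probes)


# Hypothetical criteria for a pod delete fault.
probes = {
    "http-ok": lambda o: o["http_status"] == 200,
    "latency-under-500ms": lambda o: o["latency_ms"] < 500,
}

# Hypothetical measurement taken while the fault was running.
observation = {"http_status": 200, "latency_ms": 730}

print(probe_success_percentage(probes, observation))  # 50.0
```

In this sketch one of the two checks fails, so the fault would not count as fully passed.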
Using Templates from ChaosHubs
- Select an experiment template from a ChaosHub.
- Choose Experiment Type to see the available ChaosHubs.
- Select a template to preview the faults included.
- You can edit the template to add more faults or update the existing faults.
Upload YAML
- Upload an experiment manifest YAML file to create the experiment.
You can edit the experiment to update the existing faults or add more.
After constructing the chaos experiment using one of the three options, save the experiment.
- Click Save to save the experiment to the Chaos Experiments page. You can add it to a ChaosHub later.
- Select Add Experiment to ChaosHub to save this experiment as a template in a selected ChaosHub.
Create Experiment as a Pipeline
- Go to the Chaos module, select Pipelines, and click +Create a Pipeline.
- Provide a name, and click Start.
- Click the + (the stage type), and select Custom Stage.
- Provide a name for the stage, and click Set Up Stage.
- Click Add Step and choose the Add Step option.
- Choose Chaos from the Step Library.
- Provide a name, and select the chaos experiment.
- Choose from the list of chaos experiments, and click Add to Pipeline.
- Click Apply Changes.
- Click Save.
For more information, go to Pipeline concepts and Pipeline Modeling Overview.
Run or Schedule the Experiment
You can choose to run the experiment immediately by clicking the Run button, or schedule it to run at a specific time by selecting the Schedule tab.
Execute Experiment Once
- To execute the experiment once, select Non-Cron (Single run), click Set Schedule, and then select Run.
- To run the experiment once at a specific time, select Run Once at a specific time, choose the date and time, click Apply, and select Set Schedule.
Execute Experiment on a Schedule
- To schedule the experiment to run periodically, select Cron (Recurring run), and set the schedule using the Minutes, Hourly, Daily, Monthly or Yearly options. The Cron Expression will be automatically generated.
- Click Set Schedule.
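To help read the generated Cron Expression, the following Python sketch builds standard five-field cron strings (minute, hour, day of month, month, day of week) for schedules similar to the Minutes, Hourly, Daily, Monthly, and Yearly options. The function names are hypothetical, and the exact expression the UI produces for a given choice is not specified here, so treat this as an illustration of cron syntax only.

```python
def _check(value: int, low: int, high: int, name: str) -> int:
    """Validate that a cron field value is within its allowed range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return value


def every_n_minutes(n: int) -> str:
    # "*/n" in the minute field means "every n minutes".
    return f"*/{_check(n, 1, 59, 'n')} * * * *"


def hourly(minute: int = 0) -> str:
    return f"{_check(minute, 0, 59, 'minute')} * * * *"


def daily(hour: int, minute: int = 0) -> str:
    return f"{_check(minute, 0, 59, 'minute')} {_check(hour, 0, 23, 'hour')} * * *"


def monthly(day: int, hour: int = 0, minute: int = 0) -> str:
    return (f"{_check(minute, 0, 59, 'minute')} {_check(hour, 0, 23, 'hour')} "
            f"{_check(day, 1, 31, 'day')} * *")


def yearly(month: int, day: int, hour: int = 0, minute: int = 0) -> str:
    return (f"{_check(minute, 0, 59, 'minute')} {_check(hour, 0, 23, 'hour')} "
            f"{_check(day, 1, 31, 'day')} {_check(month, 1, 12, 'month')} *")


print(every_n_minutes(15))  # */15 * * * *
print(daily(2, 30))         # 30 2 * * *
print(yearly(1, 1))         # 0 0 1 1 *
```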
Advanced Experiment Setup Options
On the Experiment Builder tab, you can click Advanced Options to configure the following advanced options when creating an experiment for a Kubernetes chaos infrastructure:
General Options
Node Selector
Specify the node on which the experiment pods will be scheduled by providing the node label as a key-value pair.
- This can be used with node-level faults to avoid scheduling the experiment pod on the target node(s).
- It can also be used to limit the scheduling of experiment pods on nodes with an unsupported OS.
Toleration
Specify the tolerations that allow the experiment pods to be scheduled on tainted nodes. For more information on taints and tolerations, refer to the Kubernetes documentation.
- This can be used with node-level faults to avoid scheduling the experiment pod on the target node(s).
- It can also be used to limit the scheduling of the experiment pods on nodes with an unsupported OS.
Annotations
Specify the annotations to be added to the experiment pods by providing them as key-value pairs. For more information on annotations, refer to the Kubernetes documentation.
Annotations can be used to bypass network proxies enforced by service mesh tools like Istio.
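The following Python sketch illustrates how a node selector and tolerations together decide where an experiment pod may land. It is a simplified model of the Kubernetes scheduling rules, not the scheduler itself and not Harness code. The node names, the `chaos-target` taint, and the helper functions are hypothetical. `kubernetes.io/os` is a standard node label, and `sidecar.istio.io/inject` is shown as one Istio annotation that may apply to your mesh setup.

```python
from typing import Dict, List


def selector_matches(labels: Dict[str, str], node_selector: Dict[str, str]) -> bool:
    """Every key-value pair in the node selector must be present on the node."""
    return all(labels.get(k) == v for k, v in node_selector.items())


def tolerates(toleration: Dict[str, str], taint: Dict[str, str]) -> bool:
    """Simplified toleration match: effect, then key, then value."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(node: dict, node_selector: Dict[str, str],
                 tolerations: List[Dict[str, str]]) -> bool:
    if not selector_matches(node["labels"], node_selector):
        return False
    for taint in node.get("taints", []):
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft preference, does not block scheduling
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


# Hypothetical cluster: node-a is the fault target and carries a taint.
nodes = {
    "node-a": {"labels": {"kubernetes.io/os": "linux"},
               "taints": [{"key": "chaos-target", "value": "true",
                           "effect": "NoSchedule"}]},
    "node-b": {"labels": {"kubernetes.io/os": "linux"}},
    "node-c": {"labels": {"kubernetes.io/os": "windows"}},
}
node_selector = {"kubernetes.io/os": "linux"}
annotations = {"sidecar.istio.io/inject": "false"}  # example annotation only

eligible = [n for n, spec in nodes.items()
            if can_schedule(spec, node_selector, tolerations=[])]
print(eligible)  # ['node-b']
```

With no tolerations, the tainted target node and the node with the other OS are both excluded, which is the behavior the two bullets above describe.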
Security Options
Enable runAsUser
Specify the user ID to start all the processes in the experiment pod containers. By default, the user ID 1000 is used.
This option allows privileged or restricted access for experiment pods.
Enable runAsGroup
Specify the group ID to start all the processes in the experiment pod containers instead of a user ID. This option allows privileged or restricted access for experiment pods.
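As a rough illustration, these two options correspond to the `runAsUser` and `runAsGroup` fields of a Kubernetes `securityContext`. The Python sketch below builds such a fragment as a dictionary. The helper name is hypothetical, and how the platform places this fragment in the experiment manifest is an assumption that is not covered here.

```python
from typing import Dict, Optional


def security_context(run_as_user: Optional[int] = 1000,
                     run_as_group: Optional[int] = None) -> Dict[str, int]:
    """Build a securityContext fragment; unset options are left out.

    The default user ID of 1000 follows the text above. No default group ID
    is stated, so the group is omitted unless provided.
    """
    context: Dict[str, int] = {}
    if run_as_user is not None:
        context["runAsUser"] = run_as_user
    if run_as_group is not None:
        context["runAsGroup"] = run_as_group
    return context


print(security_context())                               # {'runAsUser': 1000}
print(security_context(run_as_user=0))                  # {'runAsUser': 0}
print(security_context(run_as_user=None, run_as_group=2000))  # {'runAsGroup': 2000}
```

A user ID of 0 runs the processes as root, which is the privileged case; a non-zero ID gives restricted access.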
Add serial and parallel faults
You can add multiple faults in a single chaos experiment that is scaled efficiently by HCE during execution.
Consider the overall impact that these faults have on the application. Your experience in production environments may differ due to lack of resources when a number of parallel faults are being executed.
- To add a fault that runs in parallel to another fault, point your mouse below an existing fault, and then select Add. You can follow the same process to add a serial fault.
For Linux, experiments with a parallel fault are currently not supported.
The image below shows a single experiment that consists of serial and parallel faults.
- Faults A, B, and C are parallel faults. They begin execution at the same time.
- Faults A, B, and C are serial with respect to faults D and E: A, B, and C complete execution, and then D and E begin execution.
- Similarly, faults H and I are serial faults: H completes execution, and then I begins.
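The ordering described above can be sketched in Python by modeling an experiment as a list of steps, where each step is a group of faults that start together. This is a simplified timing model, not how the platform executes faults. The durations are hypothetical, and the sketch assumes that D and E start at the same time once A, B, and C have all finished.

```python
from typing import Dict, List


def start_times(steps: List[List[str]], durations: Dict[str, int]) -> Dict[str, int]:
    """Return the start time (in seconds) of each fault.

    Faults in the same step start together; the next step starts only
    after the longest fault in the current step has finished.
    """
    starts: Dict[str, int] = {}
    clock = 0
    for step in steps:
        for fault in step:
            starts[fault] = clock
        clock += max(durations[fault] for fault in step)
    return starts


# A, B, C run in parallel; D and E follow once all three complete.
steps = [["A", "B", "C"], ["D", "E"]]
durations = {"A": 30, "B": 60, "C": 45, "D": 20, "E": 20}  # hypothetical

print(start_times(steps, durations))
# {'A': 0, 'B': 0, 'C': 0, 'D': 60, 'E': 60}
```

D and E wait for B, the longest fault in the first step, even though A and C finish earlier.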
Analyze experiment
You can observe the execution status of the faults in a chaos experiment during its run. The screen shows the experiment pipeline on the right-hand side, and details such as Environment, Infrastructure Name, and the runs that have passed and failed on the left-hand side.
When the experiment completes execution, it displays the Resilience Score. This score describes how resilient your application is to unplanned failures. The probe success percentage helps determine the outcome of every fault in the chaos experiment. Probes (if any) associated with the experiment are used to understand how the application fared.
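As a worked illustration of how fault weights and probe success percentages can combine into a single score, the Python sketch below computes a weighted average. The text here only says that weights are used to calculate the resilience score, so the weighted-average formula is an assumption; the fault names, weights, and percentages are hypothetical.

```python
from typing import Dict, Tuple


def resilience_score(faults: Dict[str, Tuple[int, float]]) -> float:
    """Weighted average of probe success percentages (assumed formula).

    `faults` maps a fault name to (weight, probe success percentage).
    """
    total_weight = sum(weight for weight, _ in faults.values())
    if total_weight == 0:
        raise ValueError("at least one fault must have a non-zero weight")
    weighted = sum(weight * pct for weight, pct in faults.values())
    return weighted / total_weight


# Hypothetical experiment: the heavier fault passed all of its probes,
# the lighter fault passed none.
faults = {
    "pod-delete": (3, 100.0),
    "pod-cpu-hog": (1, 0.0),
}

print(resilience_score(faults))  # 75.0
```

Under this assumption, a failure in a low-weight fault lowers the score less than the same failure in a high-weight fault.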
If any of the faults fail, the Fail Step explains why the fault failed.