Skip to main content

Flow of control in a chaos experiment

Experiment execution

The below diagram shows the flow of control when a user creates a new chaos experiment.

Experiment Execution

  1. The user attempts to create a new chaos experiment in the Chaos Control Plane.

  2. The Control plane prompts the user to input the required information for the creation a new experiment, such as

    • Chaos infrastructure: Which chaos infrastructure will be targeted as part of the experiment.
    • Fault and fault tunables: The fault templates can be fetched from any connected chaos hubs, where the tunables can be modified wherever necessary. Multiple faults can be added in any desired order.
    • Fault probes: Optionally, additional probes can be defined on top of the default health check probe for a fault, to validate custom hypothesis conditions as part of the experiment.
    • Fault weights: Fault weights define the importance of a fault with respect to other faults present in an experiment. More formally, it is used for calculating the experiment's resilience score, a quantitative measure of the target environment's resilience when the respective experiment is performed.

    The experiment is now created and ready to be executed.

  3. When the user attempts to run the experiment, the control plane relays the experiment data to the target chaos infrastructure, which undertakes four distinct responsibilities during the experiment execution:

    • Inject faults: Faults received as part of the experiment are interpreted and injected into the target resource. Depending on the experiment, multiple faults might be injected simultaneously.
    • Execute probes: Respective fault probes are executed as and when the faults execute and their result is stored.
    • Stream logs: When an experiment is being executed, real-time logs can be streamed, and retrieved when required. Experiment execution logs are streamed and accessible in real-time in the chaos control plane.
    • Send result: Finally, the experiment execution result, including the probe execution result, is sent back to the chaos control plane.

    The chaos experiment execution is now concluded.

Chaos rollback

Chaos rollback causes all the target resources in an experiment to re-attain their steady state after the execution of the experiment, which ensures the safety of all the applications deployed on your Linux machine.

  • Chaos rollback is performed at the end of each experiment execution. On-the-fly experiments can be safely aborted and the chaos is reverted.
  • In case of a network disruption between the control plane and execution plane during the execution of an experiment, it is gracefully aborted and the chaos is reverted.
  • In case of an abrupt exit of the chaos infrastructure process during the execution of an experiment, the daemon service reverts the chaos before restarting the process.
  • In case of an abrupt reboot of the machine, after the reboot, the daemon service checks and reverts any remnant inconsistency due to the prior execution of chaos, before starting the chaos infrastructure process.
  • In the rare scenario where the revert of chaos itself also leads to an error, an appropriate error message is logged in the experiment log for the manual intervention of the user.