
Run chaos experiments as GitLab pipelines

This tutorial explains how to create chaos experiments using Harness Chaos Engineering (HCE) and run them in GitLab pipelines. You create chaos experiments in the chaos engineering module the same way regardless of where they are invoked from.

  1. Create a chaos experiment in the Harness Chaos Engineering module. Execute this experiment to verify the configuration and ensure that the resilience probes are working as expected. The experiment ID and resilience score determined from this experiment run will be used to integrate the experiment with GitLab.

    [Image: chaos experiment with ID and resilience score]

  2. Create a launch script. HCE APIs are used to invoke or launch a chaos experiment from the pipeline.

    To simplify creating an API call with the required secure parameters and data, Harness provides a CLI tool (hce-cli in the sample below). Use this tool to generate the API command to include in the pipeline script.

    Below is a sample launch script.


    set -e

    curl -sL -o hce-cli

    chmod +x hce-cli

    output=$(./hce-cli generate --api launch-experiment --account-id=${ACCOUNT_ID} \
    --project-id ${PROJECT_ID} --workflow-id ${WORKFLOW_ID} \
    --api-key ${API_KEY} --file-name | jq -r '.data.runChaosExperiment.notifyID')

    echo "${output}"

    See the GitLab demo for a sample configuration of the chaos launch script. You can include this script in the GitLab YAML file. This sample launches a single chaos experiment; repeat it for each additional experiment you want to include.
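
    Repeating the launch for several experiments can be sketched as a loop. This is a minimal sketch, not part of HCE: `launch_experiment` and `launch_all` are hypothetical names, and `launch_experiment` is a stub standing in for the launch script above so that the example runs without Harness credentials.

    ```shell
    #!/bin/sh
    # Hypothetical stand-in for the launch script above: takes a workflow ID
    # and prints a notify ID. Replace the body with the real launch call.
    launch_experiment() {
        echo "notify-$1"
    }

    # Launch each experiment in turn and print one "workflow_id notify_id"
    # pair per line so later pipeline steps can track each run.
    launch_all() {
        for workflow_id in "$@"; do
            notify_id=$(launch_experiment "$workflow_id") || return 1
            echo "$workflow_id $notify_id"
        done
    }
    ```

    For example, `launch_all "$WORKFLOW_ID_A" "$WORKFLOW_ID_B"` (both variables are placeholders) would print two lines, one per experiment.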

  3. Insert chaos experiments into .gitlab-ci.yml. You can include the launch script from the previous step in the GitLab pipeline as a stage or a step. In the script section, add the scripts for launching the experiment, monitoring it, and retrieving the results. For example:

    # Insert a chaos stage where each chaos experiment is inserted as a launch script.

    chaos-job: # This job runs in the chaos stage.
      stage: chaos # It only runs when *both* jobs in the test stage complete successfully.
      environment: production
      variables:
        WORKFLOW_ID: "d7c9d243-0219-4f7c-84c2-3004e59e4505"
      script:
        - apt-get update; apt-get -y install jq
        - echo "Launching Chaos Experiment.."; EXPERIMENT_NOTIFY_ID=$(sh scripts/
        - echo "Monitoring Chaos Experiment.."; sh scripts/ ${EXPERIMENT_NOTIFY_ID}
        - echo "Deriving Resilience Score.."; ACTUAL_RESILIENCE_SCORE=$(sh scripts/ ${EXPERIMENT_NOTIFY_ID} | tr -d '"')
        - echo "Obtained Resilience Score is ${ACTUAL_RESILIENCE_SCORE}"
        - if [ ${ACTUAL_RESILIENCE_SCORE} -lt ${EXPECTED_RESILIENCE_SCORE} ]; then exit 1; fi

    rollback-job: # Job name is illustrative. Runs only when chaos-job fails.
      stage: rollback
      environment: production
      image:
        name: bitnami/kubectl:latest
        entrypoint: ['']
      script:
        - *prepare_kubecontext
        - echo "Attempting Rollback.."; sh scripts/ #write your own rollback logic here
      needs: ["chaos-job"]
      when: on_failure

    The resilience score is the result of the experiment run. The chaos job compares it with the expected resilience score and exits with an error if it is lower, which causes the rollback job to be invoked.
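
    The monitoring step in the job is only referenced, not shown. One way such a step might work is a polling loop that waits for the run to finish. This is a sketch under stated assumptions: `get_experiment_status`, `wait_for_experiment`, and the phase names (`Completed`, `Error`, `Stopped`, `Timeout`) are hypothetical and do not come from the Harness API. The status lookup is a stub.

    ```shell
    #!/bin/sh
    # Hypothetical status lookup: prints the current phase of the run that
    # belongs to a notify ID. This stub always reports completion; replace
    # it with a real status query.
    get_experiment_status() {
        echo "Completed"
    }

    # Poll until the run completes, fails, or the attempt budget is used up.
    # Arguments: notify ID, max attempts (default 60), delay in seconds (default 10).
    wait_for_experiment() {
        notify_id=$1
        max_attempts=${2:-60}
        delay=${3:-10}
        attempt=1
        while [ "$attempt" -le "$max_attempts" ]; do
            status=$(get_experiment_status "$notify_id")
            case "$status" in
                Completed) return 0 ;;
                Error|Stopped|Timeout)
                    echo "Experiment ended with status: $status" >&2
                    return 1 ;;
            esac
            sleep "$delay"
            attempt=$((attempt + 1))
        done
        echo "Gave up after $max_attempts attempts" >&2
        return 1
    }
    ```

    Bounding the number of attempts keeps a stuck experiment from holding the pipeline job open indefinitely.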

  4. Retrieve the resilience score using the Harness Chaos API and take appropriate action in the pipeline. An example of how to use the Harness Chaos API is shown below.


    set -e

    curl -sL -o hce-cli

    chmod +x hce-cli

    resiliencyScore=$(./hce-cli generate --api validate-resilience-score --account-id=${ACCOUNT_ID} \
    --project-id ${PROJECT_ID} --notifyID=$1 \
    --api-key ${API_KEY} --file-name

    echo "${resiliencyScore}"
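
    Taking action on the retrieved score usually comes down to comparing it with an expected value. A bare `[ a -lt b ]` test reports an error if either value is empty or not a number, so a slightly more defensive check can help. The following is a minimal sketch, assuming the score is a whole number that may arrive wrapped in double quotes; `check_resilience_score` is a hypothetical helper name.

    ```shell
    #!/bin/sh
    # Succeeds (status 0) when the actual score meets the expected score and
    # fails with status 1 when it is lower. Returns status 2 when either
    # value is not a whole number.
    check_resilience_score() {
        # Strip surrounding quotes and whitespace from the reported score.
        actual=$(printf '%s' "$1" | tr -d '"[:space:]')
        expected=$2
        case "$actual" in
            ''|*[!0-9]*) echo "Invalid resilience score: '$1'" >&2; return 2 ;;
        esac
        case "$expected" in
            ''|*[!0-9]*) echo "Invalid expected score: '$2'" >&2; return 2 ;;
        esac
        [ "$actual" -ge "$expected" ]
    }
    ```

    In a pipeline script this could be used as `check_resilience_score "${ACTUAL_RESILIENCE_SCORE}" "${EXPECTED_RESILIENCE_SCORE}" || exit 1`, so that any non-zero status fails the job.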