Prometheus probe

The Prometheus probe runs a PromQL query against a Prometheus server, identified by its endpoint, and checks whether the query output satisfies the specified criteria. This lets you define metrics-based SLOs declaratively and determine the experiment verdict based on whether they are met.

YAML-only feature

For complex queries that span multiple lines, use the queryPath attribute to provide the path to a file that contains the query. You can make this file available in the experiment pod through a ConfigMap resource, with the ConfigMap referenced in the ChaosEngine or the ChaosExperiment CR. The query and queryPath attributes are mutually exclusive. For the full list of fields, refer to the Schema section.

Probe definition

You can define probes at the .spec.experiments[].spec.probe path inside the ChaosEngine.

kind: Workflow
apiVersion: argoproj.io/v1alpha1
spec:
  templates:
    - inputs:
        artifacts:
          - raw:
              data: |
                apiVersion: litmuschaos.io/v1alpha1
                kind: ChaosEngine
                spec:
                  experiments:
                    - spec:
                        probe:
                          ####################################
                          # Probes are defined here
                          ####################################

Tip: The Prometheus probe expects you to provide a PromQL query along with the Prometheus service endpoint to check for specific criteria.

Schema

Listed below is the probe schema for the Prometheus probe, with properties shared across all the probes and properties unique to the Prometheus probe.

| Field | Description | Type | Range | Notes |
| --- | --- | --- | --- | --- |
| name | Flag to hold the name of the probe | Mandatory | N/A (type: string) | The name of the probe. Set it based on the use case. |
| type | Flag to hold the type of the probe | Mandatory | httpProbe, k8sProbe, cmdProbe, promProbe, and datadogProbe | Five probe types are supported: httpProbe, k8sProbe, cmdProbe, promProbe, and datadogProbe. |
| mode | Flag to hold the mode of the probe | Mandatory | SOT, EOT, Edge, Continuous, OnChaos | Five probe modes are supported: SOT, EOT, Edge, Continuous, and OnChaos. The Datadog probe supports EOT mode only. |
| endpoint | Flag to hold the Prometheus endpoint for the promProbe | Mandatory | N/A (type: string) | The endpoint of the Prometheus server. |
| query | Flag to hold the PromQL query for the promProbe | Mandatory | N/A (type: string) | The PromQL query that extracts the desired Prometheus metrics when run on the given Prometheus endpoint. |
| queryPath | Flag to hold the path of the PromQL query for the promProbe | Optional | N/A (type: string) | For complex queries that span multiple lines, queryPath provides the path to a file that contains the query. This file can be made available to the experiment pod through a ConfigMap resource, with the ConfigMap name defined in the ChaosEngine or the ChaosExperiment CR. |

Comparator

| Field | Description | Type | Range | Notes |
| --- | --- | --- | --- | --- |
| type | Flag to hold the type of the data used for comparison | Mandatory | float | The type of data to compare as part of the comparison operation. The Prometheus probe compares float data only. |
| criteria | Flag to hold the criteria for the comparison | Mandatory | <, >, <=, >=, !=, ==, oneOf, between for int and float types; equal, notEqual, contains, matches, notMatches, oneOf for string type | The criteria applied in the comparison operation. |
| value | Flag to hold the value for the comparison | Mandatory | N/A (type: string) | The expected value, which the query output must satisfy according to the given criteria. |
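
To illustrate how the comparator can express an SLO, the sketch below checks that the share of 5xx responses stays at or below 1%. It reuses the http_requests_total metric and the endpoint from the examples on this page; the code=~"5.." selector and the 0.01 threshold are assumptions chosen purely for illustration.

```yaml
promProbe/inputs:
  endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
  # Hypothetical error-ratio query: 5xx request rate divided by total request rate
  query: "sum(rate(http_requests_total{code=~\"5..\"}[1m])) / sum(rate(http_requests_total[1m]))"
  comparator:
    type: "float"   # the Prometheus probe compares float data only
    criteria: "<="
    value: "0.01"   # assumed SLO threshold: at most 1% of requests fail
```

The value is written as a string because the schema lists it as type string, even though it is compared as a float.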

Authentication

This section configures authentication for the Prometheus server. Place the base64-encoded "username:password" (or a bearer token, depending on the authentication type) either in the credentials field or in a file whose path you set in the credentialsFile field.

Tip: The credentials and credentialsFile fields are mutually exclusive; you can't use both at the same time.

| Field | Description | Type | Range | Notes |
| --- | --- | --- | --- | --- |
| type | Flag to hold the authentication type | Optional | string | The authentication method. Both Basic and Bearer authentication types are supported. |
| credentials | Flag to hold the basic auth credentials in base64 format or the bearer token | Optional | string | The credentials, either as username:password encoded in base64 format or as a bearer token, depending on the authentication type. |
| credentialsFile | Flag to hold the file path of the basic auth credentials or the bearer token | Optional | string | The path of the file that holds the basic authentication credentials or the bearer token. The file is attached to the experiment pod as a volume secret. The secret contains either the username:password encoded in base64 format or a bearer token, depending on the authentication type. |

TLS

This section provides a mechanism to validate TLS certificates for the Prometheus server. You can supply the CA certificate, or the client certificate and client key, to perform the validation. Alternatively, you can enable insecureSkipVerify to bypass certificate validation.

| Field | Description | Type | Range | Notes |
| --- | --- | --- | --- | --- |
| caFile | Flag to hold the CA file path | Optional | string | The file path of the CA certificates used for server TLS verification. |
| certFile | Flag to hold the client certificate file path | Optional | string | The file path of the client certificate used for TLS verification. |
| keyFile | Flag to hold the client key file path | Optional | string | The file path of the client key used for TLS verification. |
| insecureSkipVerify | Flag to skip the TLS certificate checks | Optional | boolean | When set, insecureSkipVerify skips the TLS certificate checks. |
| serverName | Flag to hold the server name | Optional | string | The name of the server. |

Run properties

| Field | Description | Type | Range | Notes |
| --- | --- | --- | --- | --- |
| probeTimeout | Flag to hold the timeout of the probe | Mandatory | N/A (type: string) | The time limit for the probe to execute the specified check and return the expected data. |
| attempt | Flag to hold the number of attempts of the probe | Mandatory | N/A (type: integer) | The number of times a check is run, after failures in previous attempts, before the probe status is declared failed. |
| interval | Flag to hold the interval of the probe | Mandatory | N/A (type: string) | The time the probe waits between subsequent retries. |
| probePollingInterval | Flag to hold the polling interval for the probes (applicable for all modes) | Optional | N/A (type: string) | The time for which Continuous and OnChaos probes sleep after each iteration. |
| initialDelaySeconds | Flag to hold the initial delay interval for the probes | Optional | N/A (type: integer) | The initial waiting time before the probe starts. |
| stopOnFailure | Flag to stop or continue the experiment on probe failure | Optional | N/A (type: boolean) | Set to true to stop, or false to continue, the experiment execution after the probe fails. |
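
As a sketch of how the run properties combine, the fragment below configures a Continuous-mode probe. The field names come from the run properties table; the specific values are illustrative assumptions, not recommended defaults.

```yaml
# Illustrative values only; tune them for your environment.
mode: "Continuous"
runProperties:
  probeTimeout: 5s           # time limit for a single check
  interval: 2s               # wait between retries of a failed check
  attempt: 3                 # checks run before the probe is declared failed
  probePollingInterval: 10s  # sleep between iterations of the probe
  initialDelaySeconds: 30    # wait before the probe starts
  stopOnFailure: false       # continue the experiment if the probe fails
```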

Definition

In the case of Dedicated Chaos Infrastructure, the following apply:

  • The mode and type are mandatory fields in the probe schema when you define the entire configuration of the probe in the manifest (for Kubernetes (Legacy), Linux, and Windows infrastructure).
  • The name, mode, type, and other input properties (depending on the probe) are required to configure the resilience probe correctly. If any of the necessary details are missing, the probe does not execute.

In the case of Harness Delegate, the following apply:

  • For Kubernetes (Harness Infrastructure) (also known as DDCR), the mandatory fields are mode and probeID, and the type field is derived. These fields are generated and patched in the backend to the same manifest. However, in the UI, you will only see the mode and probeID fields when configuring your experiment. This is because the manifest is minified in the UI.
  • If you define the entire probe in task.definition.chaos.probes, the entire configuration is required. If you use task.probeRef, you only need to specify the probeID and mode fields.

probe:
  - name: "check-probe-success"
    type: "promProbe"
    promProbe/inputs:
      endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
      query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
      comparator:
        criteria: ">" # supports >=, <=, >, <, ==, != comparison
        value: "0"
      auth:
        credentials: "base64(<username:password>)"
      tlsConfig:
        insecureSkipVerify: true
    mode: "Edge"
    runProperties:
      probeTimeout: 5s
      interval: 2s
      attempt: 1

Prometheus query (simple query)

The query field holds the PromQL query that the probe runs on the specified Prometheus endpoint to extract the desired metrics. Set it through the .promProbe/inputs.query field.

The following example sets the query field:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "check-probe-success"
            type: "promProbe"
            promProbe/inputs:
              # endpoint of the Prometheus service
              endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
              # PromQL query to execute
              query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
              comparator:
                # criteria that the actual output must satisfy against the expected value
                # supports >=, <=, >, <, ==, != comparison
                criteria: ">"
                # expected value, which the output is compared against
                value: "0"
            mode: "Edge"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1

Prometheus query (complex query)

For complex queries that span multiple lines, use the queryPath attribute to specify the path to a file that contains the query. The experiment pod accesses this file through a ConfigMap resource, with the ConfigMap name defined in either the ChaosEngine or the ChaosExperiment CR. To set this up, configure the .promProbe/inputs.queryPath field.

Tip: The queryPath and query fields are mutually exclusive. If query is specified, it is used; otherwise, queryPath is used.

The following example sets the queryPath field:

# contains the Prometheus probe, which executes the query and matches the output against the expected criteria
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "check-probe-success"
            type: "promProbe"
            promProbe/inputs:
              # endpoint of the Prometheus service
              endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
              # the ConfigMap that contains the PromQL query must be mounted to the experiment pod
              # use the mounted path here
              queryPath: "/etc/config/prometheus-query"
              comparator:
                # criteria that the actual output must satisfy against the expected value
                # supports >=, <=, >, <, ==, != comparison
                criteria: ">"
                # expected value, which the output is compared against
                value: "0"
            mode: "Edge"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1
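
The queryPath example assumes that a file already exists at /etc/config/prometheus-query. One way to provide it is a ConfigMap whose key name matches the file name, sketched here under the assumption that the ChaosEngine accepts ConfigMap mounts at the experiment's spec.components.configMaps path (mirroring the components.secrets block used for TLS certificates on this page). The ConfigMap name prom-query-config is hypothetical.

```yaml
# Hypothetical ConfigMap; create it in the namespace where the experiment pod runs.
# The key becomes the file name under the mount path.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prom-query-config
data:
  prometheus-query: |
    sum(
      rate(http_requests_total{code=~"2.."}[1m])
    ) by (job)
---
# Fragment of the ChaosEngine experiment spec that mounts the ConfigMap (assumed path)
components:
  configMaps:
    - name: prom-query-config
      mountPath: /etc/config
```

Because the query lives in a block scalar rather than a quoted string, the inner double quotes need no escaping.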

Authentication

This section configures authentication for the Prometheus server. Place the base64-encoded "username:password" or the bearer token either in the credentials field or in a file whose path you set in the credentialsFile field.

Tip: The credentials and credentialsFile fields are mutually exclusive; you can't use both at the same time.

The following example configures Basic authentication:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "check-probe-success"
            type: "promProbe"
            promProbe/inputs:
              # endpoint of the Prometheus service
              endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
              # PromQL query to execute
              query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
              comparator:
                # criteria that the actual output must satisfy against the expected value
                # supports >=, <=, >, <, ==, != comparison
                criteria: ">"
                # expected value, which the output is compared against
                value: "0"
              auth:
                type: Basic
                credentials: "base64(<username:password>)"
            mode: "Edge"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1
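
A sketch of the file-based alternative, using Bearer authentication: the token lives in a Kubernetes Secret that is mounted into the experiment pod, and credentialsFile points at the mounted file. The Secret name prom-auth, the key token, and the mount path /etc/prom-auth are hypothetical, and mounting through components.secrets is assumed to work the same way it does for TLS certificates.

```yaml
# Hypothetical Secret holding the bearer token
apiVersion: v1
kind: Secret
metadata:
  name: prom-auth
type: Opaque
stringData:
  token: "<bearer-token>"
---
# Fragment of the ChaosEngine experiment spec
components:
  secrets:
    - name: prom-auth
      mountPath: /etc/prom-auth
probe:
  - name: "check-probe-success"
    type: "promProbe"
    promProbe/inputs:
      endpoint: "prometheus-server.prometheus.svc.cluster.local:9090"
      query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
      comparator:
        criteria: ">"
        value: "0"
      auth:
        type: Bearer
        # set credentialsFile instead of credentials, never both
        credentialsFile: "/etc/prom-auth/token"
    mode: "Edge"
    runProperties:
      probeTimeout: 5s
      interval: 2s
      attempt: 1
```

Keeping the token in a Secret avoids embedding it in the experiment manifest itself.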

TLS with custom certificates

This section provides a mechanism to validate TLS certificates for the Prometheus server. You can supply the CA certificate, or the client certificate and client key, to perform the validation.

Tip: The CA certificate file must be mounted into the experiment pod either as a ConfigMap or as a Secret. Specify the volume name (ConfigMap or Secret) and the mountPath in the ChaosEngine, under the experiment's spec.components.secrets path.

The following example validates the server certificate against a custom CA:

# contains the Prometheus probe, which executes the query and matches the output against the expected criteria
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        components:
          secrets:
            - name: ca-cert
              mountPath: /etc/config
        probe:
          - name: "check-probe-success"
            type: "promProbe"
            promProbe/inputs:
              # endpoint of the Prometheus service
              endpoint: "https://prometheus-server.harness.io"
              # PromQL query to execute
              query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
              comparator:
                # criteria that the actual output must satisfy against the expected value
                # supports >=, <=, >, <, ==, != comparison
                criteria: ">"
                # expected value, which the output is compared against
                value: "0"
              tlsConfig:
                caFile: "/etc/config/ca.crt"
            mode: "Edge"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1
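
When the Prometheus server also requires a client certificate, the certFile and keyFile fields from the schema can be combined with caFile. The sketch below assumes a Secret named prom-tls (hypothetical) that contains the files ca.crt, tls.crt, and tls.key and is mounted at /etc/tls; the file names and mount path are illustrative.

```yaml
# Fragment of the ChaosEngine experiment spec
components:
  secrets:
    - name: prom-tls        # hypothetical Secret with ca.crt, tls.crt, and tls.key
      mountPath: /etc/tls
probe:
  - name: "check-probe-success"
    type: "promProbe"
    promProbe/inputs:
      endpoint: "https://prometheus-server.harness.io"
      query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
      comparator:
        criteria: ">"
        value: "0"
      tlsConfig:
        caFile: "/etc/tls/ca.crt"     # verifies the server certificate
        certFile: "/etc/tls/tls.crt"  # client certificate presented to the server
        keyFile: "/etc/tls/tls.key"   # private key for the client certificate
    mode: "Edge"
    runProperties:
      probeTimeout: 5s
      interval: 2s
      attempt: 1
```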

TLS skip certificate verification

You can bypass the TLS certificate checks by enabling the insecureSkipVerify option.

The following example skips certificate verification:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "check-probe-success"
            type: "promProbe"
            promProbe/inputs:
              # endpoint of the Prometheus service
              endpoint: "https://prometheus-server.harness.io"
              # PromQL query to execute
              query: "sum(rate(http_requests_total{code=~\"2..\"}[1m])) by (job)"
              comparator:
                # criteria that the actual output must satisfy against the expected value
                # supports >=, <=, >, <, ==, != comparison
                criteria: ">"
                # expected value, which the output is compared against
                value: "0"
              tlsConfig:
                insecureSkipVerify: true
            mode: "Edge"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1