Datadog probe
The Datadog probe lets you query Datadog metrics or a Synthetic test and use the results to evaluate the probe outcome.
- Synthetics probe query is supported for both API tests and Browser tests.
- A Synthetics probe query can run only in EOT mode, because the probe is evaluated on the results of all the test iterations executed during the chaos duration of the fault. Metrics queries are supported in all probe modes.
- If there are no iterations of the synthetics test through the chaos duration of the fault, the probe is marked as failed.
- Raw metrics are not yet available for Linux chaos infrastructure.
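The EOT evaluation rule for Synthetic tests can be sketched as follows. This is an illustration only, not the probe's implementation: the `Iteration` type and the function name are hypothetical, and the rule that a single failed iteration fails the probe is an assumption. The documented behavior it does reflect is that a chaos window with no iterations marks the probe as failed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Iteration:
    """One run of the Synthetic test (hypothetical shape, not a Datadog type)."""
    finished_at: float  # Unix timestamp in seconds
    passed: bool


def evaluate_synthetics_probe(
    iterations: Iterable[Iteration], chaos_start: float, chaos_end: float
) -> bool:
    """Return True when the probe would be treated as passed."""
    # Only iterations that finished inside the chaos duration are considered.
    in_window = [
        it for it in iterations if chaos_start <= it.finished_at <= chaos_end
    ]
    if not in_window:
        # Documented rule: no iterations during the chaos duration means failure.
        return False
    # Assumption: every iteration in the window must have passed.
    return all(it.passed for it in in_window)
```

For example, two passing iterations inside the window yield `True`, while an empty list or an iteration outside the window yields `False`.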
Providing secrets
Before using the probe, provide the Datadog secret keys, which are used to authenticate with the Datadog APIs. These are an API key and an Application key.
For a Kubernetes chaos infrastructure, the secrets shall be provided using a Kubernetes secret of the following format:
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
type: Opaque
stringData:
  DD_API_KEY: "xxxxxxxxxxxxxxxxxxxx"
  DD_APP_KEY: "xxxxxxxxxxxxxxxxxxxx"
The secret name (datadog-secret in this example) must be provided in the datadogCredentialsSecretName field when configuring the probe.
For a Linux chaos infrastructure, the secrets are provided using an environment file at the following path, which is located on the machine where the infrastructure executes:
DD_API_KEY="xxxxxxxxxxxxxxxxxxxx"
DD_APP_KEY="xxxxxxxxxxxxxxxxxxxx"
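Before running an experiment, it can help to confirm that the environment file defines both keys. The sketch below is a hypothetical pre-flight check, not part of the chaos infrastructure. It works on the file's text, so it makes no assumption about where the file lives, and the function names are illustrative.

```python
from __future__ import annotations

REQUIRED_KEYS = ("DD_API_KEY", "DD_APP_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_datadog_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

An empty list from `missing_datadog_keys` means both keys are present and non-empty.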
Probe definition
For a Kubernetes chaos infrastructure, the probe is defined at the .spec.experiments[].spec.probe path in the chaos engine manifest:
kind: Workflow
apiVersion: argoproj.io/v1alpha1
spec:
  templates:
    - inputs:
        artifacts:
          - raw:
              data: |
                apiVersion: litmuschaos.io/v1alpha1
                kind: ChaosEngine
                spec:
                  experiments:
                    - spec:
                        probe:
                          ####################################
                          # Probes are defined here
                          ####################################
For a Linux chaos infrastructure, the probe is defined at the .spec.tasks[].definition.chaos.probes path in the Linux chaos experiment manifest:
apiVersion: litmuschaos.io/v1alpha1
kind: LinuxChaosExperiment
spec:
  tasks:
    - name: task-1
      definition:
        chaos:
          probes:
            ####################################
            # Probes are defined here
            ####################################
Schema
Listed below is the Datadog Probe schema with common properties shared across all probes and properties unique to Datadog probe.
Field | Description | Type | Range | Notes |
---|---|---|---|---|
name | Flag to hold the name of the probe | Mandatory | N/A (type: string) | The name holds the name of the probe. It can be set based on the use case |
type | Flag to hold the type of the probe | Mandatory | httpProbe, k8sProbe, cmdProbe, promProbe, and datadogProbe | The type supports five types of probes. It can be one of httpProbe, k8sProbe, cmdProbe, promProbe, and datadogProbe |
mode | Flag to hold the mode of the probe | Mandatory | SOT, EOT, Edge, Continuous, OnChaos | The mode supports five modes of probes. It can be one of SOT, EOT, Edge, Continuous, and OnChaos. Synthetics queries support only EOT mode; metrics queries support all modes |
datadogSite | Site for the Datadog probe | Mandatory | datadoghq.com, us3.datadoghq.com, us5.datadoghq.com, datadoghq.eu, ddog-gov.com, and ap1.datadoghq.com | The datadogSite supports six values. See the Datadog site documentation for details |
datadogCredentialsSecretName | Name of the secret holding the Datadog probe secret keys | Optional | N/A (type: string) | Name of the Kubernetes secret containing the Datadog secret keys. Only required for Kubernetes chaos infrastructure |
syntheticsTest | Synthetic test details for the probe | Optional | type: syntheticsTest | Provide the Synthetic test details. It can be an API test or a Browser test |
metrics | Metrics details for the probe | Optional | type: metrics | Provide the Datadog metrics details |
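As an illustration of how the datadogSite value might be used, the sketch below validates it against the six supported values and derives an API base URL. It assumes the Datadog API is reached at `https://api.<site>`, which is the hostname pattern Datadog publishes for its sites. The function name is hypothetical, and how the probe builds its URLs internally is not stated here.

```python
SUPPORTED_SITES = (
    "datadoghq.com",
    "us3.datadoghq.com",
    "us5.datadoghq.com",
    "datadoghq.eu",
    "ddog-gov.com",
    "ap1.datadoghq.com",
)


def api_base_url(datadog_site: str) -> str:
    """Return the API base URL for a supported datadogSite value."""
    if datadog_site not in SUPPORTED_SITES:
        raise ValueError(f"unsupported datadogSite: {datadog_site!r}")
    # Assumption: the API host is the site name prefixed with "api.".
    return f"https://api.{datadog_site}"
```

Rejecting unknown sites up front gives a clearer error than a failed HTTP request later in the run.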
Synthetics test
Field | Description | Type | Range | Notes |
---|---|---|---|---|
publicId | Public ID of the Synthetic test | Mandatory | N/A (type: string) | The publicId holds the ID of the Synthetic test |
testType | Type of the Synthetic test | Mandatory | api, browser | The testType holds the type of the Synthetic test. It can be one of api and browser |
Metrics
Field | Description | Type | Range | Notes |
---|---|---|---|---|
query | Datadog metrics query | Mandatory | N/A (type: string) | |
timeFrame | The time frame over which the metrics are queried. It is relative to the present time, so it must be expressed as now-'timeFrameValue' | Mandatory | type: string | Selects the average, minimum, or maximum over the specified time frame. For example, now-5m gives the average, minvaluefrom(now-5m) gives the minimum, and maxvaluefrom(now-5m) gives the maximum value |
comparator | Checks the correctness of the probe output | Mandatory | type: comparator | Fields used to compare the desired and obtained data: type, criteria, and value |
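The three timeFrame forms described in the table can be read as an aggregation plus a look-back window. The sketch below parses them that way. It is only an illustration: the function name is hypothetical, and the `s`, `h`, and `d` unit suffixes are an assumption, since the examples above show only minutes.

```python
import re
from typing import Tuple

_WINDOW = re.compile(r"now-(\d+)([smhd])")
_WRAPPED = re.compile(r"(minvaluefrom|maxvaluefrom)\((.+)\)")
# Assumption: only the "m" suffix appears in the documented examples.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_time_frame(time_frame: str) -> Tuple[str, int]:
    """Return (aggregation, window_seconds) for a timeFrame string."""
    text = time_frame.strip()
    aggregation = "avg"  # a bare now-<value> yields the average
    wrapped = _WRAPPED.fullmatch(text)
    if wrapped:
        aggregation = "min" if wrapped.group(1) == "minvaluefrom" else "max"
        text = wrapped.group(2)
    window = _WINDOW.fullmatch(text)
    if window is None:
        raise ValueError(f"unrecognized timeFrame: {time_frame!r}")
    seconds = int(window.group(1)) * _UNIT_SECONDS[window.group(2)]
    return aggregation, seconds
```

With this reading, `now-5m` maps to an average over 300 seconds and `minvaluefrom(now-5m)` to the minimum over the same window.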
Comparator
Field | Description | Type | Range | Notes |
---|---|---|---|---|
type | Flag to hold the type of the data used for comparison | Mandatory | float | The type of data that is compared as part of the comparison operation |
criteria | Flag to hold the criteria for the comparison | Mandatory | It supports >=, <=, ==, >, <, !=, oneOf, between for int and float types, and equal, notEqual, contains, matches, notMatches, oneOf for the string type | The criteria that must be fulfilled as part of the comparison operation |
value | Flag to hold the value for the comparison | Mandatory | N/A (type: string) | The value that the obtained data is compared against, according to the given criteria |
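To make the comparator fields concrete, here is a minimal sketch of a float comparison covering the six relational criteria. It is not the probe's implementation. The function name is hypothetical, and oneOf and between are left out because the format of their value is not described here.

```python
import operator

_NUMERIC_CRITERIA = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    "!=": operator.ne,
}


def compare_float(actual: float, criteria: str, value: str) -> bool:
    """Check the obtained metric against the comparator's value."""
    if criteria not in _NUMERIC_CRITERIA:
        raise ValueError(f"criteria not covered by this sketch: {criteria!r}")
    # The schema declares value as a string, so convert it before comparing.
    return _NUMERIC_CRITERIA[criteria](actual, float(value))
```

With the comparator used in the examples on this page (type float, criteria <=, value "100"), an obtained value of 42.0 passes and 150.0 fails.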
Run properties
Field | Description | Type | Range | Notes |
---|---|---|---|---|
probeTimeout | Flag to hold the timeout of the probe | Mandatory | N/A (type: string) | The probeTimeout represents the time limit for the probe to execute the specified check and return the expected data |
attempt | Flag to hold the number of attempts of the probe | Mandatory | N/A (type: integer) | The attempt contains the number of times a check is run after failed attempts before the probe status is declared as failed |
interval | Flag to hold the interval of the probe | Mandatory | N/A (type: string) | The interval contains the time the probe waits between subsequent retries |
probePollingInterval | Flag to hold the polling interval for the probes (applicable for all modes) | Optional | N/A (type: string) | The probePollingInterval contains the time interval for which Continuous and OnChaos probes sleep after each iteration |
initialDelaySeconds | Flag to hold the initial delay interval for the probes | Optional | N/A (type: integer) | The initialDelaySeconds represents the initial waiting time interval for the probes |
stopOnFailure | Flag to stop or continue the experiment on probe failure | Optional | N/A (type: boolean) | The stopOnFailure can be set to true or false to stop or continue the experiment execution after the probe fails |
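The interplay of attempt, interval, and initialDelaySeconds can be pictured as a simple retry loop. The sketch below is one reading of the table, not the probe's actual code. The function and parameter names are hypothetical, durations are passed as seconds instead of strings such as 3s, and probeTimeout is omitted because enforcing it would require running the check under a timer.

```python
import time
from typing import Callable


def run_with_retries(
    check: Callable[[], bool],
    attempt: int,
    interval_seconds: float,
    initial_delay_seconds: float = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run check up to `attempt` times, waiting `interval_seconds` between tries."""
    if initial_delay_seconds > 0:
        sleep(initial_delay_seconds)  # wait once before the first check
    for number in range(1, attempt + 1):
        if check():
            return True
        if number < attempt:
            sleep(interval_seconds)  # pause only when another try remains
    return False
```

The `sleep` parameter exists so the loop can be exercised without real delays, for example by passing a list's `append` method to record the waits.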
Definition
In the case of Dedicated Chaos Infrastructure, the following apply:
- The mode and type are mandatory fields in the probe schema when you define the entire configuration of the probe in the manifest (for Kubernetes (Legacy), Linux, and Windows infrastructure).
- The name, mode, type, and other input properties (depending on the probe) are required to correctly configure the resilience probe. If any of the necessary details is missing, the probe will not execute.
In the case of Harness Delegate, the following apply:
- For Kubernetes (Harness Infrastructure) (also known as DDCR), the mandatory fields are mode and probeID, and the type field is derived. These fields are generated and patched in the backend to the same manifest. However, in the UI, you will only see the mode and probeID fields when configuring your experiment, because the manifest is minified in the UI.
- If you define the entire probe in task.definition.chaos.probes, the entire configuration is required. If you use task.probeRef, you only need to specify the probeID and mode fields.
For a Kubernetes chaos infrastructure:
probe:
  - name: datadog-probe
    type: "DatadogProbe"
    mode: "EOT"
    datadogProbe/inputs:
      datadogSite: us5.datadoghq.com
      syntheticsTest:
        publicId: zgs-mq8-pgy
        testType: api
      datadogCredentialsSecretName: dd-secret
    runProperties:
      probeTimeout: 2s
      attempt: 1
      interval: 3s
      stopOnFailure: false
For a Linux chaos infrastructure:
probes:
  - name: datadog-probe
    type: "DatadogProbe"
    mode: "EOT"
    datadogProbe/inputs:
      datadogSite: us5.datadoghq.com
      syntheticsTest:
        publicId: zgs-mq8-pgy
        testType: api
      metrics:
        query: avg:system.load.1{*}
        timeFrame: now-5m
        comparator:
          type: "float"
          criteria: "<="
          value: "100"
    runProperties:
      probeTimeout: 2s
      attempt: 1
      interval: 3s
      stopOnFailure: false
Metrics
To trigger a probe that queries Datadog metrics, specify the metrics properties.
Use the following example to tune this:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "datadog-probe"
            type: "datadogProbe"
            datadogProbe/inputs:
              datadogSite: us5.datadoghq.com
              metrics:
                query: avg:system.load.1{*}
                timeFrame: now-5m
                comparator:
                  type: "float"
                  criteria: "<="
                  value: "100"
              datadogCredentialsSecretName: dd-secret
            mode: "EOT"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1
              stopOnFailure: false
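To reproduce outside the probe roughly what a metrics query like the one above asks for, you can build a request against Datadog's public timeseries query endpoint (`/api/v1/query`, authenticated with the `DD-API-KEY` and `DD-APPLICATION-KEY` headers) and aggregate the returned points. The sketch below only builds the request and reduces a point list; it does not send anything. Whether the probe uses this endpoint internally is an assumption, and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple
from urllib.parse import urlencode


def build_metrics_query(
    site: str, query: str, window_seconds: int,
    api_key: str, app_key: str, now: int,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a query covering the last window_seconds."""
    params = urlencode({"from": now - window_seconds, "to": now, "query": query})
    url = f"https://api.{site}/api/v1/query?{params}"
    headers = {"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key}
    return url, headers


def aggregate(pointlist: Sequence[Sequence[Optional[float]]], how: str) -> float:
    """Reduce [timestamp, value] pairs to their avg, min, or max."""
    # Skip points with no value before aggregating.
    values: List[float] = [p[1] for p in pointlist if p[1] is not None]
    if not values:
        raise ValueError("no data points in the requested window")
    if how == "min":
        return min(values)
    if how == "max":
        return max(values)
    return sum(values) / len(values)
```

For `timeFrame: now-5m`, `window_seconds` would be 300 and `how` would be `avg`. The aggregated number is what the comparator then checks against its value.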
API test
To trigger an API test, specify the syntheticsTest.testType as api.
Use the following example to tune this:
For a Kubernetes chaos infrastructure:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "datadog-probe"
            type: "datadogProbe"
            datadogProbe/inputs:
              datadogSite: us5.datadoghq.com
              syntheticsTest:
                publicId: zgs-mq8-pgy
                testType: api
              datadogCredentialsSecretName: dd-secret
            mode: "EOT"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1
              stopOnFailure: false
For a Linux chaos infrastructure:
apiVersion: litmuschaos.io/v1alpha1
kind: LinuxChaosExperiment
metadata:
  name: process-kill
  labels:
    context: process-kill
    name: process-kill
spec:
  steps:
    - - name: process-kill-task
  tasks:
    - name: process-kill-task
      taskType: "chaos"
      weight: 10
      definition:
        chaos:
          probes:
            - name: datadog-probe
              type: "DatadogProbe"
              mode: "EOT"
              datadogProbe/inputs:
                datadogSite: us5.datadoghq.com
                syntheticsTest:
                  publicId: zgs-mq8-pgy
                  testType: api
              runProperties:
                probeTimeout: 2s
                attempt: 1
                interval: 3s
                stopOnFailure: false
          experiment: linux-process-kill
          processKillChaos/inputs:
            duration: 30
            processNames: "nginx"
            forceKill: false
Browser test
To trigger a browser test, specify the syntheticsTest.testType as browser.
Use the following example to tune this:
For a Kubernetes chaos infrastructure:
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: "datadog-probe"
            type: "datadogProbe"
            datadogProbe/inputs:
              datadogSite: us5.datadoghq.com
              syntheticsTest:
                publicId: zgs-mq8-pgy
                testType: browser
              datadogCredentialsSecretName: dd-secret
            mode: "EOT"
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1
              stopOnFailure: false
For a Linux chaos infrastructure:
apiVersion: litmuschaos.io/v1alpha1
kind: LinuxChaosExperiment
metadata:
  name: process-kill
  labels:
    context: process-kill
    name: process-kill
spec:
  steps:
    - - name: process-kill-task
  tasks:
    - name: process-kill-task
      taskType: "chaos"
      weight: 10
      definition:
        chaos:
          probes:
            - name: datadog-probe
              type: "DatadogProbe"
              mode: "EOT"
              datadogProbe/inputs:
                datadogSite: us5.datadoghq.com
                syntheticsTest:
                  publicId: zgs-mq8-pgy
                  testType: browser
              runProperties:
                probeTimeout: 2s
                attempt: 1
                interval: 3s
                stopOnFailure: false
          experiment: linux-process-kill
          processKillChaos/inputs:
            duration: 30
            processNames: "nginx"
            forceKill: false