
Use Investigator Agent Pipelines

Early Access Feature

Investigator Agent Pipelines are currently in Early Access. The core end-to-end flow is functional, but production-readiness features (UI pipeline builder, pipeline marketplace, enhanced error handling) are planned for Q2 2026. Contact your Harness representative to enable this feature for your organization.

Investigator Agent Pipelines enable teams to extend the AI Investigator with custom investigation steps that run alongside the built-in RCA Change Agent. Use pipelines to connect domain-specific data sources, execute custom analysis logic, and enrich incident investigations with context unique to your infrastructure.

Overview

The AI Investigator analyzes incidents using multiple data sources to identify root cause candidates. While the built-in RCA Change Agent provides out-of-the-box investigation capabilities (deployments, pull requests, ServiceNow changes), many teams have unique infrastructure, internal tools, or domain-specific knowledge that requires custom investigation logic.

Investigator Agent Pipelines solve this by:

  • Connecting custom data sources (internal wikis, observability tools, vendor APIs)
  • Running domain-specific analysis logic (canary metric checks, change database queries, compliance validations)
  • Enriching investigations with infrastructure-specific context (cloud provider metadata, network topology, service dependencies)

What Are Agent Pipelines?

Agent pipelines are custom investigation workflows built using Harness pipeline stages. When an incident is created (or manually triggered), the pipeline executes and returns investigation results that appear alongside built-in RCA theories in the Investigator panel.

Key Characteristics

  • Build once, run on every incident: Pipelines run automatically without per-incident manual steps
  • Appear alongside built-in theories: Custom pipeline results display in the same Investigator panel as RCA Change Agent theories
  • Flexible data sources: Connect any API, database, or internal tool accessible from Harness pipelines
  • Domain-specific logic: Implement analysis specific to your infrastructure (e.g., query deployment canary metrics, check feature flag states, validate compliance)

How It Works

Pipeline Execution Flow

  1. Incident Created: An incident is created in Harness AI SRE (manually or via alert rule)
  2. Pipeline Triggered: Configured agent pipelines trigger automatically on incident creation
  3. Investigation Runs: Pipeline stages execute custom investigation logic (API calls, data fetches, analysis)
  4. Results Returned: Pipeline outputs are captured and formatted as investigation results
  5. Display in Investigator: Results appear in the Investigator panel alongside built-in RCA theories

Input Contract

Agent pipelines receive incident context as input variables:

Variable            | Description                          | Example
--------------------|--------------------------------------|--------
incident_id         | Unique identifier for the incident   | INC-12345
incident_title      | Incident title                       | High API latency in production
incident_severity   | Severity level                       | critical, high, medium, low
affected_services   | List of impacted services            | ["api-gateway", "auth-service"]
incident_start_time | Incident start timestamp (ISO 8601)  | 2026-04-30T14:23:00Z
environment         | Affected environment (if available)  | production, staging

Pipelines can use these variables to scope investigation queries and analysis.
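
For example, a minimal Shell Script step body that scopes a change query to the affected services and incident window. This is a sketch: the change-API URL is illustrative, not a real endpoint.

#!/bin/bash
# Incident context arrives via Harness expressions, resolved at runtime.
INCIDENT_ID='<+pipeline.variables.incident_id>'
START_TIME='<+pipeline.variables.incident_start_time>'
SERVICES='<+pipeline.variables.affected_services>'   # JSON array, e.g. ["api-gateway"]

# Query one (hypothetical) change API per affected service, bounded to the incident window.
for svc in $(echo "$SERVICES" | jq -r '.[]'); do
  curl -s "https://changes.company.com/api/changes?service=${svc}&since=${START_TIME}" \
    | jq --arg id "$INCIDENT_ID" '{incident_id: $id, changes: .}'
done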

Output Contract

Pipelines return investigation results in a structured format:

{
  "theories": [
    {
      "title": "Recent deployment to api-gateway",
      "description": "Deployment api-gateway-v2.3.1 rolled out 5 minutes before incident start. Canary metrics show p99 latency increased from 200ms to 1500ms.",
      "confidence": 85,
      "evidence": [
        {
          "type": "deployment",
          "source": "Internal Deployment Tracker",
          "timestamp": "2026-04-30T14:18:00Z",
          "details": "api-gateway-v2.3.1 deployed to production"
        },
        {
          "type": "metric",
          "source": "Canary Dashboard",
          "metric_name": "p99_latency_ms",
          "value": 1500,
          "threshold": 500
        }
      ]
    }
  ]
}

Field Definitions:

  • title: Short summary of the theory (shown in Investigator panel)
  • description: Detailed explanation with supporting evidence
  • confidence: Confidence score (0-100) indicating likelihood this is the root cause
  • evidence: Array of supporting data points (deployments, metrics, logs, configuration changes)
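
Before enabling automatic triggers, you can sanity-check a pipeline's output against this contract. A minimal jq assertion, assuming the output file path used in the formatting example later in this guide:

# Exits non-zero if output.json lacks a non-empty theories array.
jq -e '.theories | type == "array" and length > 0' /harness/output.json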

Supported Step Types

Agent pipelines use standard Harness pipeline stages to execute investigation logic:

Data Fetch Steps

Query external data sources and APIs:

  • HTTP Step: Call REST APIs (internal wikis, custom change databases, vendor APIs)
  • Shell Script Step: Execute custom scripts (query internal databases, parse logs, check file systems)
  • Container Step: Run containerized analysis tools (custom Python/Go scripts, ML models, domain-specific analyzers)

Analysis Steps

Process data and generate insights:

  • Shell Script Step: Run custom analysis logic (correlate metrics, parse deployment logs, calculate impact)
  • Container Step: Execute complex analysis (ML inference, anomaly detection, pattern matching)

Enrichment Steps

Add context from infrastructure-specific sources:

  • HTTP Step: Fetch metadata from cloud providers (AWS, GCP, Azure), service meshes (Istio, Linkerd), or custom CMDBs
  • Shell Script Step: Query internal documentation wikis, runbook repositories, or compliance databases

Creating an Agent Pipeline

Step 1: Define the Pipeline

  1. Navigate to Pipelines in your Harness project.
  2. Click Create Pipeline.
  3. Name the pipeline descriptively (e.g., Investigate Canary Metrics, Query Internal Change DB).
  4. Select the pipeline type (the examples in this guide use Custom stages).

Step 2: Add Investigation Stages

Add stages to fetch data, run analysis, and format results:

Example Stage: Query Internal Deployment API

- stage:
    name: Fetch Recent Deployments
    identifier: fetch_deployments
    type: Custom
    spec:
      execution:
        steps:
          - step:
              type: Http
              name: Query Deployment API
              identifier: query_deployment_api
              spec:
                url: https://internal-api.company.com/deployments
                method: POST
                headers:
                  - key: Authorization
                    value: Bearer <+secrets.getValue("deployment_api_token")>
                requestBody: |
                  {
                    "services": <+pipeline.variables.affected_services>,
                    "start_time": "<+pipeline.variables.incident_start_time>",
                    "lookback_minutes": 30
                  }
                outputVariables:
                  - name: deployments
                    type: String
                    value: <+httpResponseBody>

Example Stage: Analyze Canary Metrics

- stage:
    name: Analyze Canary Metrics
    identifier: analyze_canary
    type: Custom
    spec:
      execution:
        steps:
          - step:
              type: ShellScript
              name: Check Canary Health
              identifier: check_canary
              spec:
                shell: Bash
                onDelegate: true
                source:
                  type: Inline
                  spec:
                    script: |
                      #!/bin/bash
                      # Parse deployment data from the previous stage's HTTP step
                      DEPLOYMENTS='<+pipeline.stages.fetch_deployments.spec.execution.steps.query_deployment_api.output.outputVariables.deployments>'

                      # Collected findings; exported via outputVariables below
                      CANARY_RESULTS=""

                      # Query Prometheus for canary metrics
                      for deployment in $(echo "$DEPLOYMENTS" | jq -r '.[] | @base64'); do
                        SERVICE=$(echo "$deployment" | base64 -d | jq -r '.service')
                        DEPLOY_TIME=$(echo "$deployment" | base64 -d | jq -r '.timestamp')

                        # Query p99 latency 10 minutes before and after the deployment
                        # (assumes .timestamp is epoch seconds so the +600 arithmetic works)
                        BEFORE=$(curl -s "https://prometheus.company.com/api/v1/query?query=p99_latency{service='$SERVICE'}[10m]&time=$DEPLOY_TIME")
                        AFTER=$(curl -s "https://prometheus.company.com/api/v1/query?query=p99_latency{service='$SERVICE'}[10m]&time=$((DEPLOY_TIME + 600))")

                        # Compare and append to CANARY_RESULTS if latency increased > 50%
                        # ... analysis logic ...
                      done
                outputVariables:
                  - name: canary_results
                    type: String
                    value: CANARY_RESULTS

Step 3: Format Results

Add a final stage to format investigation results according to the output contract:

- stage:
    name: Format Investigation Results
    identifier: format_results
    type: Custom
    spec:
      execution:
        steps:
          - step:
              type: ShellScript
              name: Build Theory JSON
              identifier: build_theory
              spec:
                shell: Bash
                onDelegate: true
                source:
                  type: Inline
                  spec:
                    script: |
                      #!/bin/bash
                      cat <<EOF > /harness/output.json
                      {
                        "theories": [
                          {
                            "title": "Recent deployment to api-gateway",
                            "description": "Deployment api-gateway-v2.3.1 rolled out 5 minutes before incident start. Canary metrics show p99 latency increased from 200ms to 1500ms.",
                            "confidence": 85,
                            "evidence": [
                              {
                                "type": "deployment",
                                "source": "Internal Deployment Tracker",
                                "timestamp": "<+pipeline.variables.incident_start_time>",
                                "details": "api-gateway-v2.3.1 deployed to production"
                              }
                            ]
                          }
                        ]
                      }
                      EOF

Step 4: Configure Triggers

Set up the pipeline to run automatically on incident creation:

  1. Click the Triggers tab in your pipeline.
  2. Click Add Trigger.
  3. Select Incident Created as the trigger type.
  4. Configure filter conditions (optional):
    • Trigger only for specific severity levels
    • Trigger only for specific incident types
    • Trigger only for specific services
  5. Save the trigger.

Connecting Custom Data Sources

Internal APIs

Use the HTTP Step to query internal APIs:

  • Authentication: Store API tokens in Harness Secrets and reference via <+secrets.getValue("token_name")>
  • Request body: Pass incident context as JSON payload
  • Response parsing: Use jq in Shell Script steps to parse JSON responses
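
Putting these three together, a minimal sketch (the endpoint, secret name, and response shape are illustrative):

#!/bin/bash
# Token resolved from Harness Secrets at runtime; nothing is hardcoded.
TOKEN='<+secrets.getValue("change_api_token")>'

# Pass incident context as the JSON payload (hypothetical internal endpoint).
RESPONSE=$(curl -s -X POST "https://internal-api.company.com/changes/search" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"services\": <+pipeline.variables.affected_services>}")

# Parse the response with jq, keeping only the fields the theory needs.
echo "$RESPONSE" | jq '[.changes[] | {id, service, timestamp}]'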

Observability Tools

Query metrics, logs, and traces from observability platforms:

  • Prometheus: Query metrics using PromQL via HTTP API
  • Grafana: Fetch dashboard snapshots or panel data via Grafana API
  • Elasticsearch: Query logs using Elasticsearch Query DSL
  • Custom Observability: Call vendor-specific APIs (Honeycomb, Lightstep, etc.)
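
For the Prometheus case above, a hedged sketch that queries the 30 minutes before incident start (the host and metric are placeholders; the date arithmetic assumes GNU date on the delegate):

#!/bin/bash
END='<+pipeline.variables.incident_start_time>'
START=$(date -u -d "${END} - 30 minutes" +%Y-%m-%dT%H:%M:%SZ)   # GNU date

# Prometheus accepts form-encoded POSTs to its query endpoints.
curl -s "https://prometheus.company.com/api/v1/query_range" \
  --data-urlencode 'query=rate(http_requests_total{status=~"5.."}[5m])' \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}" \
  --data-urlencode "step=60s" \
  | jq '.data.result'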

Internal Databases

Use the Shell Script Step to query databases:

  • PostgreSQL: Use psql CLI or connection libraries
  • MySQL: Use mysql CLI or connection libraries
  • MongoDB: Use mongosh or language-specific drivers
  • Redis: Use redis-cli or connection libraries
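
For the PostgreSQL case above, a minimal psql sketch, assuming a changes table and credentials stored in Harness Secrets (host, database, and table names are all illustrative):

#!/bin/bash
# psql reads the password from PGPASSWORD; the secret name is illustrative.
export PGPASSWORD='<+secrets.getValue("change_db_password")>'

# -t/-A print bare comma-separated rows for easy downstream parsing.
psql -h changedb.company.com -U investigator -d changes -t -A -F ',' -c "
  SELECT service, change_id, applied_at
  FROM changes
  WHERE applied_at >= timestamptz '<+pipeline.variables.incident_start_time>' - interval '30 minutes'
    AND applied_at <= timestamptz '<+pipeline.variables.incident_start_time>'
  ORDER BY applied_at DESC;"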

Cloud Provider APIs

Fetch infrastructure metadata:

  • AWS: Query EC2, ECS, Lambda metadata using AWS CLI or SDKs
  • GCP: Query GCE, GKE, Cloud Run metadata using gcloud CLI or SDKs
  • Azure: Query VM, AKS, Functions metadata using az CLI or SDKs
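
For instance, a short AWS CLI sketch that lists the current deployments of an ECS service (the cluster and service names are illustrative, and the delegate is assumed to have AWS credentials):

#!/bin/bash
# Show rollout state for an affected service; createdAt values near
# incident start are candidate evidence.
aws ecs describe-services \
  --cluster production \
  --services api-gateway \
  --query 'services[0].deployments[].{status: status, createdAt: createdAt, taskDef: taskDefinition}' \
  --output json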

Viewing Investigation Results

In the Investigator Panel

When agent pipelines complete, their results appear in the Investigator panel alongside built-in RCA Change Agent theories:

  1. Open the Incident Details page.
  2. Click the Investigator tab (or similar panel).
  3. View theories from multiple sources:
    • RCA Change Agent: Deployments, pull requests, ServiceNow changes
    • Custom Agent Pipelines: Results from your configured pipelines
  4. Each theory displays:
    • Title: Short summary
    • Confidence score: 0-100
    • Evidence: Supporting data points
    • Source: Which agent or pipeline generated the theory

Pipeline Execution Status

Check which pipelines ran and their outputs:

  1. Open the Incident Details page.
  2. Click the Activity or Timeline tab.
  3. Look for events indicating pipeline execution:
    • "Investigator pipeline started: [pipeline name]"
    • "Investigator pipeline completed: [pipeline name]"
    • "Investigator pipeline failed: [pipeline name]"
  4. Click the event to view detailed pipeline execution logs.

Use Cases

Query Internal Change Databases

Scenario: Your organization tracks changes in an internal database not covered by built-in integrations.

Solution: Create a pipeline that queries the change database for recent changes affecting incident-related services and returns them as theories.

Check Deployment Canary Metrics

Scenario: Deployments include canary analysis, but Harness AI SRE does not automatically check canary health.

Solution: Create a pipeline that fetches recent deployments, queries canary metrics from Prometheus, and flags deployments with degraded canary health.

Validate Feature Flag States

Scenario: Feature flags control critical behavior, and incidents may be caused by recent flag changes.

Solution: Create a pipeline that queries your feature flag platform (LaunchDarkly, Split, etc.) for flags changed near the incident start time.
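
A hedged sketch of that query, using a placeholder audit-log endpoint (substitute your flag platform's real API and auth scheme):

#!/bin/bash
TOKEN='<+secrets.getValue("flag_platform_token")>'
START='<+pipeline.variables.incident_start_time>'

# Hypothetical endpoint returning flag-change events in a lookback window.
curl -s "https://flags.company.com/api/audit-log?before=${START}&lookback_minutes=30" \
  -H "Authorization: Bearer $TOKEN" \
  | jq '[.entries[] | select(.action == "flag_updated") | {flag: .key, changed_at: .date}]'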

Enrich with Vendor-Specific Data

Scenario: You use a vendor-specific tool (e.g., proprietary observability platform, custom CMDB) with no native Harness integration.

Solution: Create a pipeline that calls the vendor API, fetches relevant data, and includes it as investigation evidence.

Early Access Limitations

The current Early Access release supports the core end-to-end flow but has these limitations:

What Works Today

✅ Pipeline execution on incident creation
✅ Input/output contracts for investigation data
✅ Results display in Investigator panel
✅ Manual pipeline triggers
✅ Integration with Harness Secrets for authentication

Coming in Q2 2026

  • UI Pipeline Builder: Visual editor for building agent pipelines without YAML
  • Pipeline Marketplace: Pre-built pipelines for common integrations (AWS, GCP, Datadog, custom APIs)
  • Enhanced Error Handling: Retry logic, timeout configuration, partial result handling
  • Result Deduplication: Automatic deduplication when multiple pipelines return similar theories
  • Confidence Score Calibration: UI tools for tuning confidence score weights

Workarounds for Early Access

  • No UI builder: Write pipelines in YAML or use Harness Pipeline Studio
  • No marketplace: Copy example pipelines from this documentation and adapt to your infrastructure
  • Limited error handling: Implement retry logic and timeouts in pipeline stages manually
  • Manual deduplication: Design pipelines to avoid overlapping investigation scopes

Best Practices

For Pipeline Design

  • Keep pipelines focused: One pipeline per investigation type (e.g., separate pipelines for deployments vs. feature flags)
  • Set timeouts: Configure stage-level timeouts to prevent hanging investigations (recommend 5-10 minutes)
  • Use secrets management: Store API tokens and credentials in Harness Secrets, never hardcode
  • Log extensively: Output detailed logs in Shell Script steps for troubleshooting
  • Test before production: Run pipelines manually on test incidents before enabling automatic triggers

For Investigation Quality

  • Include evidence: Provide specific data points (timestamps, metric values, deployment IDs) in theory evidence
  • Calculate confidence accurately: Higher confidence for stronger correlations (e.g., deployment 2 minutes before incident start = high confidence)
  • Avoid false positives: Filter out unrelated changes (e.g., changes to unaffected services)
  • Surface actionable insights: Theory descriptions should guide responders toward next steps
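
One illustrative way to translate the time-proximity guidance above into a score (the thresholds are arbitrary starting points, and $CHANGE_TIME is assumed to hold the candidate change's timestamp):

#!/bin/bash
# Score a change higher the closer it landed to incident start.
INCIDENT_TS=$(date -u -d '<+pipeline.variables.incident_start_time>' +%s)
CHANGE_TS=$(date -u -d "$CHANGE_TIME" +%s)
GAP_MIN=$(( (INCIDENT_TS - CHANGE_TS) / 60 ))

if   [ "$GAP_MIN" -le 5 ];  then CONFIDENCE=90   # minutes before incident: very likely related
elif [ "$GAP_MIN" -le 15 ]; then CONFIDENCE=70
elif [ "$GAP_MIN" -le 30 ]; then CONFIDENCE=40
else                             CONFIDENCE=10   # outside the lookback window: weak signal
fi
echo "confidence=$CONFIDENCE"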

For Performance

  • Parallelize data fetching: Use parallel stages to query multiple data sources simultaneously
  • Cache frequently accessed data: Store static data (service topology, configuration) in pipeline variables
  • Limit lookback windows: Query only relevant time ranges (e.g., 30 minutes before incident start)
  • Paginate large result sets: Fetch only the most recent N records to avoid overwhelming the Investigator panel
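
For parallel data fetching, Harness pipelines group stages under a parallel block; a structural sketch with stage bodies elided:

stages:
  - parallel:
      - stage:
          name: Fetch Recent Deployments
          identifier: fetch_deployments
          type: Custom
          # ... data fetch steps ...
      - stage:
          name: Fetch Flag Changes
          identifier: fetch_flag_changes
          type: Custom
          # ... data fetch steps ...
  - stage:
      name: Format Investigation Results
      identifier: format_results
      type: Custom
      # ... formatting steps ...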

Troubleshooting

Pipeline executes but no results appear in Investigator panel

Possible causes:

  • Output JSON does not match the expected contract
  • Pipeline completed with errors
  • Results were empty (no theories generated)

Resolution:

  1. Check pipeline execution logs for errors
  2. Verify the output JSON matches the structure in the Output Contract section
  3. Ensure at least one theory is returned in the theories array
  4. Check the incident's Activity tab for pipeline execution events

Pipeline fails with authentication errors

Possible causes:

  • API token or secret is expired or incorrect
  • Secret reference syntax is wrong
  • Insufficient permissions on the external API

Resolution:

  1. Verify the secret exists in Harness Secrets
  2. Check the secret reference syntax: <+secrets.getValue("secret_name")>
  3. Test API authentication outside the pipeline (e.g., using cURL)
  4. Rotate the API token if expired

Pipeline times out or takes too long

Possible causes:

  • External API is slow or unresponsive
  • Query result set is too large
  • Missing stage-level timeout configuration

Resolution:

  1. Set stage-level timeouts (recommend 5-10 minutes)
  2. Optimize queries to fetch only necessary data
  3. Implement pagination for large result sets
  4. Consider caching frequently accessed data
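
Timeouts are set with the step- or stage-level timeout field; a minimal sketch applied to the HTTP step from earlier:

- step:
    type: Http
    name: Query Deployment API
    identifier: query_deployment_api
    timeout: 5m   # fail fast instead of hanging the investigation
    spec:
      url: https://internal-api.company.com/deployments
      method: POST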

Next Steps

  • Go to Use RCA Change Agent to learn about the built-in investigation capabilities.
  • Go to Use AI Scribe Agent to understand how the AI Scribe captures incident context used by agent pipelines.
  • Go to Create Runbooks to automate response actions based on investigation findings.

Feedback and Support

Investigator Agent Pipelines are in Early Access. To share feedback, request features, or enable this capability for your organization, contact your Harness representative.