AI SRE onboarding guide
This guide introduces Harness AI SRE, which helps you detect, manage, and resolve incidents using real-time insights, alerts, and integrations with your existing tools. When you configure AI SRE, Harness orchestrates incident detection, automated response workflows, and collaborative resolution across your monitoring and communication tools.
Prerequisites
Before beginning the walkthroughs in this guide, ensure you have:
| Item | Details / Link |
|---|---|
| Harness account | AI SRE feature flag enabled (contact your sales representative or the team at ai-sre-support@harness.io) |
| Monitoring tools | Integration with monitoring systems like Datadog, New Relic, or Grafana |
| Communication platforms | Slack, Microsoft Teams, or Zoom for incident collaboration |
| On-call management | PagerDuty, OpsGenie, or similar on-call scheduling tools (optional) |
Go to What's supported with Harness AI SRE for a full list of supported monitoring & observability tools, communication & collaboration platforms, and on-call & escalation management tools.
1. Integrate your collaboration and monitoring tools
Use connectors to integrate with Microsoft Teams, Slack, ServiceNow, and your monitoring tools for real-time incident alerts.
AI SRE works best when integrated with your existing monitoring and collaboration tools. This enables real-time incident detection and seamless team coordination.
For detailed integration guides, refer to the AI SRE Integrations documentation.
- Navigate to Organization Settings in the left panel.
- Go to Third Party Integrations (AI SRE).
- Some connectors are listed by default. Click Connect.
- Sign in with SSO or whichever authentication method is required.
- For Slack, select the required workspace from your list of workspaces.
- Click Install Harness AI SRE.
- For monitoring tool integrations (Datadog, New Relic, Grafana, etc.), configure:
  - Name - A descriptive name for the integration
  - Webhook URL - Copy the provided webhook URL to your monitoring tool
  - Authentication - Configure API keys or tokens as required
- Set up additional communication integrations:
  - Microsoft Teams - Configure the Teams connector
  - Zoom - Set up meeting automation for incident bridges
Start with your primary monitoring tool and main communication channel. You can add more integrations later as needed.
2. Set up your incident types
Define incident types for your teams to standardize your response process. Each incident type sets severity levels, response teams, and escalation procedures.
- Navigate to Incidents.
- Click Incident Types.
- Click Create Incident Type.
- Fill in the form with the incident type details.
- Click Save.
- Once the incident type is created, configure its incident fields:
  - Review the Default Fields and Custom Fields, and update them as required.
  - Click the edit icon to review the default fields.
  - You can set optional fields as required.
  - Click Save.
- Click Add Custom Field to add extra fields to the incident creation form:
  - Fill in the details of the additional field and click Save.
- Click Creation Form:
  - By default, the creation form includes the fields selected in the left pane.
  - Select the checkbox next to any other field you want to add to the form.
- Test your incident type:
  - Fill in the incident details in the text fields.
  - Type a value or choose one from the dropdown options.
  - Click Create to create a new incident of the new incident type.
  - Click Save at the top right.
- Optionally, add runbooks to your incident type for automated response workflows.
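To make the relationship between required fields, optional fields, and dropdown options concrete, here is a minimal Python sketch of how a creation form could be checked against an incident type. The `Field` and `IncidentType` classes, the field names, and the severity values are all hypothetical illustrations; they are not the Harness data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """One field on an incident type (hypothetical model, not the Harness schema)."""
    name: str
    required: bool = False
    options: tuple[str, ...] = ()  # a non-empty tuple models a dropdown


@dataclass
class IncidentType:
    name: str
    fields: list[Field] = field(default_factory=list)

    def validate(self, values: dict[str, str]) -> list[str]:
        """Return a list of problems found in a creation-form submission."""
        problems = []
        for f in self.fields:
            value = values.get(f.name, "")
            if f.required and not value:
                problems.append(f"missing required field: {f.name}")
            elif value and f.options and value not in f.options:
                problems.append(f"invalid option for {f.name}: {value}")
        return problems


# Assumed example: two required default fields and one optional custom field.
outage = IncidentType(
    name="Service Outage",
    fields=[
        Field("title", required=True),
        Field("severity", required=True, options=("SEV1", "SEV2", "SEV3")),
        Field("affected_service"),
    ],
)

print(outage.validate({"title": "Checkout down", "severity": "SEV1"}))  # []
print(outage.validate({"severity": "P0"}))
```

The second call reports both a missing title and an unrecognized severity, which mirrors what you would expect when a required field is left empty or a dropdown value is not one of the configured options.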
3. Configure your first webhook
Send events from external tools, like alerts, builds, deployments, and config changes. Categorize them to track and respond effectively.
Webhooks enable external tools to automatically create alerts and incidents in AI SRE.
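As a rough illustration of what an external tool does when it calls a webhook, the Python sketch below builds a JSON POST request without sending it. The URL is a placeholder, the payload field names are hypothetical, and any authentication your integration requires is not shown; the real payload shape depends on the template you select.

```python
import json
import urllib.request

# Placeholder: replace with the URL shown after you save the integration.
WEBHOOK_URL = "https://example.invalid/your-webhook-url"


def build_alert_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a webhook endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical alert payload; real field names depend on the chosen template.
sample_payload = {
    "title": "High error rate on checkout",
    "severity": "critical",
    "source": "example-monitor",
}

request = build_alert_request(WEBHOOK_URL, sample_payload)
print(request.get_method(), request.full_url)

# To actually send the request, uncomment the following lines:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Building the request separately from sending it lets you inspect the method, headers, and body before pointing the call at a live endpoint.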
- Click Integrations.
- Click New Integration.
- Fill in the details for the webhook:
  - Select the type: Incident, Alert, Deployment, or Build.
  - Select the template type from the dropdown list.
  - Click Save.
- Once the integration is saved, you receive a URL to configure in the application you want to integrate. This step varies from tool to tool; check that application's documentation for details.
- Next, click Payload Configuration:
  - The default payload configuration values for the selected template are shown.
  - To add more data from the configuration, select its checkbox to extract it.
  - Click Next at the bottom of the page.
- Review the Mapped Fields you selected in the previous step:
  - Fill in values for the mapped fields by dragging and dropping from the saved fields pane.
- You can also add custom fields to your integration payload:
  - Scroll down and select Add Field.
  - Fill in the details of the custom field and click Save.
  - Drag and drop values from the saved fields onto the custom field placeholder. You can also enter a value manually or use the data picker.
  - Click Next.
- Test the integration with the provided cURL command. The POST request contains the endpoint URL used for the integration.
- Finally, click Save at the top right. The integration is ready.
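The payload configuration and field mapping steps above amount to extracting values from an incoming payload and assigning them to named fields. The Python sketch below shows that idea under stated assumptions: the dotted-path syntax, the `extract` and `map_fields` helpers, and the payload shape are all hypothetical and are not how Harness represents mappings internally.

```python
from typing import Any


def extract(payload: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'alert.tags.service' through nested dicts."""
    current: Any = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # the path is not present in this payload
        current = current[key]
    return current


def map_fields(payload: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Build target fields from a {target_field: source_path} mapping."""
    return {target: extract(payload, path) for target, path in mapping.items()}


# Hypothetical payload from a monitoring tool.
incoming = {
    "alert": {
        "name": "CPU above 90%",
        "priority": "P2",
        "tags": {"service": "checkout"},
    }
}

# Hypothetical mapping: target field on the left, source path on the right.
mapping = {
    "title": "alert.name",
    "severity": "alert.priority",
    "service": "alert.tags.service",
    "region": "alert.tags.region",  # not present in the payload above
}

mapped = map_fields(incoming, mapping)
print(mapped)
```

Returning `None` for a missing path, rather than raising an error, is one possible design choice; it keeps a single absent value from blocking the rest of the mapping.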
4. Create your first runbook
Runbooks automate incident response actions and give responders step-by-step guidance during incidents.
- Navigate to Runbooks from the left pane.
- Click New Runbook.
- Complete the creation form:
  - Fill in the runbook details.
  - Click Save.
- Click New Action.
- Choose the category of the action you want to add to your runbook. Actions are grouped into categories by use case.
- Select the action:
  - Choose the action.
  - Click Select.
- Fill in the details for the selected action.
- Click Save.
- Once saved, you can add more actions:
  - Click New Action.
  - Select the action, repeat the steps above, and save.
- Optionally, add triggers for your runbook. Click Triggers.
- Click New Trigger.
- From the dropdown, select the incident type you want to attach the runbook to.
- Define the trigger condition.
- Click Save at the top right.
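Conceptually, a runbook is an ordered list of actions, and a trigger pairs an incident type with a condition that decides whether those actions run. The Python sketch below models that relationship. The `Runbook` and `Trigger` classes, the action functions, and the incident fields are hypothetical and only illustrate the idea; they are not the Harness runbook engine.

```python
from dataclasses import dataclass, field
from typing import Callable

Incident = dict[str, str]


@dataclass
class Trigger:
    """Attach a runbook to an incident type, gated by a condition (assumed model)."""
    incident_type: str
    condition: Callable[[Incident], bool]

    def matches(self, incident: Incident) -> bool:
        return incident.get("type") == self.incident_type and self.condition(incident)


@dataclass
class Runbook:
    name: str
    actions: list[Callable[[Incident], str]] = field(default_factory=list)
    triggers: list[Trigger] = field(default_factory=list)

    def run_if_triggered(self, incident: Incident) -> list[str]:
        """Run every action in order if any trigger matches; otherwise do nothing."""
        if not any(t.matches(incident) for t in self.triggers):
            return []
        return [action(incident) for action in self.actions]


# Hypothetical actions that only describe what they would do.
def notify_channel(incident: Incident) -> str:
    return f"notify #incidents: {incident['title']}"


def open_bridge(incident: Incident) -> str:
    return f"open bridge for {incident['title']}"


runbook = Runbook(
    name="Outage response",
    actions=[notify_channel, open_bridge],
    triggers=[Trigger("Service Outage", lambda i: i.get("severity") == "SEV1")],
)

sev1 = {"type": "Service Outage", "severity": "SEV1", "title": "Checkout down"}
sev3 = {"type": "Service Outage", "severity": "SEV3", "title": "Slow search"}

print(runbook.run_if_triggered(sev1))
print(runbook.run_if_triggered(sev3))  # []
```

In this sketch the SEV1 incident runs both actions in the order they were added, while the SEV3 incident matches the incident type but fails the condition, so nothing runs.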
Next steps
This guide covered the core setup of Harness AI SRE, from integrating monitoring tools to creating automated runbooks. To further improve your incident response and team efficiency, explore these advanced features:
- Advanced Runbooks: Build sophisticated automation workflows with multiple actions, triggers, and conditional logic.
- Integration Library: Connect with ServiceNow, Jira, and other ITSM tools for seamless incident management workflows.
- AI Scribe Agent: Leverage AI-powered documentation and insights to capture incident communications automatically.