AI SRE onboarding guide

This guide introduces Harness AI SRE, which helps you detect, manage, and resolve incidents using real-time insights, alerts, and integrations with your existing tools. Once AI SRE is configured, Harness orchestrates incident detection, automated response workflows, and collaborative resolution across your monitoring and communication tools.

Prerequisites

Before beginning the walkthroughs in this guide, ensure you have:

Item | Details / Link
Harness account | AI SRE Feature flag enabled (contact your sales representative or reach out to the team at ai-sre-support@harness.io)
Monitoring tools | Integration with monitoring systems like Datadog, New Relic, or Grafana
Communication platforms | Slack, Microsoft Teams, or Zoom for incident collaboration
On-call management | PagerDuty, OpsGenie, or similar on-call scheduling tools (optional)

Supported tools & platforms

Go to What's supported with Harness AI SRE for a full list of supported monitoring & observability tools, communication & collaboration platforms, and on-call & escalation management tools.

1. Integrate your collaboration and monitoring tools

Use connectors to integrate collaboration tools such as Microsoft Teams and Slack, ITSM tools such as ServiceNow, and your monitoring tools, so that incident alerts reach responders in real time.

2. Set up your incident types

Define incident types to standardize severity levels, responders, and escalation paths.
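
To make the idea concrete, an incident type can be thought of as a small record that ties a severity level to its responders and escalation path. The Python sketch below is purely illustrative: the `IncidentType` class, its field names, the `next_escalation` helper, and the sample values are all assumptions made for this guide, not the Harness AI SRE schema or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IncidentType:
    """Hypothetical model of an incident type (not the Harness schema)."""
    name: str
    severity: str                     # e.g. "SEV1", "SEV2" (assumed labels)
    responders: Tuple[str, ...]       # teams engaged first
    escalation_path: Tuple[str, ...]  # who to contact next, in order


def next_escalation(incident_type: IncidentType, level: int) -> Optional[str]:
    """Return the contact for an escalation level, or None if the path is exhausted."""
    if 0 <= level < len(incident_type.escalation_path):
        return incident_type.escalation_path[level]
    return None


# Illustrative values only.
OUTAGE = IncidentType(
    name="Service outage",
    severity="SEV1",
    responders=("payments-oncall",),
    escalation_path=("payments-lead", "engineering-manager"),
)
```

Keeping severity, responders, and escalation order in one definition is what lets every incident of that type be handled the same way.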

3. Configure your first webhook

Send events such as alerts, builds, deployments, and config changes from external tools. Categorize these events so you can track and respond to them effectively.
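
As a rough sketch of what sending such an event could look like from a script, the Python example below builds a JSON POST request with the standard library. The endpoint URL is a placeholder, and the payload fields (`category`, `summary`, `source`) are hypothetical names chosen for illustration; the actual URL and expected payload come from your own webhook configuration in Harness.

```python
import json
import urllib.request

# Placeholder endpoint: substitute the URL shown for your own webhook.
WEBHOOK_URL = "https://example.com/your-webhook-url"


def build_event_request(url: str, category: str, summary: str,
                        source: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one external event."""
    payload = {
        "category": category,  # hypothetical field, e.g. alert, build, deployment
        "summary": summary,    # hypothetical field: human-readable description
        "source": source,      # hypothetical field: the tool that emitted the event
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_event_request(
    WEBHOOK_URL, "deployment", "Deployed checkout service", "ci-pipeline"
)
# To actually send the event: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes the payload easy to inspect before any event reaches the webhook.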

4. Create your first runbook

Automate response actions and guide responders step-by-step during incidents.
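
Conceptually, a runbook is an ordered list of actions applied to an incident. The Python sketch below models that idea only; the step functions, the incident fields, and `run_runbook` are hypothetical and do not represent how Harness defines or executes runbooks.

```python
from typing import Callable, Dict, List

# A step receives the incident context and returns a short log line.
Step = Callable[[Dict[str, str]], str]


def notify_channel(incident: Dict[str, str]) -> str:
    """Hypothetical action: announce the incident in a chat channel."""
    return f"notify #{incident['channel']}: {incident['title']}"


def page_on_call(incident: Dict[str, str]) -> str:
    """Hypothetical action: page the on-call responder for the service."""
    return f"page on-call for {incident['service']}"


def run_runbook(steps: List[Step], incident: Dict[str, str]) -> List[str]:
    """Run each step in order and collect its log line."""
    return [step(incident) for step in steps]


# Illustrative incident context.
incident = {"title": "Checkout errors", "service": "checkout", "channel": "incidents"}
log = run_runbook([notify_channel, page_on_call], incident)
```

Because the steps run in a fixed order, responders see the same sequence of actions each time the runbook is triggered.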

Next steps

This guide covered the core setup of Harness AI SRE, from integrating monitoring tools to creating automated runbooks. To improve your incident response and team efficiency further, explore these advanced features:

  • Advanced Runbooks: Build sophisticated automation workflows with multiple actions, triggers, and conditional logic.
  • Integration Library: Connect with ServiceNow, Jira, and other ITSM tools for seamless incident management workflows.
  • AI Scribe Agent: Leverage AI-powered documentation and insights to capture incident communications automatically.