AI SRE Onboarding Guide for Incident Responders
This guide walks you through the essentials of using Harness AI SRE as a responder or engineer.
You'll learn how to navigate the dashboard, respond to incidents, collaborate with your team, and leverage runbooks and AI-powered tools to resolve issues faster.
Your administrator has already configured the integrations and incident types — this guide focuses on what you need to know to be effective as an incident responder from day one.
Prerequisites
Before getting started, confirm the following with your administrator:
| Item | Details |
|---|---|
| Harness account access | You have been added to your organization's Harness account with appropriate permissions |
| Slack / Teams connected | The Harness AI SRE bot is installed in your team's Slack workspace or Microsoft Teams environment |
| Monitoring tools configured | Your organization's monitoring tools (Datadog, New Relic, Grafana, etc.) are already integrated |
| On-call schedule (if applicable) | You've been added to your team's on-call rotation in PagerDuty, OpsGenie, or a similar tool |
If your organization hasn't configured AI SRE yet, share the Administrator Onboarding Guide with your platform team to get started.
1. Explore the AI SRE dashboard
Get familiar with the dashboard layout, active incidents, alerts, and key metrics at a glance.
The AI SRE dashboard is your central hub for situational awareness during on-call shifts and day-to-day operations.
- Log in to your Harness account.
- Navigate to AI SRE from the left navigation panel.
- On the dashboard, review:
  - Active Incidents: Any ongoing incidents that need attention.
  - Recent Alerts: The latest alerts ingested from your monitoring tools.
  - Metrics & Trends: Key reliability metrics like MTTR (Mean Time to Resolve) and incident volume.
- Use the filters at the top to narrow by incident type, severity, status, or assigned team.
Bookmark the AI SRE dashboard for quick access during on-call shifts. The active incidents panel updates in real time so you always know the current state of your services.
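To make the MTTR metric concrete, here is a minimal sketch of how a mean time to resolve could be computed from incident timestamps. The function name and the `(created_at, resolved_at)` tuple layout are assumptions made for illustration; the dashboard's actual calculation may differ (for example, in how it treats open or reopened incidents).

```python
from datetime import datetime, timedelta

def mean_time_to_resolve(incidents):
    """Average (resolved_at - created_at) over resolved incidents only.

    `incidents` is a list of (created_at, resolved_at) tuples; resolved_at
    is None for incidents that are still open. Hypothetical data layout.
    """
    durations = [resolved - created
                 for created, resolved in incidents
                 if resolved is not None]
    if not durations:
        return None  # nothing resolved yet, so MTTR is undefined
    return sum(durations, timedelta()) / len(durations)

incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),    # 30 minutes
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 30)),  # 90 minutes
    (datetime(2024, 1, 3, 8, 0), None),                           # still open
]
print(mean_time_to_resolve(incidents))  # 1:00:00
```

The open incident is excluded, so the result is the average of 30 and 90 minutes.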
2. Respond to an incident
Learn how to acknowledge, triage, and begin working on an incident when you're paged or alerted.
When an incident is created — either automatically from a monitoring alert or manually by a teammate — here's how to respond.
- You'll receive a notification via Slack, Microsoft Teams, or your on-call tool (PagerDuty, OpsGenie, etc.).
- Click the notification link to open the incident detail page in Harness.
- Review the incident summary:
  - Severity and incident type: Understand the scope and priority.
  - Timeline: See the sequence of alerts and events that triggered the incident.
  - Related alerts: View correlated monitoring data and affected services.
- Acknowledge the incident to let your team know you're on it.
- Update the status as you work through it (e.g., Investigating → Identified → Monitoring → Resolved).
- Use the incident channel (auto-created in Slack or Teams) to collaborate with other responders in real time.
- Add notes and updates directly in the incident timeline to maintain a clear record of actions taken.
You can manage incidents without leaving Slack. Use /harness slash commands to acknowledge, update status, add notes, and more. See Managing Incidents in Slack for the full command reference.
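The example status progression in the steps above (Investigating → Identified → Monitoring → Resolved) can be pictured as a simple ordered lifecycle. The sketch below is only a mental model: the helper names are hypothetical, and it assumes a strictly linear order, whereas your organization's incident types may define different statuses or allow moving backward.

```python
# Example statuses taken from the guide; your incident types may differ.
STATUS_ORDER = ["Investigating", "Identified", "Monitoring", "Resolved"]

def next_status(current):
    """Return the status that follows `current`, or None if it is the last one."""
    position = STATUS_ORDER.index(current)
    if position + 1 < len(STATUS_ORDER):
        return STATUS_ORDER[position + 1]
    return None

def is_forward_move(current, target):
    """True if `target` comes later in the lifecycle than `current`."""
    return STATUS_ORDER.index(target) > STATUS_ORDER.index(current)

print(next_status("Investigating"))                 # Identified
print(is_forward_move("Monitoring", "Identified"))  # False
```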
3. Create an incident manually
Not every incident starts from an automated alert. If you spot a problem before monitoring catches it, for example through customer reports, degraded performance you have observed, or a teammate flagging something, you can declare an incident manually.
- Navigate to Incidents from the left panel.
- Click Create Incident.
- Select the appropriate Incident Type from the dropdown (your admin has configured these).
- Fill in the incident details:
  - Title: A clear, concise summary (e.g., "Elevated error rates on checkout API").
  - Severity: Choose the appropriate level based on impact.
  - Description: Provide context: what you're observing, when it started, and any initial hypotheses.
- Fill in any additional required fields or custom fields specific to your incident type.
- Click Create.
- An incident channel will be automatically created in your communication tool, and relevant team members will be notified based on the incident type's configuration.
You can also create incidents directly from Slack using the /harness create command. This is especially useful during on-call when you want to stay in your communication tool.
4. Use runbooks during an incident
Runbooks are predefined playbooks that guide you through incident response and can automate common actions. Some runbooks run automatically when certain conditions are met; others can be triggered manually.
- Open the incident detail page for an active incident.
- Navigate to the Runbooks tab within the incident.
- You'll see any runbooks that have been auto-attached based on the incident type and trigger conditions.
- To manually attach a runbook:
  - Click Add Runbook.
  - Search for or browse available runbooks.
  - Select the appropriate runbook and confirm.
- Execute the runbook step by step:
  - Each action in the runbook will be displayed in order.
  - Some steps may be automated (e.g., restarting a service, scaling infrastructure); these will run and report their results.
  - Other steps may be manual; follow the instructions provided and mark each step complete as you go.
- Runbook execution progress is logged in the incident timeline for full visibility.
If you're unsure which runbook applies, check the incident type — your administrator has likely associated recommended runbooks with each type. You can also browse all available runbooks under Runbooks in the left navigation.
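The execution flow described above, where automated steps run and report results while manual steps wait to be marked complete, can be sketched roughly as follows. This is an illustrative model only, not the Harness runbook engine: `Step`, `run_runbook`, and the `confirm_manual` callback are hypothetical names, and a plain list stands in for the incident timeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    # Automated steps carry an action that returns a result string;
    # manual steps leave this as None.
    action: Optional[Callable[[], str]] = None

def run_runbook(steps, confirm_manual, timeline):
    """Run steps in order, logging each outcome to `timeline`.

    Returns True if every step finished, False if a manual step
    was left pending.
    """
    for number, step in enumerate(steps, start=1):
        if step.action is not None:
            result = step.action()  # automated: run and report the result
            timeline.append(f"step {number} (automated) {step.name}: {result}")
        elif confirm_manual(step):  # manual: responder marks it complete
            timeline.append(f"step {number} (manual) {step.name}: marked complete")
        else:
            timeline.append(f"step {number} (manual) {step.name}: pending")
            return False
    return True

timeline = []
steps = [
    Step("Restart service", action=lambda: "restarted"),
    Step("Verify error rate on the dashboard"),
]
finished = run_runbook(steps, confirm_manual=lambda step: True, timeline=timeline)
print(finished)
for entry in timeline:
    print(entry)
```

Logging every step to a shared timeline mirrors the guide's point that runbook progress stays visible to all responders.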
5. Use the AI Scribe Agent
The AI Scribe Agent works alongside you during incidents to reduce manual overhead and improve post-incident learning.
- Automatic Summaries — The AI Scribe monitors your incident channel conversations and generates real-time summaries of key decisions, actions, and findings.
- Timeline Generation — It constructs a structured timeline of the incident based on channel activity, status changes, and runbook execution.
- Post-Incident Reports — After resolution, the AI Scribe drafts a post-incident report pulling from the incident timeline, channel discussions, and metadata — giving you a head start on your retrospective.
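Conceptually, a structured timeline combines events from several sources into one chronological record. The sketch below illustrates that idea with a simple timestamp merge; it makes no claim about how the AI Scribe Agent is implemented. The `build_timeline` function and the `(timestamp, source, text)` event layout are assumptions made for illustration.

```python
import heapq
from datetime import datetime

def build_timeline(*sources):
    """Merge event lists into one chronological list.

    Each source must already be sorted by timestamp. Events are
    (timestamp, source_name, text) tuples (hypothetical layout).
    """
    return list(heapq.merge(*sources, key=lambda event: event[0]))

channel = [
    (datetime(2024, 1, 1, 10, 2), "channel", "Suspect the latest deploy"),
    (datetime(2024, 1, 1, 10, 15), "channel", "Error rate back to normal"),
]
status = [
    (datetime(2024, 1, 1, 10, 0), "status", "Investigating"),
    (datetime(2024, 1, 1, 10, 20), "status", "Resolved"),
]
runbook = [
    (datetime(2024, 1, 1, 10, 5), "runbook", "Restart service: ok"),
]

for timestamp, source, text in build_timeline(channel, status, runbook):
    print(timestamp.strftime("%H:%M"), source, text)
```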
To access AI Scribe outputs, navigate to the incident detail page and look for the AI Summary and Timeline sections.
See the full AI Scribe Agent documentation for details on how AI-powered documentation works and how to get the most out of it.
Next steps
You're now equipped to respond to incidents effectively with Harness AI SRE. To deepen your skills and get even more out of the platform, explore:
- Slack Commands Reference: Master the full set of slash commands for managing incidents directly from Slack.
- Understanding Incident Types: Learn how your organization's incident types map to severity levels, responder teams, and escalation paths.
- Browsing Runbooks: Explore the runbook library to understand the automated playbooks available to you.
- Integration Overview: See which monitoring, communication, and ITSM tools are connected to your AI SRE environment.
- AI Scribe Agent: Dive deeper into AI-powered incident documentation and insights.