Root Cause Analysis

Overview

When an alert fires, Annie automatically correlates infrastructure changes, monitoring data, and dependencies to identify the root cause and suggest actionable fixes, so you no longer need to jump between the AWS console, Datadog, and Terraform files.

How to Start

In your Slack incident channel, run:

/register_annie_on_call @your_bot_name

Example: /register_annie_on_call @Datadog

Wait for an alert

When an alert fires in your channel, Annie automatically picks it up and starts the investigation.

Get root cause and fix

Annie analyzes the alert, correlates with your infrastructure, and provides the root cause with actionable remediation steps.

Want to customize how Annie responds to specific alerts? See Annie Instructions.

Incident Sources

Annie can receive incidents from:

PagerDuty

Automatic RCA when incidents are created. Results posted as incident notes.

Incident.io

Webhook integration for automatic RCA. Results posted as comments.

Slack

Mention @Annie with incident details for on-demand investigation.

MCP Tools

Trigger RCA from your IDE during development.

What Annie Provides

When Annie completes an RCA, you receive:

Executive Summary

A concise summary suitable for stakeholder communication:

“The checkout API latency spike was caused by DynamoDB read throttling after a 5x traffic increase from the marketing campaign. Immediate mitigation: increase read capacity to 500 RCU.”

Timeline of Events

Chronological sequence leading to the incident:

09:55 - Marketing campaign email sent
10:02 - Traffic increases 5x
10:05 - DynamoDB throttling begins
10:07 - P99 latency exceeds threshold
10:08 - Alert fires
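
A timeline like this lends itself to a quick calculation of how long each stage took. The following is a minimal sketch, assuming the events are available as (time, description) pairs on the same day; `TIMELINE`, `minutes_between`, and `stage_durations` are hypothetical names used for illustration and do not describe Annie's actual output format.

```python
from datetime import datetime

# Hypothetical representation of the example timeline (same-day HH:MM times).
TIMELINE = [
    ("09:55", "Marketing campaign email sent"),
    ("10:02", "Traffic increases 5x"),
    ("10:05", "DynamoDB throttling begins"),
    ("10:07", "P99 latency exceeds threshold"),
    ("10:08", "Alert fires"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def stage_durations(timeline):
    """Pair each event after the first with the minutes elapsed since the previous event."""
    return [
        (later_desc, minutes_between(earlier_time, later_time))
        for (earlier_time, _), (later_time, later_desc) in zip(timeline, timeline[1:])
    ]


if __name__ == "__main__":
    for description, minutes in stage_durations(TIMELINE):
        print(f"+{minutes} min: {description}")
    print("Trigger to alert:", minutes_between(TIMELINE[0][0], TIMELINE[-1][0]), "min")
```

In this example, 13 minutes pass between the campaign email and the alert, and throttling begins 3 minutes after the traffic increase.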

Root Cause Details

Technical details with evidence from your systems:

What happened
Why it happened
Supporting evidence from logs, metrics, and configuration history

Affected Resources

List of impacted infrastructure with the specific impact on each.

Remediation Steps

Actionable fixes organized by urgency:

Immediate: Resolve the incident now
Short-term: Prevent recurrence this sprint
Long-term: Systemic improvements

Key Benefits

Reduce MTTR

Cut mean time to resolution from hours to minutes by automating the investigation.

Less On-Call Stress

Engineers get root cause and fix suggestions immediately instead of scrambling through dashboards.

Consistent Investigation

Every incident gets the same thorough analysis, regardless of who’s on call.

Actionable Fixes

Annie provides specific commands and code changes, not just diagnoses.

Real-World Examples

Database Connection Failures

“RDS connection timeout on prod-api service”

Root Cause: Security group sg-prod-db was modified at 14:32, removing the inbound rule for the application subnet (10.0.1.0/24).

Evidence:

Security group change detected 15 minutes before alert

No changes to RDS instance itself

Application logs show connection timeouts starting at 14:35
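
The security group finding comes down to whether any remaining inbound rule still covers the application subnet. Below is a minimal sketch of that check using Python's standard `ipaddress` module. The `subnet_allowed` helper and the before/after rule lists are hypothetical and for illustration only; the example does not say which other rules sg-prod-db contains, and this is not a description of how Annie performs the check.

```python
import ipaddress
from typing import Iterable


def subnet_allowed(subnet: str, inbound_cidrs: Iterable[str]) -> bool:
    """Return True if at least one inbound CIDR fully covers the given subnet."""
    target = ipaddress.ip_network(subnet)
    for cidr in inbound_cidrs:
        rule = ipaddress.ip_network(cidr)
        # subnet_of() requires both networks to share an IP version.
        if rule.version == target.version and target.subnet_of(rule):
            return True
    return False


# Hypothetical inbound rules before and after the 14:32 change.
rules_before = ["10.0.1.0/24", "10.0.9.0/24"]
rules_after = ["10.0.9.0/24"]

if __name__ == "__main__":
    app_subnet = "10.0.1.0/24"
    print("allowed before change:", subnet_allowed(app_subnet, rules_before))
    print("allowed after change:", subnet_allowed(app_subnet, rules_after))
```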

Kubernetes Pods CrashLooping

“Pod restarts exceeding threshold for payment-service”

Root Cause: Image payment-service:v2.3.0 was rolled out 1 hour ago and has a memory leak, so its pods are being OOMKilled.

Evidence:

New image deployed at 10:00

Memory usage increased from ~300Mi to 600Mi under load

Pod memory limit is 512Mi

OOMKilled events in Kubernetes
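
The core of this evidence is a comparison between memory demand and the pod's limit. The sketch below shows that comparison for Kubernetes-style quantities. It is a simplified illustration: `parse_memory` and `exceeds_limit` are hypothetical helpers, and only the Ki, Mi, and Gi suffixes are handled, which is a small subset of what Kubernetes accepts.

```python
# Binary suffixes used by Kubernetes memory quantities (subset only).
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to bytes; bare integers are treated as bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def exceeds_limit(usage: str, limit: str) -> bool:
    """Return True if the memory demand is above the container limit."""
    return parse_memory(usage) > parse_memory(limit)


if __name__ == "__main__":
    limit = "512Mi"
    for label, usage in [("before v2.3.0", "300Mi"), ("after v2.3.0", "600Mi")]:
        print(f"{label}: {usage} vs limit {limit} -> over limit: {exceeds_limit(usage, limit)}")
```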

API Latency Spike

“P99 latency > 2s on checkout API”

Root Cause: DynamoDB table checkout-sessions is throttling because reads exceed its provisioned read capacity. A marketing campaign at 10:00 AM increased traffic 5x.

Evidence:

Traffic increased from 100 req/s to 500 req/s at 10:00

DynamoDB throttled requests spiked at 10:05

Provisioned RCU (100) is insufficient
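
The numbers in this example point to a capacity target through simple proportional scaling. The following is a minimal sketch, assuming read consumption grows linearly with request rate and that the provisioned 100 RCU was adequate at the 100 req/s baseline. `required_rcu` is a hypothetical helper, not an Annie or AWS API, and real sizing also depends on item size and read consistency.

```python
import math


def required_rcu(current_rcu: int, baseline_rps: float, peak_rps: float) -> int:
    """Scale provisioned read capacity in proportion to the traffic increase."""
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    return math.ceil(current_rcu * peak_rps / baseline_rps)


if __name__ == "__main__":
    # Values taken from the example: 100 RCU, 100 req/s rising to 500 req/s.
    print("Suggested RCU:", required_rcu(current_rcu=100, baseline_rps=100, peak_rps=500))
```

Under these assumptions the result is 500 RCU, five times the original provisioned capacity.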

Get Started

Create Account

Request Demo

See RCA in action
