Datadog alert deduplication, correlation, and LLM RCA

Saneops ingests Datadog monitor webhooks through the standard @webhook-saneops mention, correlates related firings across services, deduplicates flapping monitors, and drafts a first-pass root-cause analysis (RCA) using your own LLM key. The result is fewer pages, each one already enriched with the failing service, the labels the grouped alerts have in common, and a starting hypothesis your on-call can verify or reject.

Why Datadog alone isn't enough

Datadog's monitors are excellent at detecting a metric crossing a threshold. They deliberately do not correlate: every monitor fires independently, and every firing is a separate notification. A single bad Postgres replica can fan out to 30+ Datadog notifications spanning every service that talks to it. Saneops adds the missing layer: it clusters the 30 firings into one incident, deduplicates the redundant ones, and pages once.
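To make the clustering idea concrete, here is a minimal Python sketch. It is not Saneops' actual algorithm, which this page does not describe. The Alert shape, the db_host tag key, and the five-minute window are all assumptions chosen for illustration. The sketch groups firings that share a tag value within a time window and keeps one entry per monitor inside each group.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    monitor_id: int
    service: str
    fired_at: float  # Unix timestamp in seconds
    tags: dict = field(default_factory=dict)


def correlate(alerts, tag_key="db_host", window_s=300):
    """Group alerts that share tags[tag_key] and fire within window_s
    seconds of the first alert in the group.

    Returns a list of incidents. Each incident is a list of alerts with
    repeat firings of the same monitor dropped (a crude flap dedup).
    Alerts that lack the tag each become their own incident.
    """
    incidents = []  # hypothetical internal shape: key, start, alerts, seen
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        key = alert.tags.get(tag_key)
        target = None
        if key is not None:
            for inc in incidents:
                in_window = alert.fired_at - inc["start"] <= window_s
                if inc["key"] == key and in_window:
                    target = inc
                    break
        if target is None:
            target = {"key": key, "start": alert.fired_at,
                      "alerts": [], "seen": set()}
            incidents.append(target)
        # Keep only the first firing of each monitor within an incident.
        if alert.monitor_id not in target["seen"]:
            target["seen"].add(alert.monitor_id)
            target["alerts"].append(alert)
    return [inc["alerts"] for inc in incidents]
```

Given 30 firings from 10 monitors that all carry the same db_host value and land within five minutes, correlate returns a single incident containing 10 alerts, which is the "page once" behaviour described above.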

How the Datadog → Saneops pipeline works

Setup

1. Create a Saneops tenant

Sign up at app.saneops.in/signup. Three fields, no card.

2. Add a Datadog webhook

In Datadog: Integrations → Webhooks → New Webhook. Name it saneops, paste your tenant URL, leave the default POST payload (Saneops parses the standard Datadog format).

URL: https://app.saneops.in/webhooks/datadog/<your-token>
Name: saneops
Payload: leave default — Saneops parses Datadog's standard JSON
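If you would rather script this step than click through the UI, the sketch below builds the equivalent request against Datadog's webhooks integration API. It assumes the US1 site (api.datadoghq.com; other regions use a different host) and an API key plus application key with permission to manage integrations. The function name build_webhook_request is hypothetical, and the function only constructs the request; it does not send it.

```python
import json
import urllib.request

# Assumption: US1 site. Swap the host for your Datadog region if needed.
DATADOG_WEBHOOKS_API = (
    "https://api.datadoghq.com/api/v1/integration/webhooks/configuration/webhooks"
)


def build_webhook_request(saneops_url, api_key, app_key, name="saneops"):
    """Build (but do not send) the request that registers a Datadog webhook."""
    # No custom payload is set, so Datadog's default payload is used.
    body = {"name": name, "url": saneops_url}
    return urllib.request.Request(
        DATADOG_WEBHOOKS_API,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
    )
```

Pass the returned object to urllib.request.urlopen to actually create the webhook, using the tenant URL from your Saneops account as saneops_url.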

3. Reference the webhook from monitors

In any monitor's notification message, add @webhook-saneops on its own line. To update existing monitors in bulk, script the change against the Datadog API. The migration is non-destructive: your existing PagerDuty/Slack notifications stay in place.
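A bulk migration mostly comes down to appending the mention safely. The sketch below shows one way to do that idempotently, assuming monitors arrive as dictionaries with id and message fields, as in the response from Datadog's list-monitors endpoint (GET /api/v1/monitor). The helper names with_saneops_mention and updated_monitors are hypothetical, and the actual update call (PUT /api/v1/monitor/{monitor_id} with the new message) is left to the caller.

```python
MENTION = "@webhook-saneops"


def with_saneops_mention(message, mention=MENTION):
    """Return message with the mention on its own final line.

    Existing text (including PagerDuty/Slack mentions) is left untouched,
    and nothing changes if the mention is already present as a whole token.
    """
    if any(mention in line.split() for line in message.splitlines()):
        return message
    if not message.strip():
        return mention
    return message.rstrip("\n") + "\n" + mention


def updated_monitors(monitors):
    """Yield (monitor_id, new_message) only for monitors that need a change."""
    for monitor in monitors:
        old = monitor.get("message", "")
        new = with_saneops_mention(old)
        if new != old:
            yield monitor["id"], new
```

Because the helper is a no-op on monitors that already carry the mention, the script can be re-run after adding new monitors without duplicating notifications.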

4. Verify

Send a test notification with the monitor's Test Notifications button. The Saneops Webhook Inspector shows the exact payload received, and the corresponding incident appears in the Incidents view.

Tag-driven correlation tuning

Saneops correlation reads the same Datadog tags you already use. A few that make a big difference:

Frequently asked questions

Does Saneops replace Datadog?
No. Datadog is your observability platform — metrics, traces, logs, dashboards. Saneops sits at the alert layer, downstream of Datadog monitors, replacing the 1:1 monitor → page model with a correlated incident model.
Does this work with Datadog Synthetics, APM error tracking, log monitors?
Yes. Any Datadog monitor that can notify a webhook works: synthetic test failures, APM error-rate breaches, and log search monitors are all routed through the same @webhook-saneops mention.
Where does the LLM call go?
To whichever provider you configure: Anthropic, OpenAI, OpenAI-compatible providers (Together, Mistral, Groq), Gemini, DeepSeek, Grok, or self-hosted Ollama. You bring your own key (BYOK); Saneops stores it encrypted with Fernet and never logs it.
Will Saneops affect my Datadog billing?
No. Saneops consumes Datadog webhooks, which are free at any Datadog tier. Your monitor count, host count, and event ingestion stay unchanged.
Can I bulk-add @webhook-saneops to existing monitors?
Yes. Use the Datadog API or the Datadog Terraform provider to append the @webhook-saneops mention to every monitor's notification message. The migration is non-destructive: existing PagerDuty/Slack notifications continue to fire.

Try Saneops free

1,000 alerts/month, no credit card. Self-host the Docker image or use our cloud. BYOK LLM.