Saneops ingests Datadog monitor webhooks, correlates duplicate firings across services, deduplicates flapping monitors, and drafts a first-pass RCA using your own LLM key. Result: a fraction of the pages, each one already enriched with the failing service, common labels across alerts, and a starting hypothesis your on-call can verify or reject.
Why Datadog alone isn't enough
Datadog's monitors are excellent at detecting a metric crossing a threshold. They are deliberately not in the business of correlating — every monitor fires independently, every firing is a separate notification. A single bad Postgres replica can fan out to 30+ Datadog notifications spanning every service that talks to it. Saneops adds the missing layer: cluster the 30 firings into one incident, deduplicate the redundant ones, and page once.
How the Datadog → Saneops pipeline works
- Webhook receiver: Each Saneops tenant has a unique inbound URL: `https://app.saneops.in/webhooks/datadog/<tenant-token>`.
- Payload parsing: The Datadog webhook integration sends a JSON body with monitor name, alert title, transition (`Triggered`/`Recovered`/`No Data`), tags, the scoped host or service, and the event message. Saneops normalises this into the same `NormalizedAlert` shape it uses for every source (see the sketch after this list).
- Severity mapping: Datadog priority (P1–P5) maps cleanly to Saneops severity (critical/high/warning/info/low). You can override per-monitor via tags.
- Correlation by tag: Datadog tags become Saneops labels. Strong-label matches on `service`, `env`, `cluster`, `kube_namespace`, and `host` drive the correlation decision.
- Auto-resolve: Datadog's `Recovered` and `No Data` transitions close the corresponding Saneops incident. Idle incidents auto-close after 24 hours regardless.
- LLM RCA: The drafted RCA references the failing tags, recent transition history, and any related alerts in the same cluster — surfacing a likely cause your on-call can verify in 30 seconds.
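To make the parsing and severity-mapping steps concrete, here is a minimal sketch in Python. The payload field names (`title`, `alert_transition`, `priority`, `tags`) and the `NormalizedAlert` fields are assumptions for illustration, not Saneops' actual internals:

```python
from dataclasses import dataclass, field

# Assumed Datadog priority -> Saneops severity mapping (P1-P5, five levels).
PRIORITY_TO_SEVERITY = {
    "P1": "critical",
    "P2": "high",
    "P3": "warning",
    "P4": "info",
    "P5": "low",
}

@dataclass
class NormalizedAlert:
    title: str
    transition: str          # Triggered / Recovered / No Data
    severity: str
    labels: dict = field(default_factory=dict)

def normalize_datadog(payload: dict) -> NormalizedAlert:
    # Datadog sends tags as a comma-separated string, e.g. "service:api,env:prod";
    # each key:value tag becomes a Saneops label.
    labels = dict(
        tag.split(":", 1) for tag in payload.get("tags", "").split(",") if ":" in tag
    )
    return NormalizedAlert(
        title=payload.get("title", ""),
        transition=payload.get("alert_transition", ""),
        severity=PRIORITY_TO_SEVERITY.get(payload.get("priority", ""), "warning"),
        labels=labels,
    )
```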
Setup
1. Create a Saneops tenant
Sign up at app.saneops.in/signup. Three fields, no card.
2. Add a Datadog webhook
In Datadog: Integrations → Webhooks → New Webhook. Name it saneops, paste your tenant URL, leave the default POST payload (Saneops parses the standard Datadog format).
URL: https://app.saneops.in/webhooks/datadog/<your-token>
Name: saneops
Payload: leave default — Saneops parses Datadog's standard JSON
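If you prefer to script this step, the webhook can also be created through Datadog's Webhooks Integration API. A minimal sketch, assuming your Datadog API and application keys are in the environment; the token in the URL is your tenant token placeholder:

```python
import os
import requests

# Create the "saneops" webhook via Datadog's Webhooks Integration API.
resp = requests.post(
    "https://api.datadoghq.com/api/v1/integration/webhooks/configuration/webhooks",
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "name": "saneops",
        "url": "https://app.saneops.in/webhooks/datadog/<your-token>",
        # Omitting "payload" keeps Datadog's default JSON body.
    },
)
resp.raise_for_status()
```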
3. Reference the webhook from monitors
In any monitor's notification message, add `@webhook-saneops` on its own line. Bulk-edit your existing monitors via the Datadog API to add it (see the sketch below) — the migration is non-destructive (you keep your existing PagerDuty/Slack notifications).
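A minimal sketch of that bulk edit against Datadog's v1 monitors API; it assumes the PUT leaves unspecified monitor fields unchanged, and the filtering is deliberately naive:

```python
import os
import requests

BASE = "https://api.datadoghq.com/api/v1"
HEADERS = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}
MENTION = "@webhook-saneops"

# Append @webhook-saneops to every monitor message that doesn't have it yet.
monitors = requests.get(f"{BASE}/monitor", headers=HEADERS).json()
for monitor in monitors:
    message = monitor.get("message", "")
    if MENTION in message:
        continue  # already migrated
    requests.put(
        f"{BASE}/monitor/{monitor['id']}",
        headers=HEADERS,
        # Appending (not replacing) keeps existing PagerDuty/Slack mentions.
        json={"message": f"{message}\n{MENTION}"},
    ).raise_for_status()
```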
4. Verify
Force a test notification (Test Notifications button). The Saneops Webhook Inspector shows the exact payload received; the corresponding incident appears in the Incidents view.
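If you'd rather script the check, you can POST a synthetic payload straight at your tenant URL. The field names here mirror the normalisation sketch above and are illustrative, not a copy of Datadog's exact default template:

```python
import requests

# Fire a fake Datadog-shaped alert at the tenant URL to verify end to end.
resp = requests.post(
    "https://app.saneops.in/webhooks/datadog/<your-token>",
    json={
        "title": "[Triggered] Postgres replication lag on db-replica-3",
        "alert_transition": "Triggered",
        "priority": "P2",
        "tags": "service:postgres,env:prod,host:db-replica-3",
        "body": "Replication lag above 30s for 5 minutes.",
    },
)
print(resp.status_code)  # expect 2xx; the incident should appear in Incidents
```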
Tag-driven correlation tuning
Saneops correlation reads the same Datadog tags you already use. A few that make a big difference:
- `service:<name>` — tightest correlation signal. Two firings on the same service in the same window cluster together.
- `env:prod` — keeps prod and staging incidents separate even when the service tag matches.
- `kube_cluster_name:<cluster>`, `kube_namespace:<ns>` — multi-tenant Kubernetes clusters cluster correctly.
- `team:<name>` — pairs nicely with Saneops notification rules so the right Slack channel is paged.
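To see why these tags matter, here is a toy grouping pass over normalised alerts. The strong-label list and the 10-minute window are assumptions for illustration, not Saneops' actual correlation algorithm:

```python
from collections import defaultdict

# Alerts sharing the same strong-label values inside a time window land in
# the same incident bucket.
STRONG_LABELS = ("service", "env", "cluster", "kube_namespace", "host")
WINDOW_SECONDS = 600  # assumed 10-minute correlation window

def correlation_key(alert: dict) -> tuple:
    labels = alert["labels"]
    present = tuple(labels[k] for k in STRONG_LABELS if k in labels)
    return present + (alert["timestamp"] // WINDOW_SECONDS,)

alerts = [
    {"labels": {"service": "postgres", "env": "prod"}, "timestamp": 1000},
    {"labels": {"service": "postgres", "env": "prod"}, "timestamp": 1100},
    {"labels": {"service": "postgres", "env": "staging"}, "timestamp": 1050},
]

incidents = defaultdict(list)
for alert in alerts:
    incidents[correlation_key(alert)].append(alert)

print(len(incidents))  # 2: prod firings cluster together, staging stays separate
```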
FAQ
Does Saneops replace Datadog?
No. Datadog is your observability platform — metrics, traces, logs, dashboards. Saneops sits at the alert layer, downstream of Datadog monitors, replacing the 1:1 monitor → page model with a correlated incident model.
Does this work with Datadog Synthetics, APM error tracking, and log monitors?
Yes. Anything that emits a Datadog event with a webhook destination works — synthetic test failures, APM error rate breaches, log search monitors, all routed through the same `@webhook-saneops` mention.
Where does the LLM call go?
To whichever provider you configure — Anthropic, OpenAI, OpenAI-compatible (Together, Mistral, Groq), Gemini, DeepSeek, Grok, or self-hosted Ollama. BYOK; Saneops stores your key encrypted via Fernet and never logs it.