Grafana Alert Integration: Cut Noise ~80%

Saneops ingests Grafana Alerting webhooks, correlates related firings into a single incident, deduplicates flapping alerts, and drafts a first-pass root-cause analysis (RCA) using your own LLM API key. The result: roughly 80% fewer pages reaching humans, with each remaining page already enriched with context.

How the Grafana → Saneops pipeline works

Webhook receiver: Each Saneops tenant has a unique inbound URL of the form https://app.saneops.in/webhooks/grafana/<tenant-token>. Add it as a Contact Point in Grafana Alerting under Alerts & IRM → Contact points → Webhook.
Payload normalisation: Saneops parses the standard Alertmanager-format payload Grafana sends — status, labels, annotations, fingerprint, startsAt, endsAt. Both firing and resolved transitions are honoured.
Correlation: Alerts that fire within a configurable time window (default 10 minutes) and share strong labels (service, namespace, cluster, deployment, job, app, pod) are clustered into one incident. Semantic similarity over the alert text serves as a fallback signal when labels alone don't match.
Deduplication: Identical fingerprints arriving in close succession (flapping) are merged. Saneops counts the dedup hits but doesn't page on them.
Auto-resolve: If Grafana sends status: resolved, Saneops closes the incident automatically. This requires resolved messages to be enabled on the contact point (the Grafana counterpart of Alertmanager's send_resolved: true). As a safety net, idle incidents are also auto-closed after 24 hours by default.
LLM RCA: Once an incident reaches a configurable severity threshold, Saneops asks your tenant's LLM (Anthropic, OpenAI, Gemini, Grok, DeepSeek, or local Ollama) for a first-draft root-cause summary. The full prompt is auditable.
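The correlation and deduplication steps above can be sketched roughly as follows. This is an illustration only, not Saneops's actual implementation: the Incident class, the ingest function, and the matching rule (at least one identical strong label within the window of the incident's last activity) are assumptions made for the example, and the semantic-similarity fallback and resolved handling are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Strong labels as listed in the text above.
STRONG_LABELS = ("service", "namespace", "cluster", "deployment", "job", "app", "pod")


@dataclass
class Incident:  # hypothetical structure, for illustration only
    started_at: datetime
    last_seen: datetime
    strong: dict                      # strong labels of the alert that opened the incident
    fingerprints: set = field(default_factory=set)
    dedup_hits: int = 0


def parse_ts(value: str) -> datetime:
    # Normalise a trailing "Z" so datetime.fromisoformat accepts it on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def ingest(alert: dict, incidents: list, window: timedelta = timedelta(minutes=10)) -> Incident:
    """Attach one firing alert to an existing incident, or open a new one."""
    ts = parse_ts(alert["startsAt"])
    strong = {k: v for k, v in alert.get("labels", {}).items() if k in STRONG_LABELS}
    for inc in incidents:
        if abs(ts - inc.last_seen) > window:
            continue                              # outside the correlation window
        if alert["fingerprint"] in inc.fingerprints:
            inc.dedup_hits += 1                   # flapping: count it, don't page again
        elif strong.items() & inc.strong.items():
            inc.fingerprints.add(alert["fingerprint"])  # assumed rule: one shared strong label
        else:
            continue
        inc.last_seen = max(inc.last_seen, ts)
        return inc
    inc = Incident(started_at=ts, last_seen=ts, strong=strong,
                   fingerprints={alert["fingerprint"]})
    incidents.append(inc)
    return inc
```

With this rule, two different alerts on service=checkout three minutes apart land in one incident, a repeat of the first fingerprint only bumps dedup_hits, and an alert on service=search opens a second incident.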

Setup — five minutes end-to-end

1. Create a Saneops tenant

Sign up at app.saneops.in/signup. The signup form gives you a slug, an admin user, and a tenant-scoped API key on submit. No credit card.

2. Copy your Grafana webhook URL

In the Saneops UI: Integrations → Grafana → Connect. The page displays your tenant's exact webhook URL plus a one-paste sample payload you can curl to verify connectivity before wiring Grafana.
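If you would rather script the connectivity check than paste a curl command, a minimal Python sketch could look like the one below. The payload uses only the Alertmanager-format fields named earlier (status, labels, annotations, fingerprint, startsAt, endsAt); the label values, the fingerprint, and the function names are made up for illustration, and the sample payload shown on the Connect page remains the authoritative one.

```python
import json
import urllib.request
from datetime import datetime, timezone


def sample_payload(now=None) -> dict:
    """Build a small Alertmanager-format payload with one firing alert."""
    now = now or datetime.now(timezone.utc)
    return {
        "status": "firing",
        "alerts": [{
            "status": "firing",
            "labels": {"alertname": "ConnectivityTest", "service": "demo", "severity": "low"},
            "annotations": {"summary": "Hand-sent test alert"},
            "startsAt": now.isoformat(),
            "endsAt": "0001-01-01T00:00:00Z",   # zero time: the alert has not ended
            "fingerprint": "0000000000000001",  # illustrative value
        }],
    }


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST; nothing is sent until send() is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload and return the HTTP status code."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return resp.status
```

Calling send() with your tenant's webhook URL and sample_payload() should produce an entry in the Webhook inspector if the endpoint is reachable.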

3. Add the contact point in Grafana

Grafana > Alerts & IRM > Contact points > New contact point
Name: Saneops
Integration: Webhook
URL: https://app.saneops.in/webhooks/grafana/<your-token>
HTTP Method: POST
Max alerts: 0 (no truncation)
Disable resolved messages: NO  ← important; if resolved messages are disabled, Saneops never receives status: resolved and can't auto-close on it
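The same contact point can probably be created through Grafana's alerting provisioning HTTP API instead of the UI. The sketch below assumes the POST /api/v1/provisioning/contact-points endpoint, a service-account bearer token, and the webhook settings keys url, httpMethod, and maxAlerts; check these against the documentation for your Grafana version before relying on them. The function names are hypothetical.

```python
import json
import urllib.request


def contact_point_body(webhook_url: str) -> dict:
    """Mirror the UI fields above as a provisioning request body (assumed key names)."""
    return {
        "name": "Saneops",
        "type": "webhook",
        "settings": {
            "url": webhook_url,
            "httpMethod": "POST",
            "maxAlerts": 0,               # 0 = no truncation
        },
        "disableResolveMessage": False,   # keep resolved messages flowing to Saneops
    }


def build_request(grafana_url: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare the API call; pass the result to urllib.request.urlopen to send it."""
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/v1/provisioning/contact-points",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```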

4. Route alert rules to it

Create or edit a notification policy that points to the Saneops contact point. You can route everything, or scope by labels — Saneops handles either.

5. Verify

Trigger a test alert from Grafana (Test button on the contact point). Within seconds the Saneops Webhook inspector shows the raw payload, and a corresponding incident appears under Incidents.

What you get back from Saneops

One incident per real outage — not one per Grafana alert rule firing.
Common-label distillation — the labels every clustered alert agrees on, surfaced at the incident level.
Service rollup — incidents tagged with the affected services so blast-radius is obvious.
LLM-drafted RCA — a 3-bullet first-draft summary at the top of every incident with severity ≥ high.
Outbound notifications — fan back out to Slack, Microsoft Teams, email, PagerDuty Events v2, OpsGenie, Zenduty, or generic webhook only when an incident actually warrants paging a human.
Workflow automation — runbook DAGs that fire on incident events: auto-acknowledge low severity, post Slack messages with templated context, page on-call only when severity ≥ critical, etc.
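To make the severity gates in the list above concrete, here is a small sketch of the decision logic. It does not show how Saneops runbook DAGs are actually defined; the severity scale (low, medium, high, critical), the action names, and the actions_for function are assumptions chosen for the example.

```python
# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def actions_for(severity: str) -> list:
    """Return the illustrative actions an incident of this severity would trigger."""
    rank = SEVERITY_ORDER.index(severity)
    actions = ["post_slack"]                      # templated context for every incident
    if severity == "low":
        actions.append("auto_acknowledge")        # low severity never needs a human ack
    if rank >= SEVERITY_ORDER.index("high"):
        actions.append("draft_rca")               # LLM summary for severity >= high
    if rank >= SEVERITY_ORDER.index("critical"):
        actions.append("page_on_call")            # page only for severity >= critical
    return actions
```

Under these assumptions a high-severity incident gets a Slack post and an RCA draft but pages nobody, while a critical one does all three.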

Tip: Even with Saneops in front, keep resolved messages enabled on your Grafana contact point (send_resolved: true if you use a plain Alertmanager receiver). Without them, Saneops has to fall back to the 24-hour idle-resolve sweep, and incidents stay visible longer than they should.

Frequently asked questions

Does Saneops replace Grafana Alerting?

No. Grafana stays your alert source — it owns metric thresholds and rule evaluation. Saneops sits downstream, eating the Grafana webhook firehose and turning it into a smaller stream of correlated incidents.

Will my Grafana data leave my network?

Only the alert webhook payload Grafana already sends. Your raw metrics and dashboards stay where they are. If even the alert payload is too sensitive for SaaS, self-host Saneops via Docker — the same code runs on your own cluster.

Does it handle Grafana OnCall?

Yes. Grafana OnCall webhooks use the same Alertmanager-format envelope; point an OnCall outbound webhook at the Saneops Grafana endpoint and it works without configuration changes.

Is there a limit on the alert volume per tenant?

The free tier covers 1,000 alerts/month per tenant. Beta-cohort design partners get a higher cap during the 60-day evaluation. Paid tiers will be announced before the beta closes.

What if my Alertmanager isn't configured with send_resolved: true?

Saneops auto-closes idle incidents after 24 hours by default (configurable per tenant). Source-side resolves are honoured immediately when sent.
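The two closing paths can be pictured as follows. This is a sketch under stated assumptions, not Saneops's code: incidents are plain dicts with hypothetical firing, last_seen, and status keys, and an incident is assumed to close on a source-side resolve only once none of its alerts is still firing.

```python
from datetime import datetime, timedelta, timezone


def apply_resolved(incident: dict, fingerprint: str) -> None:
    """Source-side resolve: close as soon as no alert in the incident is firing."""
    incident["firing"].discard(fingerprint)
    if not incident["firing"]:
        incident["status"] = "resolved"


def sweep_idle(incidents: list, now: datetime,
               idle_after: timedelta = timedelta(hours=24)) -> list:
    """Safety net: close open incidents with no activity for idle_after."""
    closed = []
    for inc in incidents:
        if inc["status"] == "open" and now - inc["last_seen"] >= idle_after:
            inc["status"] = "auto_closed"
            closed.append(inc)
    return closed
```

The first path reacts within the same webhook delivery, whereas the sweep can leave an incident open for up to the full idle period, which is why sending resolved messages is the better option.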
