Commits
18 commits
4c8b6da
docs(observe-two): rewrite pages 1-7 with verified screenshots
abhijaisrivastava15 May 6, 2026
623d9ad
docs(observe-two): rewrite voice.mdx with verified screenshots
abhijaisrivastava15 May 6, 2026
852fa89
docs(observe-two): note LiveKit dashboard attribute limitation (TH-4660)
abhijaisrivastava15 May 7, 2026
626e963
docs(observe): replace old Observability docs with the rewritten ones
abhijaisrivastava15 May 7, 2026
defb68d
docs(observe): document LiveKit attribute support in Dashboards (TH-4…
abhijaisrivastava15 May 7, 2026
a00706c
docs(observe): swap light-mode hero image on Overview for dark-mode
abhijaisrivastava15 May 7, 2026
0aeaa44
docs(observe): address Suhani's review — zoom modal screenshots and a…
abhijaisrivastava15 May 7, 2026
2d6f421
docs(observe): remove Charts page
abhijaisrivastava15 May 11, 2026
528f815
merge: bring dev into docs/observe-two-rewrite
abhijaisrivastava15 May 11, 2026
af716a6
docs(observe): playbook alignment + rendering fixes + accuracy pass
abhijaisrivastava15 May 27, 2026
f78876d
docs(tracing): fix span nesting diagram on Spans concept page
abhijaisrivastava15 May 27, 2026
7fa53ab
fix(docs): center rendered Mermaid diagrams in their container
abhijaisrivastava15 May 27, 2026
dc6bd11
docs(observe): lazy-load non-hero images + trim duplicate sentences
abhijaisrivastava15 May 27, 2026
7521b10
docs(tracing): fix retriever nesting in the Traces concept diagram
abhijaisrivastava15 May 27, 2026
0cf7cb2
docs(observe): correct eval row-limit + voice providers from backend …
abhijaisrivastava15 May 27, 2026
fe613da
docs(observe): complete voice column list (add Cost)
abhijaisrivastava15 May 27, 2026
cdad123
docs(observe): address review feedback on concepts + overview
abhijaisrivastava15 May 28, 2026
a160547
docs(observe): document eval levels (span/trace/session) + context
abhijaisrivastava15 May 29, 2026
Binary file removed public/images/docs/observe/1.png
Binary file not shown.
Binary file removed public/images/docs/observe/2.png
Binary file not shown.
Binary file removed public/images/docs/observe/3.png
Binary file not shown.
Binary file removed public/images/docs/observe/4.png
Binary file not shown.
Binary file removed public/images/docs/observe/5.png
Binary file not shown.
Binary file removed public/images/docs/observe/5.webp
Binary file not shown.
Binary file added public/images/docs/observe/alerts-create.png
Binary file added public/images/docs/observe/alerts-create.webp
Binary file added public/images/docs/observe/alerts-overview.png
Binary file added public/images/docs/observe/alerts-overview.webp
Binary file added public/images/docs/observe/evals-create.png
Binary file added public/images/docs/observe/evals-create.webp
Binary file added public/images/docs/observe/evals-overview.png
Binary file added public/images/docs/observe/evals-overview.webp
Binary file added public/images/docs/observe/sessions-detail.png
Binary file added public/images/docs/observe/sessions-display.png
Binary file added public/images/docs/observe/sessions-filter.png
Binary file added public/images/docs/observe/sessions-overview.png
Binary file added public/images/docs/observe/users-date-range.png
Binary file added public/images/docs/observe/users-detail.png
Binary file added public/images/docs/observe/users-detail.webp
Binary file added public/images/docs/observe/users-display.png
Binary file added public/images/docs/observe/users-filter.png
Binary file added public/images/docs/observe/users-overview.png
Binary file added public/images/docs/observe/users-overview.webp
Binary file added public/images/docs/observe/voice-call-detail.png
Binary file added public/images/docs/observe/voice-create-form.png
Binary file added public/images/docs/observe/voice-create-form.webp
5 changes: 5 additions & 0 deletions src/components/docs/CodeGroup.astro
@@ -40,6 +40,11 @@ const id = `code-group-${Math.random().toString(36).slice(2, 9)}`;
</div>

<style is:global>
/* Before JS initializes, hide all but the first code block so a multi-language
group doesn't flash every language's code stacked together (FOUC). */
[data-code-group]:not([data-cgp-init]) .code-panels > *:not(:first-child) {
display: none;
}
/* Hide all code panel items; JS adds .cgp-active to show the selected one */
.code-group-panel {
display: none;
38 changes: 38 additions & 0 deletions src/components/docs/Mermaid.astro
@@ -0,0 +1,38 @@
---
/**
* Mermaid diagram, rendered client-side from CDN.
* Usage in MDX: <Mermaid chart={`flowchart LR\n A --> B`} />
*
* The script is `is:inline` so Astro does not bundle it — the CDN module import
* then runs natively in the browser (matching how the rest of this repo loads
* third-party client scripts). The diagram source lives in the page, per the
* docs playbook. In the full monorepo, `pnpm add mermaid` and swap the CDN
* import for `import mermaid from 'mermaid'`.
*/
interface Props {
chart: string;
}
const { chart } = Astro.props;
---

<pre class="mermaid not-prose" style="background: transparent; text-align: center;">{chart}</pre>

<style is:global>
/* Center the rendered diagram inside its container. Mermaid outputs a
fixed-width block <svg>, which would otherwise sit flush left. */
pre.mermaid > svg {
display: block;
margin-inline: auto;
}
</style>

<script is:inline type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
mermaid.initialize({ startOnLoad: false, theme: 'dark', securityLevel: 'loose' });
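const run = () => {
// mermaid.run() returns a promise, so render failures are handled with
// .catch(); a synchronous try/catch would miss asynchronous rejections.
mermaid.run({ querySelector: 'pre.mermaid:not([data-processed="true"])' })
.catch((err) => console.error('Mermaid render failed:', err));
};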
run();
document.addEventListener('astro:after-swap', run);
</script>
27 changes: 23 additions & 4 deletions src/lib/navigation.ts
@@ -348,7 +348,7 @@ export const tabNavigation: NavTab[] = [
]
},
{
group: 'Observability',
group: 'traceAI',
icon: 'eye',
items: [
{ title: 'Overview', href: '/docs/observe' },
@@ -358,6 +358,7 @@ export const tabNavigation: NavTab[] = [
{ title: 'Understanding Observability', href: '/docs/tracing/concepts' },
{ title: 'What are Traces?', href: '/docs/tracing/concepts/traces' },
{ title: 'What are Spans?', href: '/docs/tracing/concepts/spans' },
{ title: 'Sessions and Users', href: '/docs/tracing/concepts/sessions-and-users' },
{ title: 'What is OpenTelemetry?', href: '/docs/tracing/concepts/otel' },
{ title: 'What is traceAI?', href: '/docs/tracing/concepts/traceai' },
]
@@ -366,12 +367,13 @@ export const tabNavigation: NavTab[] = [
title: 'Features',
items: [
{ title: 'Set Up Observability', href: '/docs/observe/features/quickstart' },
{ title: 'Run Evals on Traces', href: '/docs/observe/features/evals' },
{ title: 'LLM Tracing', href: '/docs/observe/features/llm-tracing' },
{ title: 'Sessions', href: '/docs/observe/features/session' },
{ title: 'Users', href: '/docs/observe/features/users' },
{ title: 'Run Evals on Traces', href: '/docs/observe/features/evals' },
{ title: 'Dashboards', href: '/docs/observe/features/dashboard' },
{ title: 'Alerts & Monitors', href: '/docs/observe/features/alerts' },
{ title: 'Voice Observability', href: '/docs/observe/features/voice' },
{ title: 'Dashboards', href: '/docs/observe/features/dashboard' },
{
title: 'Manual Tracing',
items: [
@@ -394,7 +396,24 @@ export const tabNavigation: NavTab[] = [
]
},
{
title: 'Integration',
title: 'Reference',
items: [
{ title: 'Trace Filter Syntax', href: '/docs/observe/reference/trace-filter-syntax' },
{ title: 'Dashboard Metric Definitions', href: '/docs/observe/reference/dashboard-metric-definitions' },
{ title: 'Trace Export and Endpoints', href: '/docs/observe/reference/export-formats' },
]
},
{
title: 'Troubleshooting',
items: [
{ title: 'No traces appear', href: '/docs/observe/troubleshooting/no-traces-appearing' },
{ title: 'Missing spans or attributes', href: '/docs/observe/troubleshooting/missing-attributes' },
{ title: 'Dashboard numbers look wrong', href: '/docs/observe/troubleshooting/dashboard-numbers-look-wrong' },
{ title: 'An alert did not fire', href: '/docs/observe/troubleshooting/alerts-did-not-fire' },
]
},
{
title: 'Framework integrations',
items: [
{ title: 'Overview', href: '/docs/tracing/auto' },
{
127 changes: 85 additions & 42 deletions src/pages/docs/observe/features/alerts.mdx
@@ -1,84 +1,127 @@
---
title: "Alerts and Monitors: Observe Metric Threshold Notifications"
description: "Define monitors on Observe project metrics (system or evaluation) and get notified by email or Slack when values cross a threshold."
title: "Alerts and Monitors: Threshold Notifications"
description: "Define monitors on Observe metrics — error rate, latency, cost, or eval scores — and get notified by email or Slack when a value crosses a threshold."
page_type: "feature-deep-dive"
products: ["traceAI"]
feature: "Alerts and monitors"
feature_status: "stable"
ui_surfaces: ["Observe > Alerts"]
audience: "engineer"
difficulty: "beginner"
status: "review"
owner: "observability"
last_tested: "2026-05-25"
last_screenshotted: "2026-05-25"
schema_type: "TechArticle"
seo:
primary_query: "llm observability alerts and monitors"
geo:
direct_answer: true
related:
feature: "/docs/observe/features/dashboard"
concept: "/docs/tracing/concepts/traces"
how_to: "/docs/observe/features/evals"
---

## About

**Alerts and monitors** notify you when a metric goes above or below a value you set. Pick a metric (error rate, latency, cost, or an eval score), define a threshold, and choose where to get notified: email, Slack, or both. Monitors check the metric on a schedule. If the threshold is breached, you get an alert. You can review past alerts, mark them resolved, or mute a monitor without deleting it.
A monitor watches one Observe metric on a schedule and notifies you when it crosses a threshold. Pick a metric (error rate, latency, cost, or an eval score), set a threshold and direction, and choose where alerts go: email, Slack, or both. When the threshold is breached, the monitor creates an alert log and sends the notification. You can review past alerts, mark them resolved, or mute a monitor without deleting it. Monitors let Observe tell *you* when something breaks, so you don't have to watch a dashboard.

---

## When to use

- **Catch errors early**: Get notified when error rate or API failure rate spikes after a deployment.
- **Stay within latency limits**: Alert when response time goes above your target.
- **Control costs**: Track token usage and get a warning before you hit your budget.
- **Monitor eval quality**: Know when a pass/fail eval like toxicity starts failing more often.
- **Stay informed without watching dashboards**: Send alerts to email, Slack, or both.
- **Catch errors early** — alert when error rate or LLM API failure rate spikes after a deploy.
- **Hold latency limits** — alert when response time goes above your target.
- **Control cost** — warn before token usage hits your budget.
- **Guard quality** — alert when a pass/fail eval (e.g. toxicity) starts failing more often.

---

## How to
## When not to use

- **Exploring trends** — a monitor is a tripwire, not a chart; use [Dashboards](/docs/observe/features/dashboard).
- **Debugging one request** — use the [trace explorer](/docs/observe/features/llm-tracing).

---

## Set up a monitor

<Steps>
<Step title="Choose the metric">
Create a monitor for an Observe project and select the **metric type**:
![Choose the metric](/screenshot/product/observe/1.png)
Create a monitor for an Observe project and pick the metric type.

- **System metrics**: count of errors, error-free session rates, LLM API failure rates, span response time, LLM response time, token usage, daily/monthly tokens spent.
- **Evaluation metrics**: attach an eval config for that project. For pass/fail or choice evals you can set **threshold_metric_value** to the specific value to monitor (e.g. fail rate or a choice label).
<img src="/images/docs/observe/alerts-create.webp" alt="Create-monitor form selecting a metric, threshold, and notification channels" style={{ borderRadius: '5px' }} />
*Building a monitor: metric → threshold → notifications, in one form.*

The monitor is scoped to one project (Observe projects only).
- **System metrics:** error count, error-free session rate, LLM API failure rate, span response time, LLM response time, token usage, daily/monthly tokens spent.
- **Evaluation metrics:** attach an eval config for the project. For pass/fail or choice evals, set `threshold_metric_value` to the value to watch (e.g. a fail rate or a choice label).
</Step>

<Step title="Define the threshold">
Set how the alert is triggered:
![Define the threshold](/screenshot/product/observe/2.png)

- **threshold_operator**: **Greater than** or **Less than** (the current metric value is compared to the threshold).
- **threshold_type**: how the threshold is determined:
- **Static**: you set fixed **critical_threshold_value** and optionally **warning_threshold_value**. Alert fires when the metric is greater than (or less than) these values.
- **Percentage change**: threshold is based on percentage change from a baseline (e.g. historical mean over a time window). You set **critical_threshold_value** and optionally **warning_threshold_value** as percentage values. **auto_threshold_time_window** (default one week, in minutes) defines the window used to compute the baseline.

When the condition is met, the system creates an alert log (critical or warning) and triggers notifications.
- **`threshold_operator`** — `Greater than` or `Less than`.
- **`threshold_type`** — `Static` (fixed `critical_threshold_value`, optional `warning_threshold_value`) or `Percentage change` (compared to a baseline; `auto_threshold_time_window` sets the baseline window, default one week).
</Step>

<Step title="Set alert frequency">
**alert_frequency** is how often the monitor is evaluated, in minutes (minimum 5, default 60). The monitor runs on this schedule and checks the metric over the relevant time window. If the threshold is breached, an alert is created and notifications are sent.
<Step title="Set the frequency">
`alert_frequency` is how often the monitor runs, in minutes — **minimum 5, default 60**. Each run checks the metric over its window and fires an alert if the threshold is breached.
</Step>

<Step title="Configure notifications">
- **Email**: add up to five addresses in **notification_emails**. They receive an email when an alert is triggered (subject and body include alert name, message, and type).
- **Slack**: set **slack_webhook_url** to your Slack incoming webhook. Optional **slack_notes** are included in the message.
![Configure notifications](/screenshot/product/observe/3.png)
You can use email only, Slack only, or both. Mute a monitor with **is_mute** to stop notifications without deleting it.
- **Email:** up to **5** addresses in `notification_emails`.
- **Slack:** set `slack_webhook_url` (an incoming webhook); optional `slack_notes` are included in the message.

Use email only, Slack only, or both. Mute with `is_mute` to pause notifications without deleting the monitor.
</Step>

<Step title="View and resolve alerts">
Alert history is stored as **UserAlertMonitorLog** records (critical/warning, message, time window, link). You can list logs for a monitor, see when each alert fired, and mark them resolved. Use the monitor detail view in the UI to see trend data and unresolved count.
Alert history is stored as `UserAlertMonitorLog` records (critical or warning, with message, time window, and a link). List them per monitor, see when each fired, and mark them resolved.

<img loading="lazy" src="/images/docs/observe/alerts-overview.webp" alt="Alerts list showing past alerts with severity, message, and resolved status" style={{ borderRadius: '5px' }} />
*Alert history. The unresolved count is your live to-do list.*
</Step>
</Steps>
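
The threshold step above can be sketched in code. This is an illustration under stated assumptions, not Observe's implementation: the `MonitorThreshold` type, the `evaluate` and `breaches` functions, and the reading of `Percentage change` as a percent deviation from a baseline mean are all hypothetical. Only the field names (`threshold_operator`, `threshold_type`, `critical_threshold_value`, `warning_threshold_value`) come from this page.

```typescript
// Hypothetical types: field names mirror this page, the shapes are assumed.
type ThresholdOperator = "Greater than" | "Less than";
type ThresholdType = "Static" | "Percentage change";
type Severity = "critical" | "warning" | null;

interface MonitorThreshold {
  threshold_operator: ThresholdOperator;
  threshold_type: ThresholdType;
  critical_threshold_value: number;
  warning_threshold_value?: number; // optional, as on this page
}

// True when `value` is on the breaching side of `limit`.
function breaches(value: number, limit: number, op: ThresholdOperator): boolean {
  return op === "Greater than" ? value > limit : value < limit;
}

// Assumption: a "Percentage change" monitor compares the percent deviation
// of the current value from a baseline mean computed elsewhere.
function evaluate(
  monitor: MonitorThreshold,
  current: number,
  baselineMean?: number,
): Severity {
  let observed = current;
  if (monitor.threshold_type === "Percentage change") {
    // Without a usable baseline there is nothing to compare against.
    if (baselineMean === undefined || baselineMean === 0) return null;
    observed = ((current - baselineMean) / baselineMean) * 100;
  }
  // Check critical first so it takes precedence over warning.
  if (breaches(observed, monitor.critical_threshold_value, monitor.threshold_operator)) {
    return "critical";
  }
  if (
    monitor.warning_threshold_value !== undefined &&
    breaches(observed, monitor.warning_threshold_value, monitor.threshold_operator)
  ) {
    return "warning";
  }
  return null;
}
```

In this sketch, a `Static` monitor with a critical value of 10 and a warning value of 5 yields `warning` for a metric value of 7, while a `Percentage change` monitor returns `null` until a baseline exists.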

<Note>
Monitors are only available for projects with **trace_type** `observe`. Optional **filters** (same structure as eval-task filters) can narrow which spans are included when computing the metric.
</Note>
---

## Inputs and parameters

| Parameter | Detail |
|---|---|
| `metric type` | System metric or an attached evaluation metric. |
| `threshold_operator` | `Greater than` / `Less than`. |
| `threshold_type` | `Static` or `Percentage change`. |
| `critical_threshold_value` / `warning_threshold_value` | The trigger values (warning optional). |
| `auto_threshold_time_window` | Baseline window for `Percentage change` monitors, in minutes; default one week (10,080 minutes). |
| `alert_frequency` | Evaluation cadence, min 5 / default 60 minutes. |
| `notification_emails` | Up to 5 recipients. |
| `slack_webhook_url`, `slack_notes` | Slack channel + optional message. |
| `is_mute` | Pause notifications without deleting. |
| `filters` | Optional; same structure as eval-task filters, to narrow the spans. |
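
The limits in the table can be checked before a monitor is saved. The sketch below is a hypothetical client-side helper, not part of any Observe SDK: the `MonitorConfig` shape and the `withDefaults` and `validateMonitor` functions are assumptions, while the field names, defaults, and limits (minimum frequency 5, default 60, up to 5 emails, one-week baseline window) are taken from this page.

```typescript
// Hypothetical config shape; only the field names come from this page.
interface MonitorConfig {
  alert_frequency?: number;            // minutes
  auto_threshold_time_window?: number; // minutes
  notification_emails?: string[];
  slack_webhook_url?: string;
  slack_notes?: string;
  is_mute?: boolean;
}

const MIN_ALERT_FREQUENCY = 5;        // minutes, per this page
const DEFAULT_ALERT_FREQUENCY = 60;   // minutes, per this page
const MAX_NOTIFICATION_EMAILS = 5;    // per this page
const ONE_WEEK_MINUTES = 7 * 24 * 60; // default baseline window

// Fill in the documented defaults for fields the caller left out.
function withDefaults(config: MonitorConfig) {
  return {
    ...config,
    alert_frequency: config.alert_frequency ?? DEFAULT_ALERT_FREQUENCY,
    auto_threshold_time_window: config.auto_threshold_time_window ?? ONE_WEEK_MINUTES,
    notification_emails: config.notification_emails ?? [],
    is_mute: config.is_mute ?? false,
  };
}

// Return a list of human-readable problems; an empty list means the
// config respects the limits documented on this page.
function validateMonitor(config: MonitorConfig): string[] {
  const resolved = withDefaults(config);
  const errors: string[] = [];
  if (resolved.alert_frequency < MIN_ALERT_FREQUENCY) {
    errors.push(`alert_frequency must be at least ${MIN_ALERT_FREQUENCY} minutes`);
  }
  if (resolved.notification_emails.length > MAX_NOTIFICATION_EMAILS) {
    errors.push(`notification_emails accepts at most ${MAX_NOTIFICATION_EMAILS} addresses`);
  }
  return errors;
}
```

Returning a list of problems instead of throwing on the first one lets a form surface every violated limit at once.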

---

## Edge cases and limits

- Monitors are available **only for `observe` projects**.
- A `Percentage change` monitor needs enough history in its baseline window to compute against — a brand-new project may not alert until data accumulates.
- Muting (`is_mute`) stops notifications but the monitor keeps evaluating and logging.

---

## Next Steps
## Related features

<CardGroup cols={2}>
<Card title="Set Up Observability" icon="play" href="/docs/observe/features/quickstart">
Connect the SDK and start capturing traces.
<Card title="Dashboards" icon="chart-simple" href="/docs/observe/features/dashboard">
Chart the same metrics you alert on.
</Card>
<Card title="Run Evals on Traces" icon="chart-line" href="/docs/observe/features/evals">
Run evaluations on your traced spans to score quality.
<Card title="Run evals on traces" icon="chart-line" href="/docs/observe/features/evals">
Produce the eval scores a monitor can watch.
</Card>
<Card title="Group Traces by Session" icon="table-rows" href="/docs/observe/features/session">
Group traces into sessions for multi-turn analysis.
<Card title="Trace explorer" icon="magnifying-glass" href="/docs/observe/features/llm-tracing">
Investigate the requests behind a breached threshold.
</Card>
<Card title="Users" icon="tags" href="/docs/observe/features/users">
View activity and metrics per end user.
<Card title="Users" icon="user" href="/docs/observe/features/users">
Check whether a spike is concentrated in specific users.
</Card>
</CardGroup>