feat: wire LLM cost tracking end-to-end — backend, API, WebSocket, and CostDashboard UI (#320)
Open
nuthalapativarun wants to merge 7 commits into microsoft:main
- Add `CostResult` NamedTuple to `base.py`; `get_cost_estimator` now returns `CostResult(cost, prompt_tokens, completion_tokens)` instead of a float
- Update `openai.py` (`_chat_completion`, `_responses_completion`, `_chat_completion_operator`, `OperatorServicePreview`) to return `CostResult`
- Update `claude.py` to accumulate tokens across n completions and return `CostResult`
- Update `gemini.py` to return `CostResult`
- In `get_completions()`, wrap `chat_completion()` with wall-time measurement and emit `LLMCallEvent` via the Galaxy event bus (best-effort, non-blocking)
- Return the cost float unchanged to callers for backward compatibility
- Add `LLMCallEvent`/`CostThresholdExceededEvent` dataclasses and new `EventType` values to `galaxy/core/events.py` (Issue 1 changes included for this branch)
- Add `llm_metrics` dict to `SessionMetricsObserver.__init__()` tracking `total_cost`, token counts, per-agent/model breakdowns, and a capped call log (last 500 entries)
- Handle `LLMCallEvent` in `on_event()` via `_handle_llm_call_event()`
- Add `cost_alert_threshold` param: emits `CostThresholdExceededEvent` once when `total_cost` exceeds the threshold (one-shot, no spam)
Expose LLM cost and token metrics via two HTTP endpoints:
- `GET /api/metrics/cost` → `SessionCostSummary` (per-agent, per-model breakdown)
- `GET /api/metrics/cost/export?format=json|csv` → full call log download

`MetricsService` reads directly from `SessionMetricsObserver._metrics_observer`, keeping the service layer thin. Both endpoints require `X-API-Key` auth.
…ocket

Add `LLMCallEvent` and `CostThresholdExceededEvent` handling to `EventSerializer` so the `WebSocketObserver` forwards them to connected clients with the frontend-specific `message_type` fields:
- `LLMCallEvent` → `message_type: "llm_metrics_update"`
- `CostThresholdExceededEvent` → `message_type: "cost_alert"`
Add `LLMMetrics` state to the Galaxy store with two actions:
- `setLLMMetrics`: replace the full metrics snapshot
- `appendLLMCall`: incrementally update totals and per-agent/model costs

Handle two new WebSocket message types in `main.tsx`:
- `"llm_metrics_update"` → `appendLLMCall` (real-time accumulation)
- `"cost_alert"` → `pushNotification` with warning severity
New components under `src/components/metrics/`:
- `CostByModelChart`: pure-Tailwind horizontal bar chart (no extra deps)
- `CostDashboard`: summary row (cost/tokens/calls), cost-by-model chart, cost-by-agent chart, collapsible recent-calls table, cost alert banner

Mount in `RightPanel` via a two-tab bar (Constellation | Cost). Extends the `rightPanelTab` type with a `'cost'` variant to drive tab routing.
- Fix `NameError` in `MetricsService.get_cost_summary()` where the `observer` variable was referenced before assignment; refetch from the session directly
- Add missing `threshold=` arg to the `CostThresholdExceededEvent` constructor in `SessionMetricsObserver` (would raise `TypeError` at runtime)
- Fix `publish()` → `publish_event()` call on the event bus in `base_observer`
- Add LLM event fields to the `GalaxyEvent` interface; remove `(event as any)` casts in the `handleLLMMetricsUpdate` and `handleCostAlert` handlers
- Log failed `LLMCallEvent` emissions at DEBUG instead of silently swallowing them
- Use a stable composite key in `RecentCallsTable` rows instead of the array index
- Fix JSX formatting in `RightPanel`: `</div>}` → `</div>\n)}` style
- Update the `CostByModelChart` docstring to reflect generic key usage
## Feature: LLM Cost & Token Usage Tracking
This is PR 2 of 2. It depends on #319, which adds `LLMCallEvent` and `CostThresholdExceededEvent` to the event bus. Merge #319 first.

UFO already computes cost per LLM call in `ufo/llm/base.py` and returns it from every provider — but the value is discarded by callers. This PR wires that existing data all the way from the LLM call layer through the event bus, into a metrics observer, out through a FastAPI REST API and WebSocket, and into a live-updating React dashboard in the Galaxy Web UI.

### What this PR does
### Backend — LLM call layer (`ufo/llm/`)

- `base.py`: `get_cost_estimator()` now returns `CostResult(cost, prompt_tokens, completion_tokens)` (a `NamedTuple`) instead of a bare `float`. Backward-compatible — callers that only used `.cost` still work.
- `openai.py`, `claude.py`, `gemini.py`: updated to unpack `CostResult` from `get_cost_estimator()`.
- `llm_call.py`: after each provider call, measures wall-clock duration and emits `LLMCallEvent` on the Galaxy event bus. Falls back gracefully (logs at DEBUG) if Galaxy is not running.
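The shape of this change can be sketched as below. `CostResult` and the best-effort event emission match the PR description; the wrapper function, its parameters, and the event payload keys are illustrative assumptions, not the actual `llm_call.py` code.

```python
import time
from typing import NamedTuple


class CostResult(NamedTuple):
    """Cost plus token counts, as now returned by get_cost_estimator()."""
    cost: float
    prompt_tokens: int
    completion_tokens: int


def call_with_metrics(chat_completion, emit_event, logger):
    """Wrap a provider call: measure wall time, emit a best-effort event.

    Hypothetical wrapper illustrating the described flow: chat_completion is a
    zero-arg callable returning (response, CostResult); emit_event publishes to
    the Galaxy event bus.
    """
    start = time.monotonic()
    response, result = chat_completion()
    duration = time.monotonic() - start
    try:
        # Best-effort, non-blocking: a missing Galaxy bus must never break the call.
        emit_event({
            "cost": result.cost,
            "prompt_tokens": result.prompt_tokens,
            "completion_tokens": result.completion_tokens,
            "duration_s": duration,
        })
    except Exception as exc:
        logger.debug("LLMCallEvent emission failed: %s", exc)
    # Callers still receive the bare cost float, preserving backward compatibility.
    return response, result.cost
```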
### Backend — Observer (`galaxy/session/observers/base_observer.py`)

- `SessionMetricsObserver` now subscribes to `LLM_CALL_COMPLETED` events and accumulates total cost, token counts, per-agent/model breakdowns, and a capped call log.
- `cost_alert_threshold` (default `0.0` = disabled): emits `CostThresholdExceededEvent` once when the session total exceeds the threshold.
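The one-shot alert logic can be sketched as follows. The class name and emit callback are illustrative stand-ins, not the actual `SessionMetricsObserver` API; only the threshold semantics (0.0 disables, fire exactly once) come from the PR description.

```python
class CostThresholdGuard:
    """One-shot cost alert: fires once when the running total crosses the threshold."""

    def __init__(self, threshold: float, emit):
        self.threshold = threshold  # 0.0 means the alert is disabled
        self.total_cost = 0.0
        self._fired = False
        self._emit = emit  # callback taking (total_cost, threshold)

    def add_call(self, cost: float) -> None:
        self.total_cost += cost
        if (self.threshold > 0.0
                and not self._fired
                and self.total_cost > self.threshold):
            # One-shot: later calls keep accumulating but never re-alert.
            self._fired = True
            self._emit(self.total_cost, self.threshold)
```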
galaxy/webui/)models/responses.py: AddsLLMCallRecordandSessionCostSummaryPydantic response models.services/metrics_service.py: Thin service that reads fromSessionMetricsObserver.metrics["llm_metrics"]and serialises to JSON or CSV.routers/metrics.py: New FastAPI router:GET /api/metrics/cost— returnsSessionCostSummaryfor the active sessionGET /api/metrics/cost/export?format=json|csv— downloads the full call logserver.pyalongside existing routers.Backend — WebSocket (
### Backend — WebSocket (`galaxy/webui/websocket_observer.py`)

- `EventSerializer` extended to handle `LLMCallEvent` (broadcast as `message_type: "llm_metrics_update"`) and `CostThresholdExceededEvent` (broadcast as `message_type: "cost_alert"`).
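The event-to-message mapping can be sketched as below. The two `message_type` strings come from the PR description; the dataclass fields and the `serialize()` helper are illustrative stand-ins for the real `EventSerializer`.

```python
from dataclasses import asdict, dataclass


@dataclass
class LLMCallEvent:
    agent: str
    model: str
    cost: float


@dataclass
class CostThresholdExceededEvent:
    total_cost: float
    threshold: float


# Frontend-specific message_type per event class, as described in the PR.
MESSAGE_TYPES = {
    LLMCallEvent: "llm_metrics_update",
    CostThresholdExceededEvent: "cost_alert",
}


def serialize(event) -> dict:
    """Attach the frontend-specific message_type before broadcasting."""
    return {"message_type": MESSAGE_TYPES[type(event)], **asdict(event)}
```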
src/store/galaxyStore.ts)LLMMetricsinterface andLLMCallRecordtype.llmMetricsslice withsetLLMMetricsandappendLLMCallactions.appendLLMCallincrementally updates all aggregates client-side (no full-fetch needed on each event).rightPanelTabunion extended with'cost'tab.Frontend — WebSocket handlers (
### Frontend — WebSocket handlers (`src/main.tsx`)

- `handleLLMMetricsUpdate`: maps `llm_metrics_update` WebSocket messages to `appendLLMCall`.
- `handleCostAlert`: maps `cost_alert` messages to a `pushNotification` with `severity: "warning"`.
- `GalaxyEvent` interface extended with LLM-specific fields (removes all `as any` casts).
### Frontend — UI (`src/components/metrics/`)

- `CostByModelChart.tsx`: Tailwind-only horizontal bar chart, sorted by cost descending. Reused for both model and agent breakdowns.
- `CostDashboard.tsx`: live-updating panel showing a summary row (cost/tokens/calls), cost-by-model and cost-by-agent charts, a collapsible recent-calls table, and `cost_alert` notifications.
- `RightPanel.tsx`: adds a tab bar ("Constellation" / "Cost") to switch between the existing constellation view and the new cost dashboard.

### Architecture
### Testing
- Backward compatibility: the `cost` float is still returned from `get_completions()`.
- `GET /api/metrics/cost` returns `404` when no session is active; `200` with live data during a session.