feat: wire LLM cost tracking end-to-end — backend, API, WebSocket, and CostDashboard UI (#320)
Open
nuthalapativarun wants to merge 7 commits into microsoft:main
- Add `CostResult` NamedTuple to `base.py`; `get_cost_estimator` now returns `CostResult(cost, prompt_tokens, completion_tokens)` instead of a float
- Update `openai.py` (`_chat_completion`, `_responses_completion`, `_chat_completion_operator`, `OperatorServicePreview`) to return `CostResult`
- Update `claude.py` to accumulate tokens across n completions and return `CostResult`
- Update `gemini.py` to return `CostResult`
- In `get_completions()`, wrap `chat_completion()` with wall-time measurement and emit `LLMCallEvent` via the Galaxy event bus (best-effort, non-blocking)
- Return the cost float unchanged to callers for backward compatibility
- Add `LLMCallEvent`/`CostThresholdExceededEvent` dataclasses and new `EventType` values to `galaxy/core/events.py` (Issue 1 changes included for this branch)
- Add `llm_metrics` dict to `SessionMetricsObserver.__init__()` tracking `total_cost`, token counts, per-agent/model breakdowns, and a capped call log (last 500 entries)
- Handle `LLMCallEvent` in `on_event()` via `_handle_llm_call_event()`
- Add `cost_alert_threshold` param: emits `CostThresholdExceededEvent` once when `total_cost` exceeds the threshold (one-shot, no spam)
Expose LLM cost and token metrics via two HTTP endpoints:
- `GET /api/metrics/cost` → `SessionCostSummary` (per-agent, per-model breakdown)
- `GET /api/metrics/cost/export?format=json|csv` → full call log download

`MetricsService` reads directly from `SessionMetricsObserver._metrics_observer`, keeping the service layer thin. Both endpoints require `X-API-Key` auth.
…ocket

Add `LLMCallEvent` and `CostThresholdExceededEvent` handling to `EventSerializer` so the `WebSocketObserver` forwards them to connected clients with the frontend-specific `message_type` fields:
- `LLMCallEvent` → `message_type: "llm_metrics_update"`
- `CostThresholdExceededEvent` → `message_type: "cost_alert"`
Add `LLMMetrics` state to the Galaxy store with two actions:
- `setLLMMetrics`: replace the full metrics snapshot
- `appendLLMCall`: incrementally update totals and per-agent/model costs

Handle two new WebSocket message types in `main.tsx`:
- `"llm_metrics_update"` → `appendLLMCall` (real-time accumulation)
- `"cost_alert"` → `pushNotification` with warning severity
New components under `src/components/metrics/`:
- `CostByModelChart`: pure-Tailwind horizontal bar chart (no extra deps)
- `CostDashboard`: summary row (cost/tokens/calls), cost-by-model chart, cost-by-agent chart, collapsible recent-calls table, cost alert banner

Mount in `RightPanel` via a two-tab bar (Constellation | Cost). Extends the `rightPanelTab` type with a `'cost'` variant to drive tab routing.
- Fix `NameError` in `MetricsService.get_cost_summary()` where the `observer` variable was referenced before assignment; refetch from the session directly
- Add missing `threshold=` arg to the `CostThresholdExceededEvent` constructor in `SessionMetricsObserver` (would raise `TypeError` at runtime)
- Fix `publish()` → `publish_event()` call on the event bus in `base_observer`
- Add LLM event fields to the `GalaxyEvent` interface; remove `(event as any)` casts in the `handleLLMMetricsUpdate` and `handleCostAlert` handlers
- Log failed `LLMCallEvent` emissions at DEBUG instead of silently swallowing them
- Use a stable composite key in `RecentCallsTable` rows instead of the array index
- Fix JSX formatting in `RightPanel`: `</div>}` → `</div>\n)}` style
- Update the `CostByModelChart` docstring to reflect generic key usage
## Feature: LLM Cost & Token Usage Tracking
This is PR 2 of 2. It depends on #319, which adds `LLMCallEvent` and `CostThresholdExceededEvent` to the event bus. Merge #319 first.

UFO already computes cost per LLM call in `ufo/llm/base.py` and returns it from every provider — but the value is discarded by callers. This PR wires that existing data all the way from the LLM call layer through the event bus, into a metrics observer, out through a FastAPI REST API and WebSocket, and into a live-updating React dashboard in the Galaxy Web UI.

### What this PR does
### Backend — LLM call layer (`ufo/llm/`)

- `base.py`: `get_cost_estimator()` now returns `CostResult(cost, prompt_tokens, completion_tokens)` (a `NamedTuple`) instead of a bare `float`. Backward-compatible — callers that only used `.cost` still work.
- `openai.py`, `claude.py`, `gemini.py`: updated to unpack `CostResult` from `get_cost_estimator()`.
- `llm_call.py`: after each provider call, measures wall-clock duration and emits `LLMCallEvent` on the Galaxy event bus. Falls back gracefully (logs at DEBUG) if Galaxy is not running.
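The shape of this change can be sketched as below. `CostResult` and the best-effort event emission match the PR description; the wrapper function, its parameters, and the event payload keys are illustrative assumptions, not the actual `llm_call.py` code.

```python
import time
from typing import NamedTuple


class CostResult(NamedTuple):
    """Cost plus token counts, as now returned by get_cost_estimator()."""
    cost: float
    prompt_tokens: int
    completion_tokens: int


def call_with_metrics(chat_completion, emit_event, logger):
    """Wrap a provider call: measure wall time, emit a best-effort event.

    Hypothetical wrapper illustrating the described flow: chat_completion is a
    zero-arg callable returning (response, CostResult); emit_event publishes to
    the Galaxy event bus.
    """
    start = time.monotonic()
    response, result = chat_completion()
    duration = time.monotonic() - start
    try:
        # Best-effort, non-blocking: a missing Galaxy bus must never break the call.
        emit_event({
            "cost": result.cost,
            "prompt_tokens": result.prompt_tokens,
            "completion_tokens": result.completion_tokens,
            "duration_s": duration,
        })
    except Exception as exc:
        logger.debug("LLMCallEvent emission failed: %s", exc)
    # Callers still receive the bare cost float, preserving backward compatibility.
    return response, result.cost
```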
### Backend — Observer (`galaxy/session/observers/base_observer.py`)

- `SessionMetricsObserver` now subscribes to `LLM_CALL_COMPLETED` events and accumulates total cost, token counts, per-agent/model breakdowns, and a capped call log.
- `cost_alert_threshold` (default `0.0` = disabled): emits `CostThresholdExceededEvent` once when the session total exceeds the threshold.
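The one-shot alert logic can be sketched as follows. The class name and emit callback are illustrative stand-ins, not the actual `SessionMetricsObserver` API; only the threshold semantics (0.0 disables, fire exactly once) come from the PR description.

```python
class CostThresholdGuard:
    """One-shot cost alert: fires once when the running total crosses the threshold."""

    def __init__(self, threshold: float, emit):
        self.threshold = threshold  # 0.0 means the alert is disabled
        self.total_cost = 0.0
        self._fired = False
        self._emit = emit  # callback taking (total_cost, threshold)

    def add_call(self, cost: float) -> None:
        self.total_cost += cost
        if (self.threshold > 0.0
                and not self._fired
                and self.total_cost > self.threshold):
            # One-shot: later calls keep accumulating but never re-alert.
            self._fired = True
            self._emit(self.total_cost, self.threshold)
```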
galaxy/webui/)models/responses.py: AddsLLMCallRecordandSessionCostSummaryPydantic response models.services/metrics_service.py: Thin service that reads fromSessionMetricsObserver.metrics["llm_metrics"]and serialises to JSON or CSV.routers/metrics.py: New FastAPI router:GET /api/metrics/cost— returnsSessionCostSummaryfor the active sessionGET /api/metrics/cost/export?format=json|csv— downloads the full call logserver.pyalongside existing routers.Backend — WebSocket (
### Backend — WebSocket (`galaxy/webui/websocket_observer.py`)

- `EventSerializer` extended to handle `LLMCallEvent` (broadcast as `message_type: "llm_metrics_update"`) and `CostThresholdExceededEvent` (broadcast as `message_type: "cost_alert"`).
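The event-to-message mapping can be sketched as below. The two `message_type` strings come from the PR description; the dataclass fields and the `serialize()` helper are illustrative stand-ins for the real `EventSerializer`.

```python
from dataclasses import asdict, dataclass


@dataclass
class LLMCallEvent:
    agent: str
    model: str
    cost: float


@dataclass
class CostThresholdExceededEvent:
    total_cost: float
    threshold: float


# Frontend-specific message_type per event class, as described in the PR.
MESSAGE_TYPES = {
    LLMCallEvent: "llm_metrics_update",
    CostThresholdExceededEvent: "cost_alert",
}


def serialize(event) -> dict:
    """Attach the frontend-specific message_type before broadcasting."""
    return {"message_type": MESSAGE_TYPES[type(event)], **asdict(event)}
```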
src/store/galaxyStore.ts)LLMMetricsinterface andLLMCallRecordtype.llmMetricsslice withsetLLMMetricsandappendLLMCallactions.appendLLMCallincrementally updates all aggregates client-side (no full-fetch needed on each event).rightPanelTabunion extended with'cost'tab.Frontend — WebSocket handlers (
### Frontend — WebSocket handlers (`src/main.tsx`)

- `handleLLMMetricsUpdate`: maps `llm_metrics_update` WebSocket messages to `appendLLMCall`.
- `handleCostAlert`: maps `cost_alert` messages to a `pushNotification` with `severity: "warning"`.
- `GalaxyEvent` interface extended with LLM-specific fields (removes all `as any` casts).
### Frontend — UI (`src/components/metrics/`)

- `CostByModelChart.tsx`: Tailwind-only horizontal bar chart, sorted by cost descending. Reused for both model and agent breakdowns.
- `CostDashboard.tsx`: live-updating panel showing a summary row (cost/tokens/calls), cost-by-model and cost-by-agent charts, a collapsible recent-calls table, and `cost_alert` notifications.
- `RightPanel.tsx`: adds a tab bar ("Constellation" / "Cost") to switch between the existing constellation view and the new cost dashboard.

### Architecture
### Testing
- Backward compatibility: the `cost` float is still returned from `get_completions()`.
- `GET /api/metrics/cost` returns `404` when no session is active; `200` with live data during a session.