feat: refine token usage display modes #2329
Open
Layau-code wants to merge 4 commits into bytedance:main
Conversation
Contributor
Pull request overview
Refines token usage UX in the workspace by introducing explicit display modes, aggregating per assistant turn by default, and enabling step-level “debug” attribution backed by structured metadata from the backend.
Changes:
- Add token usage view presets (Off/Summary/Per turn/Debug) and persist preferences in local settings.
- Aggregate inline token usage once per assistant turn, with optional step-level debug rendering and labels.
- Annotate AI steps on the backend with `token_usage_attribution` and preserve `additional_kwargs` through client serialization/streaming.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| frontend/tests/unit/core/messages/utils.test.ts | Adds coverage for per-turn aggregation helper. |
| frontend/tests/unit/core/messages/usage-model.test.ts | Adds coverage for presets/preferences mapping and debug step labeling/fallback behavior. |
| frontend/src/core/settings/local.ts | Introduces persisted tokenUsage local setting defaults and merge behavior. |
| frontend/src/core/messages/utils.ts | Exposes getMessageGroups and adds getAssistantTurnUsageMessages for per-turn usage aggregation. |
| frontend/src/core/messages/usage-model.ts | New model for token usage presets/preferences and step-level debug labeling (incl. backend attribution parsing). |
| frontend/src/core/i18n/locales/zh-CN.ts | Adds new token usage strings (presets, descriptions, debug labels). |
| frontend/src/core/i18n/locales/types.ts | Extends Translations types for new token usage UI strings. |
| frontend/src/core/i18n/locales/en-US.ts | Adds new token usage strings (presets, descriptions, debug labels). |
| frontend/src/components/workspace/token-usage-indicator.tsx | Replaces tooltip indicator with dropdown selector + totals display and preference updates. |
| frontend/src/components/workspace/messages/message-token-usage.tsx | Switches to per-turn summary rendering and adds debug list renderer. |
| frontend/src/components/workspace/messages/message-list.tsx | Wires inline token usage modes, per-turn aggregation, debug step rendering, and assistant-turn copy behavior. |
| frontend/src/components/workspace/messages/message-list-item.tsx | Removes per-message token usage rendering and adjusts copy toolbar behavior/positioning. |
| frontend/src/components/workspace/messages/message-group.tsx | Integrates step-level token debug summaries into chain-of-thought/tool call rendering. |
| frontend/src/app/workspace/chats/[thread_id]/page.tsx | Loads/saves local token usage preferences and passes inline mode to message list. |
| frontend/src/app/workspace/agents/[agent_name]/chats/[thread_id]/page.tsx | Same as above for agent-specific chat route. |
| backend/tests/test_token_usage_middleware.py | Adds tests for structured attribution metadata emitted by middleware. |
| backend/tests/test_client_message_serialization.py | Ensures additional_kwargs are preserved during message serialization. |
| backend/tests/test_client.py | Tests streaming behavior for propagating additional_kwargs updates to clients. |
| backend/packages/harness/deerflow/client.py | Preserves/streams additional_kwargs (incl. attribution) for AI/tool/human/system messages. |
| backend/packages/harness/deerflow/agents/middlewares/token_usage_middleware.py | Adds step attribution annotation logic and attaches it to AI messages via additional_kwargs. |
Comments suppressed due to low confidence (1)
frontend/src/components/workspace/messages/message-list.tsx:298
- In the subagent rendering loop, the `subtask-count` element is pushed for every AI message but always uses the same React key (`"subtask-count"`). If `group.messages` contains more than one AI message, this produces duplicate keys and can cause unstable rendering. Consider moving the count element outside the loop, or including `message.id`/index in the key so each entry is unique.
```tsx
  <div
    key="subtask-count"
    className="text-muted-foreground pt-2 text-sm font-normal"
  >
    {t.subtasks.executing(tasks.size)}
  </div>,
);
```
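The suggested fix can be sketched with a small helper, assuming each AI message exposes an `id` (the helper name is hypothetical, not code from this PR):

```typescript
// Hypothetical helper: derive a unique React key for the subtask-count
// element from the AI message's id, so keys stay unique inside the loop.
function subtaskCountKey(messageId: string): string {
  return `subtask-count-${messageId}`;
}

// In the loop, each AI message then yields a distinct key:
//   key={subtaskCountKey(message.id)}
console.log(subtaskCountKey("msg-1")); // "subtask-count-msg-1"
```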
Fixes #2313
Summary
This PR refines how token usage is displayed in the workspace.
The previous implementation exposed token usage at too fine a granularity in multi-step responses. In tool-call, subagent, and planning-heavy turns, a single assistant reply could render multiple token usage entries, which made the UI noisy and hard to understand.
This follow-up makes the display granularity explicit and introduces selectable token usage modes, with a cleaner default experience.
Changes
Token usage display modes
Add selectable token usage display modes in the workspace header:
- Off: hide token usage
- Summary: show only the top-level token total
- Per turn: show one aggregated token usage entry for each assistant reply
- Debug: show step-level token attribution for inspection/debugging

The default inline experience is now Per turn, which better matches the original goal: one user request + one assistant response should feel like one token usage unit.
Per-turn aggregation
Instead of rendering token usage for each internal step in a grouped assistant response, the UI now aggregates usage across the assistant turn and renders a single inline token usage item.
This keeps normal conversations readable while preserving cost visibility.
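The aggregation idea can be sketched as follows. The types are assumptions for illustration; the real message shapes live in `frontend/src/core/messages`, and this is not the actual implementation of `getAssistantTurnUsageMessages`:

```typescript
// Hypothetical message/usage shapes (assumed, not the PR's real types).
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface StepMessage {
  role: "ai" | "tool" | "human" | "system";
  usage?: TokenUsage;
}

// Sum usage across every step of one assistant turn into a single entry,
// so a tool-call-heavy reply renders one inline usage item instead of many.
function aggregateTurnUsage(steps: StepMessage[]): TokenUsage {
  return steps.reduce<TokenUsage>(
    (total, step) => ({
      inputTokens: total.inputTokens + (step.usage?.inputTokens ?? 0),
      outputTokens: total.outputTokens + (step.usage?.outputTokens ?? 0),
    }),
    { inputTokens: 0, outputTokens: 0 },
  );
}
```

Steps without usage metadata (e.g. tool results) simply contribute zero, so the turn total stays well-defined even for mixed message sequences.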
Debug mode for step-level attribution
Step-level token usage is still available, but it is now treated as a debug-oriented mode rather than the default experience.
In debug mode, token usage is attached to specific step labels where possible, instead of rendering as a detached list of raw token lines.
For example, when a single AI step covers multiple actions, the UI explicitly treats it as a shared step total instead of pretending to provide exact per-tool token splits.
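That labeling rule can be expressed as a small function. This is a sketch; the function name and label wording are assumptions, not the PR's actual i18n strings:

```typescript
// Hypothetical debug labeling: a single AI step that triggered several
// tool calls gets one shared-total label rather than per-tool splits.
function debugStepLabel(toolNames: string[]): string {
  if (toolNames.length === 0) return "model step";
  if (toolNames.length === 1) return toolNames[0];
  return `${toolNames.length} tool calls (shared step total)`;
}
```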
Backend attribution metadata
The backend now annotates AI steps with structured `token_usage_attribution` metadata. This gives the frontend a more reliable attribution source for debug mode, especially for:
- `write_todos`

The frontend still keeps a safe fallback path when attribution is missing or malformed.
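The validate-or-fall-back pattern can be sketched like this. The entry field names are assumptions; the point is strict validation with a `null` result signaling the safe fallback path:

```typescript
// Hypothetical shape of one attribution entry (field names assumed).
interface StepAttribution {
  label: string;
  inputTokens: number;
  outputTokens: number;
}

// Parse token_usage_attribution out of a message's additional_kwargs.
// Any missing or malformed data returns null, which tells the caller to
// use the fallback rendering instead of trusting partial attribution.
function parseAttribution(
  kwargs: Record<string, unknown>,
): StepAttribution[] | null {
  const raw = kwargs["token_usage_attribution"];
  if (!Array.isArray(raw)) return null;
  const result: StepAttribution[] = [];
  for (const entry of raw) {
    if (typeof entry !== "object" || entry === null) return null;
    const { label, inputTokens, outputTokens } = entry as Record<string, unknown>;
    if (
      typeof label !== "string" ||
      typeof inputTokens !== "number" ||
      typeof outputTokens !== "number"
    ) {
      return null;
    }
    result.push({ label, inputTokens, outputTokens });
  }
  return result;
}
```

Rejecting the whole payload on the first malformed entry keeps debug mode honest: it never shows a mix of validated and guessed numbers.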
Streaming/client consistency
Structured attribution metadata is now preserved in client serialization and streaming-related paths, so step-level token views do not depend only on history snapshots.
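Conceptually, the streaming side amounts to a shallow merge of each chunk's `additional_kwargs` into the client-side message (a simplified sketch, not the client's actual code):

```typescript
// Sketch: carry additional_kwargs (including attribution) forward as
// chunks stream in, so step-level token views work mid-stream rather
// than only after a history snapshot arrives.
function mergeAdditionalKwargs(
  existing: Record<string, unknown>,
  incoming?: Record<string, unknown>,
): Record<string, unknown> {
  return incoming ? { ...existing, ...incoming } : existing;
}
```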
Test Results
Backend
Commands:
Result:
Frontend
Commands:
pnpm format:write
pnpm check
pnpm test
Result:
Misc
Commands:
Result:
Checklist
- `make format`
- `make lint`
- `PYTHONPATH=/app/backend uv run pytest tests/test_client.py tests/test_client_message_serialization.py tests/test_token_usage_middleware.py -v`
- `pnpm format:write`
- `pnpm check`
- `pnpm test`
- `git diff --check` passes

Notes
Debug mode attribution remains at the AI-step level and does not attempt exact tool-level token accounting.