
feat: refine token usage display modes#2329

Open
Layau-code wants to merge 4 commits into bytedance:main from Layau-code:feature/refine-token-usage-display

Conversation


@Layau-code Layau-code commented Apr 17, 2026

Fixes #2313

Summary

This PR refines how token usage is displayed in the workspace.

The previous implementation exposed token usage at too fine a granularity in multi-step responses. In tool-call, subagent, and planning-heavy turns, a single assistant reply could render multiple token usage entries, which made the UI noisy and hard to understand.

This follow-up makes the display granularity explicit and introduces selectable token usage modes, with a cleaner default experience.

Changes

Token usage display modes

Add selectable token usage display modes in the workspace header:

  • Off: hide token usage
  • Summary: show only the top-level token total
  • Per turn: show one aggregated token usage entry for each assistant reply
  • Debug: show step-level token attribution for inspection/debugging

The default inline experience is now Per turn, which better matches the original goal: one user request + one assistant response should feel like one token usage unit.
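As a rough sketch, the mode selection could be modeled as a small union type with a per-turn default. The names below are illustrative only, not the PR's actual identifiers:

```typescript
// Hypothetical shape of the token usage display presets described above.
export type TokenUsageMode = "off" | "summary" | "per-turn" | "debug";

export interface TokenUsagePreferences {
  inlineMode: TokenUsageMode;
}

// "Per turn" is the default inline experience: one user request plus one
// assistant response reads as one token usage unit.
export const DEFAULT_TOKEN_USAGE_PREFERENCES: TokenUsagePreferences = {
  inlineMode: "per-turn",
};

// Only per-turn and debug modes render inline usage entries in the thread.
export function shouldRenderInlineUsage(mode: TokenUsageMode): boolean {
  return mode === "per-turn" || mode === "debug";
}
```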

Per-turn aggregation

Instead of rendering token usage for each internal step in a grouped assistant response, the UI now aggregates usage across the assistant turn and renders a single inline token usage item.

This keeps normal conversations readable while preserving cost visibility.
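The aggregation itself can be sketched as a simple fold over the turn's step-level usage records. The field names are assumptions, not the actual getAssistantTurnUsageMessages implementation:

```typescript
// Assumed per-step usage record for illustration.
export interface StepUsage {
  inputTokens: number;
  outputTokens: number;
}

// Collapse all step-level usage in one assistant turn into a single entry,
// so the UI renders one inline item instead of one item per internal step.
export function aggregateTurnUsage(steps: StepUsage[]): StepUsage {
  return steps.reduce(
    (total, step) => ({
      inputTokens: total.inputTokens + step.inputTokens,
      outputTokens: total.outputTokens + step.outputTokens,
    }),
    { inputTokens: 0, outputTokens: 0 },
  );
}
```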

Debug mode for step-level attribution

Step-level token usage is still available, but it is now treated as a debug-oriented mode rather than the default experience.

In debug mode, token usage is attached to specific step labels where possible, instead of rendering as a detached list of raw token lines.

Examples include:

  • final answer
  • search-related steps
  • subagent dispatch
  • todo start / complete / update / remove actions

When a single AI step covers multiple actions, the UI explicitly treats it as a shared step total instead of pretending to provide exact per-tool token splits.
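A minimal sketch of that labeling rule, assuming each step exposes the list of actions it covered (the label strings here are hypothetical):

```typescript
// Pick a debug-mode label for one AI step based on how many actions it
// covered. Multi-action steps are explicitly labeled as a shared total
// rather than pretending to offer exact per-tool token splits.
export function debugStepLabel(actions: string[]): string {
  if (actions.length === 0) return "final answer";
  if (actions.length === 1) return actions[0];
  return `shared step total (${actions.length} actions)`;
}
```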

Backend attribution metadata

The backend now annotates AI steps with structured token_usage_attribution metadata.

This gives the frontend a more reliable attribution source for debug mode, especially for:

  • write_todos
  • subagent/task dispatch
  • search/tool batches
  • final-answer-only steps

The frontend still keeps a safe fallback path when attribution is missing or malformed.
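That fallback path can be sketched as a defensive parser that returns null for anything missing or malformed, letting the caller drop back to generic rendering. The field names are assumptions about the token_usage_attribution shape, not the real schema:

```typescript
// Assumed parsed shape of one attribution entry.
export interface Attribution {
  kind: string;
  label: string;
}

// Parse untrusted attribution metadata; any missing or malformed input
// yields null, which the caller treats as "no attribution available".
export function parseAttribution(raw: unknown): Attribution | null {
  if (typeof raw !== "object" || raw === null) return null; // missing/invalid
  const record = raw as Record<string, unknown>;
  if (typeof record.kind !== "string" || typeof record.label !== "string") {
    return null; // malformed: fall back to generic rendering
  }
  return { kind: record.kind, label: record.label };
}
```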

Streaming/client consistency

Structured attribution metadata is now preserved in client serialization and streaming-related paths, so step-level token views do not depend only on history snapshots.
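A sketch of the preservation idea on the client side: when mapping a streamed message to the client shape, additional_kwargs is copied through rather than dropped. The message shape below is an assumption for illustration:

```typescript
// Assumed minimal message shape carrying backend metadata.
interface StreamedMessage {
  id: string;
  content: string;
  additional_kwargs?: Record<string, unknown>;
}

// Convert a streamed message to the client shape, preserving structured
// attribution metadata (e.g. token_usage_attribution) so step-level token
// views do not depend only on history snapshots.
export function toClientMessage(msg: StreamedMessage): StreamedMessage {
  return {
    id: msg.id,
    content: msg.content,
    additional_kwargs: msg.additional_kwargs ?? {},
  };
}
```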

Test Results

Backend

Commands:

make format
make lint
PYTHONPATH=/app/backend uv run pytest tests/test_client.py tests/test_client_message_serialization.py tests/test_token_usage_middleware.py -v

Result:

148 passed

Frontend

Commands:

pnpm format:write
pnpm check
pnpm test

Result:

All frontend checks passed

Misc

Commands:

git diff --check

Result:

Passed

Checklist

  • Backend formatted with make format
  • Backend lint passes with make lint
  • Backend token usage related tests pass with PYTHONPATH=/app/backend uv run pytest tests/test_client.py tests/test_client_message_serialization.py tests/test_token_usage_middleware.py -v
  • Frontend formatted with pnpm format:write
  • Frontend lint + typecheck pass with pnpm check
  • Frontend unit tests pass with pnpm test
  • git diff --check passes
  • No provider credentials or local config changes included
  • Fixes improve: token usage display optimization #2313

Notes

  • Debug mode keeps attribution at the AI-step level and does not attempt exact tool-level token accounting
  • The top-level token total still reflects the currently available thread messages rather than a persisted ledger total
  • Subagent-heavy conversations may still need a future ledger-based accounting model for fully stable totals

Copilot AI left a comment


Pull request overview

Refines token usage UX in the workspace by introducing explicit display modes, aggregating per assistant turn by default, and enabling step-level “debug” attribution backed by structured metadata from the backend.

Changes:

  • Add token usage view presets (Off/Summary/Per turn/Debug) and persist preferences in local settings.
  • Aggregate inline token usage once per assistant turn, with optional step-level debug rendering and labels.
  • Annotate AI steps on the backend with token_usage_attribution and preserve additional_kwargs through client serialization/streaming.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated no comments.

Summary per file:

  • frontend/tests/unit/core/messages/utils.test.ts - Adds coverage for the per-turn aggregation helper.
  • frontend/tests/unit/core/messages/usage-model.test.ts - Adds coverage for presets/preferences mapping and debug step labeling/fallback behavior.
  • frontend/src/core/settings/local.ts - Introduces persisted tokenUsage local setting defaults and merge behavior.
  • frontend/src/core/messages/utils.ts - Exposes getMessageGroups and adds getAssistantTurnUsageMessages for per-turn usage aggregation.
  • frontend/src/core/messages/usage-model.ts - New model for token usage presets/preferences and step-level debug labeling (incl. backend attribution parsing).
  • frontend/src/core/i18n/locales/zh-CN.ts - Adds new token usage strings (presets, descriptions, debug labels).
  • frontend/src/core/i18n/locales/types.ts - Extends Translations types for the new token usage UI strings.
  • frontend/src/core/i18n/locales/en-US.ts - Adds new token usage strings (presets, descriptions, debug labels).
  • frontend/src/components/workspace/token-usage-indicator.tsx - Replaces the tooltip indicator with a dropdown selector plus totals display and preference updates.
  • frontend/src/components/workspace/messages/message-token-usage.tsx - Switches to per-turn summary rendering and adds a debug list renderer.
  • frontend/src/components/workspace/messages/message-list.tsx - Wires inline token usage modes, per-turn aggregation, debug step rendering, and assistant-turn copy behavior.
  • frontend/src/components/workspace/messages/message-list-item.tsx - Removes per-message token usage rendering and adjusts copy toolbar behavior/positioning.
  • frontend/src/components/workspace/messages/message-group.tsx - Integrates step-level token debug summaries into chain-of-thought/tool call rendering.
  • frontend/src/app/workspace/chats/[thread_id]/page.tsx - Loads/saves local token usage preferences and passes the inline mode to the message list.
  • frontend/src/app/workspace/agents/[agent_name]/chats/[thread_id]/page.tsx - Same as above for the agent-specific chat route.
  • backend/tests/test_token_usage_middleware.py - Adds tests for structured attribution metadata emitted by the middleware.
  • backend/tests/test_client_message_serialization.py - Ensures additional_kwargs are preserved during message serialization.
  • backend/tests/test_client.py - Tests streaming behavior for propagating additional_kwargs updates to clients.
  • backend/packages/harness/deerflow/client.py - Preserves/streams additional_kwargs (incl. attribution) for AI/tool/human/system messages.
  • backend/packages/harness/deerflow/agents/middlewares/token_usage_middleware.py - Adds step attribution annotation logic and attaches it to AI messages via additional_kwargs.
Comments suppressed due to low confidence (1)

frontend/src/components/workspace/messages/message-list.tsx:298

  • In the subagent rendering loop, the subtask-count element is pushed for every AI message but always uses the same React key ("subtask-count"). If group.messages contains more than one AI message, this produces duplicate keys and can cause unstable rendering.

Consider moving the count element outside the loop, or include message.id/index in the key so each entry is unique.

                <div
                  key="subtask-count"
                  className="text-muted-foreground pt-2 text-sm font-normal"
                >
                  {t.subtasks.executing(tasks.size)}
                </div>,
              );
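One way to realize the suggested fix is to derive the key from the message id or loop index instead of a constant; a sketch, where message.id and index are the loop variables assumed from the comment:

```typescript
// Derive a unique React key per AI message instead of the constant
// "subtask-count", so repeated AI messages in one group no longer
// collide on the same key.
export function subtaskCountKey(
  messageId: string | undefined,
  index: number,
): string {
  // Fall back to the loop index when the message has no id.
  return `subtask-count-${messageId ?? index}`;
}
```

The element would then use `key={subtaskCountKey(message.id, index)}` inside the loop, or the count element could simply be moved outside the loop as the comment suggests.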

