
perf(tui): streaming thinking cell re-parses the full reasoning buffer on every revision bump (O(N²), sibling of #3897) #3903

Description

@Hmbown

Problem

The streaming thinking cell still re-parses its entire accumulated reasoning buffer on every throttled revision bump. Over a stream that grows to N bytes, that adds up to O(N²) total work, the same shape #3897 fixed for streaming assistant cells.

Evidence

Impact

DeepSeek reasoning blocks routinely run to tens of KB. At ~10 bumps/sec, the render thread does a full parse + wrap of the whole buffer on every bump, even though the visible payload is only the tail 8 lines. Long thinking phases are exactly when users are watching, so late in a reasoning block this shows up as input latency and wasted CPU.
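A back-of-the-envelope model of the quadratic shape, assuming each revision bump appends roughly c bytes (c is an assumed average chunk size, not a measured figure), so the buffer reaches N bytes after k = N/c bumps, and T is the fixed cost of re-rendering the visible tail:

```latex
W_{\text{full}} = \sum_{i=1}^{k} i\,c = c\,\frac{k(k+1)}{2} = \frac{N(N+c)}{2c} \approx \frac{N^2}{2c},
\qquad
W_{\text{inc}} = \sum_{i=1}^{k} (c + T) = N + kT = N\left(1 + \frac{T}{c}\right)
```

With illustrative (hypothetical) numbers N = 40,000 B and c = 400 B, k = 100 bumps (about 10 s at 10 bumps/sec), and the full re-parse processes 400 · 5050 = 2,020,000 B in total, roughly 50× the buffer, while the incremental path stays linear in N.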

Proposed approach

Either of these should work:

  • Route the streaming-thinking body through the same incremental stream renderer, render_markdown_tagged_streaming. It is append-only with the same contract, and its slot cache was sized to accommodate an interleaved thinking body.
  • Or add a tail cache keyed on committed-source length, since the collapsed view only ever shows the last THINKING_STREAMING_PREVIEW_LINE_LIMIT lines.

In either case, also avoid the content.to_string() clone in the streaming path.
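A minimal sketch of the tail-cache option, in Rust. Everything here is hypothetical: `TailCache`, `render_line`, and `render_full` are illustrative names, `render_line` is a toy line-wrapper standing in for the real markdown renderer, and `limit` stands in for THINKING_STREAMING_PREVIEW_LINE_LIMIT. Completed lines are rendered once and only the last `limit` rows are kept; each bump renders just the newly completed lines plus the unterminated last line. The buffer is borrowed as `&str`, which is the shape that avoids the content.to_string() clone.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the real markdown renderer: hard-wraps one source
/// line into rows of at most `width` chars. ASSUMPTION: lines render
/// independently, which real markdown does not guarantee (fences, lists).
fn render_line(line: &str, width: usize) -> Vec<String> {
    if line.is_empty() {
        return vec![String::new()];
    }
    let chars: Vec<char> = line.chars().collect();
    chars.chunks(width).map(|c| c.iter().collect::<String>()).collect()
}

/// Current behaviour being replaced: re-render the whole buffer (O(buffer) per bump).
fn render_full(src: &str, width: usize) -> Vec<String> {
    src.split('\n').flat_map(|line| render_line(line, width)).collect()
}

/// Hypothetical tail cache keyed on committed-source length.
struct TailCache {
    committed_len: usize,   // bytes of `src` already rendered; always just past a '\n'
    rows: VecDeque<String>, // last `limit` rows rendered from the committed prefix
    limit: usize,           // stand-in for THINKING_STREAMING_PREVIEW_LINE_LIMIT
    width: usize,
}

impl TailCache {
    fn new(limit: usize, width: usize) -> Self {
        assert!(width > 0, "wrap width must be non-zero");
        TailCache { committed_len: 0, rows: VecDeque::new(), limit, width }
    }

    /// Last `limit` rows of `src`. Borrows `src` instead of cloning it.
    /// Contract: `src` only ever grows by appending.
    fn tail(&mut self, src: &str) -> Vec<String> {
        debug_assert!(src.len() >= self.committed_len, "buffer must be append-only");
        let fresh = &src[self.committed_len..];
        // Newly completed lines: everything up to the last '\n' in the fresh bytes.
        let new_commit = self.committed_len + fresh.rfind('\n').map_or(0, |i| i + 1);
        for line in src[self.committed_len..new_commit].split_terminator('\n') {
            for row in render_line(line, self.width) {
                self.rows.push_back(row);
                if self.rows.len() > self.limit {
                    self.rows.pop_front(); // keep only the visible tail
                }
            }
        }
        self.committed_len = new_commit;
        // Only the unterminated last line is re-rendered on every bump.
        let mut out: Vec<String> = self.rows.iter().cloned().collect();
        out.extend(render_line(&src[new_commit..], self.width));
        let skip = out.len().saturating_sub(self.limit);
        out.split_off(skip)
    }
}
```

The toy renderer treats each line independently, which is what makes committing at every '\n' safe here. Real markdown is not line-local (an open code fence or list changes how later lines render), so a real cache would advance committed_len only at block boundaries. Note also that the unterminated line is re-rendered on every bump, so a long paragraph with no newline still costs time proportional to that paragraph per bump.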

Acceptance criteria

  • Rendering a streaming thinking cell after a chunk arrives does work proportional to the appended chunk (plus a constant-size tail), not to the whole buffer.
  • Output byte-identical to the full re-parse (reuse the debug_assert differential-guard pattern from markdown_render.rs).
  • Existing render_thinking tests pass unchanged.
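The differential guard in the second criterion could look roughly like this. It is written from the criterion's description, not from markdown_render.rs, so the helper names (`guarded_tail`, `last_rows`) are hypothetical: the fast incremental result is returned, and in debug builds the full re-parse is also computed and must match exactly.

```rust
/// Reference ("slow path") answer: the last `limit` rows of a fully rendered buffer.
fn last_rows(full: &[String], limit: usize) -> Vec<String> {
    full[full.len().saturating_sub(limit)..].to_vec()
}

/// Hypothetical differential guard: return the fast incremental result, but in
/// debug builds also run the full re-parse and require identical output.
/// `debug_assert_eq!` is compiled out of release builds, so `slow` never runs there.
fn guarded_tail<F, S>(fast: F, slow: S) -> Vec<String>
where
    F: FnOnce() -> Vec<String>,
    S: FnOnce() -> Vec<String>,
{
    let out = fast();
    debug_assert_eq!(out, slow(), "incremental thinking render diverged from full re-parse");
    out
}
```

Taking both paths as closures keeps the O(N) reference render entirely out of release builds while letting the existing render_thinking tests exercise the comparison on every bump in debug.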

Notes

Found while fixing #3897; called out there as the "chattier sibling" (thinking bypasses the newline gate via bypass_gate: true).

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request), performance (Runtime/render performance)

    Projects

    Status
    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
