Problem
The streaming thinking cell still re-parses its entire accumulated reasoning buffer on every throttled revision bump — the same O(N²) shape #3897 fixed for streaming assistant cells.
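Evidence
- `crates/tui/src/tui/streaming_thinking.rs:80-101`: `append` bumps `active_cell_revision` (100 ms throttle at `:35`), which invalidates the cache entry for the thinking row.
- `crates/tui/src/tui/history/thinking.rs:123-146`: `render_thinking` clones the full content (`content.to_string()`) and runs the plain, full `markdown_render::render_markdown`; then, in the default collapsed view, it discards everything but the last ~8 lines (`:149-166`).
- `render_markdown_tagged_streaming` (`history/message.rs`, `markdown_render.rs`): the incremental path from #3897 ("perf(tui): streaming re-parses the whole growing message every chunk (O(N²) markdown)") and PR #3902 ("perf(tui): fix the five render/input hot paths (#3896–#3900)").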
Impact
DeepSeek reasoning blocks routinely run tens of KB. At ~10 bumps/sec the render thread does a full parse + wrap of the whole buffer per bump — while the visible payload is only the tail 8 lines. Long thinking phases are exactly when users are watching; this reads as input latency and CPU burn late in a reasoning block.
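To put a rough number on the quadratic shape, here is a back-of-the-envelope cost model. It is only a sketch: the function names are hypothetical, and the 100 bytes per bump and 30-second duration are assumed figures, not measurements. Only the ~10 bumps/sec rate comes from the description above.

```rust
/// Total bytes parsed when the whole buffer is re-parsed on every bump.
/// Assumes `chunk` bytes arrive per bump, for `bumps` bumps.
fn full_reparse_bytes(chunk: u64, bumps: u64) -> u64 {
    // Bump k re-parses k * chunk bytes, so the total is
    // chunk * (1 + 2 + ... + bumps).
    chunk * bumps * (bumps + 1) / 2
}

/// Total bytes parsed when each bump only parses the appended chunk.
fn incremental_bytes(chunk: u64, bumps: u64) -> u64 {
    chunk * bumps
}
```

Under those assumptions, 30 seconds at 10 bumps/sec is 300 bumps and a 30,000-byte final buffer. `full_reparse_bytes(100, 300)` is 4,515,000 bytes parsed in total, against 30,000 for `incremental_bytes(100, 300)`, a factor of about 150 that keeps growing with the length of the reasoning block.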
Proposed approach
Two options:
- Route the streaming-thinking body through the same incremental stream renderer, `render_markdown_tagged_streaming`. It is append-only with the same contract, and the slot cache was sized to accommodate an interleaving thinking body.
- Add a tail cache keyed on committed-source length, since the collapsed view only ever shows the last `THINKING_STREAMING_PREVIEW_LINE_LIMIT` lines.
In either case, also avoid the `content.to_string()` clone in the streaming path.
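The tail-cache option could be sketched roughly as below. This is an illustration under stated assumptions, not the crate's code: `TailCache`, `preview`, and `PREVIEW_LINE_LIMIT` (with an assumed value of 8) are hypothetical names, and "rendering" is reduced to plain line splitting so the example stays self-contained.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for THINKING_STREAMING_PREVIEW_LINE_LIMIT (value assumed).
const PREVIEW_LINE_LIMIT: usize = 8;

/// Tail of an append-only buffer, keyed on the committed source length.
#[derive(Default)]
struct TailCache {
    /// Bytes of the source already consumed as complete lines.
    committed_len: usize,
    /// Most recent complete lines, at most PREVIEW_LINE_LIMIT of them.
    tail: VecDeque<String>,
}

impl TailCache {
    /// Returns the preview lines, scanning only bytes past `committed_len`.
    fn preview(&mut self, source: &str) -> Vec<String> {
        // A buffer shorter than the committed prefix (or one where that
        // offset is not a char boundary) breaks the append-only contract:
        // start over.
        if !source.is_char_boundary(self.committed_len) {
            self.committed_len = 0;
            self.tail.clear();
        }
        let fresh = &source[self.committed_len..];
        // Commit through the last newline; the remainder is still growing.
        let (complete, partial) = match fresh.rfind('\n') {
            Some(i) => fresh.split_at(i + 1),
            None => ("", fresh),
        };
        for line in complete.lines() {
            if self.tail.len() == PREVIEW_LINE_LIMIT {
                self.tail.pop_front();
            }
            self.tail.push_back(line.to_owned());
        }
        self.committed_len += complete.len();

        // Visible lines: cached tail plus the unfinished last line, if any.
        let mut out: Vec<String> = self.tail.iter().cloned().collect();
        if !partial.is_empty() {
            out.push(partial.to_owned());
        }
        let skip = out.len().saturating_sub(PREVIEW_LINE_LIMIT);
        out.split_off(skip)
    }
}
```

Each call touches the newly appended bytes plus the unfinished last line and a fixed-size tail, and it borrows `source` rather than cloning it. The sketch cannot notice an in-place edit of already-committed bytes; it relies on the append-only contract for that. Real markdown carries state across lines (fenced code, lists, wrapping), which is one reason reusing the existing streaming renderer may turn out simpler than a separate cache.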
Acceptance criteria
- Rendering a streaming thinking cell after a chunk does work proportional to the appended chunk (plus constant tail), not to the whole buffer.
- Output is byte-identical to the full re-parse (reuse the `debug_assert` differential-guard pattern from `markdown_render.rs`).
- Existing `render_thinking` tests pass unchanged.
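For reference, a differential guard of this kind might look like the sketch below. It is a toy, not the code in `markdown_render.rs`: `IncrementalUpper` and `render_full` are hypothetical, and upper-casing stands in for markdown rendering. The point is only the shape: the incremental path is checked against the full re-render in debug builds.

```rust
/// Slow reference path: re-renders the entire source every time.
fn render_full(source: &str) -> String {
    source.to_ascii_uppercase()
}

/// Toy append-only renderer that only processes newly appended bytes.
#[derive(Default)]
struct IncrementalUpper {
    /// Bytes of the source already rendered.
    consumed: usize,
    /// Accumulated output for the consumed prefix.
    rendered: String,
}

impl IncrementalUpper {
    fn render(&mut self, source: &str) -> &str {
        // Reset if the buffer shrank or the offset is no longer valid.
        if !source.is_char_boundary(self.consumed) {
            self.consumed = 0;
            self.rendered.clear();
        }
        let fresh = &source[self.consumed..];
        self.rendered.push_str(&fresh.to_ascii_uppercase());
        self.consumed = source.len();

        // Differential guard: debug_assert_eq! only evaluates its
        // arguments when debug assertions are enabled, so release builds
        // never pay for the full re-render.
        debug_assert_eq!(
            self.rendered,
            render_full(source),
            "incremental render diverged from full re-parse"
        );
        &self.rendered
    }
}
```

With the guard in place, debug and test builds still do the full O(N) work per call, so the proportional-work criterion would need to be measured in a release build or with the guard disabled.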
Notes
Found while fixing #3897; called out there as the "chattier sibling" (thinking bypasses the newline gate via `bypass_gate: true`).