perf(tui): fix the five render/input hot paths (#3896–#3900) by Hmbown · Pull Request #3902 · Hmbown/CodeWhale

Hmbown · 2026-07-02T14:31:29Z

Summary

Fixes all five open performance-labeled issues, one commit per issue, plus a follow-up commit closing four regressions found by an adversarial multi-agent review of the branch, plus a merge of main (v0.8.67).

perf(tui): Tasks sidebar computes its rows twice per frame (redundant clone+sort) #3898 — Tasks sidebar computed its rows twice per frame. TaskPanelRowSets is computed once in render_sidebar_tasks and shared by the row renderer and hover-text builder; focus-conditional live-tool dedup preserved. Also removes the second full clone of every agent row in sort_sidebar_agent_rows_as_tree.
perf(tui): file-tree expand does a synchronous recursive read_dir on the UI thread #3900 — file-tree expand/collapse walked the workspace synchronously on the UI thread. Collapse is a pure in-memory splice; expand walks only the newly expanded subtree on a background thread and splices results in on poll, with per-directory sequence guards against stale walks.
perf(tui): avoid deep-cloning every visible transcript cell each frame in the collapse path #3896 — the collapse render path deep-cloned every visible transcript cell each frame. The filtered path now borrows cells via a new TranscriptViewCache::ensure_filtered; ensure_split and the new variant share one iterator-driven helper. Zero cell clones per frame, byte-identical output.
perf(tui): path-like @mention completion re-walks the filesystem on every keystroke #3899 — @mention completion re-walked the filesystem per keystroke on the UI thread. The walk runs on a background thread with at most one real walk running; the popup shows same-token-lineage fallback entries while a walk is in flight.
perf(tui): streaming re-parses the whole growing message every chunk (O(N²) markdown) #3897 — streaming re-parsed the whole growing message per committed chunk (O(N²)). Incremental streaming render: only newly completed lines parse, frozen lines re-use, only the volatile tail (partial line / open table group) re-renders. A debug_assert differential guard makes the entire suite double as a byte-identity test against the full re-parse.

Adversarial review round (34 reviewer/verifier agents over the five commits — sidebar/transcript/markdown came back clean): confirmed and fixed four async-window regressions in commit 80e058a — file-tree expand results now trigger a redraw when they land; Enter can no longer submit a half-typed mention during the first walk's in-flight window; the stale popup fallback is bounded to the same token lineage (no wrong-file splice); superseded walks are serialized instead of stacking concurrently.

Merge with v0.8.67 (4504153): main's #3757 candidate cache (one needle-independent walk serves every keystroke of a non-path-like mention token, ranked in memory) is kept as the fast path, adapted to the key-based cache form; this branch's #3899 background walk now covers exactly the needles that cache bypasses (path-like and browser). The two fixes compose: non-path needles rarely walk at all, and when a walk is needed it never blocks the UI thread.

Testing

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features
cargo test -p codewhale-tui --locked (post-merge) — 5715 passed / 1 failed: sandbox::tests::test_parity_linux_landlock_available, which asserts the host kernel exposes Landlock and fails in this container on any branch (environment-only, untouched by this PR)
cargo test --workspace --all-features

Targeted suites post-merge: mention 50, sidebar:: 80, file_tree 8, transcript:: 31, widgets:: 188, markdown_render 38, history 148 — all green. Each commit message describes its new tests.

Checklist

Updated docs or comments as needed
Added or updated tests where relevant
Verified TUI behavior manually if UI changes
Harvested/co-authored credit uses a GitHub numeric noreply address (n/a — no harvested work)

🤖 Generated with Claude Code

https://claude.ai/code/session_01Ffi9Tm4TfXhK5ngpA68xw3

task_panel_rows() and task_panel_hover_texts() each independently called active_tool_rows(), reasoning_task_rows(), background_task_rows() (a clone+sort of app.task_panel), and recent_tool_rows() on every frame the Tasks panel was visible — doubling the row-building work per frame. Introduce TaskPanelRowSets, computed once in render_sidebar_tasks and passed by reference to both consumers. The focus-conditional live-tool dedup argument to background_task_rows (Tasks focus dedups, Auto mode does not) is preserved and now evaluated in one place. As a side benefit, both consumers now see one snapshot, so hover rows can no longer disagree with rendered rows across an Instant::elapsed TTL boundary within a single frame. Also stop double-cloning every SidebarAgentRow in sort_sidebar_agent_rows_as_tree: emit an index order first, then materialize by move (the `seen` set guarantees each index is emitted exactly once), noted as a related win in the issue. New tests pin the focus-conditional background dedup, rows/hover snapshot alignment, and exactly-once emission of agent rows (including the rootless parent-cycle orphan sweep). Fixes #3898

Toggling a directory in the file-tree pane called rebuild(), which re-walked the workspace root plus every expanded subtree with synchronous read_dir on the UI thread — freezing the UI for the duration of the walk on large directories. Only the initial build was offloaded (#399 S3). Collapse now needs no I/O at all: the directory's visible descendants are spliced out of the flat entry list in memory. Descendant expansion state stays in expanded_dirs, so re-expanding the parent restores it, exactly as the full rebuild did. Expand marks the entry expanded immediately (the ▼ acknowledges the keypress), then walks only the newly expanded subtree on a background thread via the same spawn_blocking_supervised + poll-cell pattern as the initial build, splicing the children in when the walk lands. Per-directory sequence numbers plus a collapse-clears-pending rule guarantee a stale walk can never splice into a re-toggled or collapsed node. Without a tokio runtime (plain unit tests) both the initial build and expand walk run synchronously, preserving test semantics. One behavioral delta, deliberate: the old full rebuild re-read every expanded directory on each toggle, so unrelated filesystem changes were incidentally picked up; the splice path only reads the newly expanded subtree. Tests cover splice-vs-full-rebuild parity across expansion orders, I/O-free collapse, descendant-state restoration, stale-result discarding, cursor stability across splices, and the async end-to-end path. Fixes #3900

… path With tool-run auto-collapse active (the default once any run of >=3 successful low-risk tools exists), ChatWidget::new rebuilt a Vec<HistoryCell> of every non-collapsed cell on every frame — deep-cloning the entire visible transcript (assistant/user bodies, tool outputs, sub-agent cards) at frame rate even when nothing had changed. The per-cell revision cache underneath never touches an unchanged cell, so the clones were pure waste. The filtered path now borrows cells instead of cloning them: the few synthetic tool-run summary cells are materialized into a stable local Vec first, then the filter builds a Vec<&HistoryCell> over history, active entries, and those summaries. TranscriptViewCache gains an ensure_filtered(&[&HistoryCell], ...) entry point; ensure_split and the new variant both delegate to one private iterator-driven helper, so the render/reuse/invalidation logic stays a single code path and the fast path plus all existing callers are untouched. Filtered sequence, revision values, collapsed_cell_map, and folded-thinking resolution are unchanged, so output is identical — now with zero cell clones on every frame, changed or not. New tests: ensure_filtered/ensure_split output parity, per-cell reuse through ensure_filtered, frame-stability of the collapse path at the ChatWidget level, and a collapsed run spanning the history/active boundary. Fixes #3896

The completion cache keys on the exact partial, so every keystroke that changed the needle missed and ran a fresh synchronous workspace walk on the UI thread — a gitignore-aware walk plus, for path-like needles ('.', '/', '\\'), the gitignore-disabled local-reference walk capped at 4096 paths (the same walk behind the #1921 WSL2 freezes). Typing a path-like @mention in a large repo stalled the input loop on every keystroke. The walk now runs on a background thread (spawn_blocking_supervised + shared-cell poll, same shape as the file-tree build and workspace-context refresh): the cache key is factored into MentionCompletionKey, at most one walk is in flight, and the latest keystroke wins — a completed walk for an older needle is kept only as the transient popup fallback under its true key, never returned as the live needle's result. While a walk is in flight the popup keeps showing the previous keystroke's entries (same session context only) so it never closes mid-word and Enter keeps routing to the popup instead of submitting the message. poll_mention_completion drains finished walks once per event-loop iteration, sets needs_redraw only when a result actually landed, and drops orphaned walks when the cursor leaves the mention token. Without a tokio runtime the walk runs synchronously, so completion results for a given input are byte-identical to before and the existing plain #[test] suite (including the cursor-move cache-reuse and freshness pins) passes unchanged. Tab completion (try_autocomplete_file_mention) stays synchronous: a deliberate one-shot action, unlike per-keystroke popup refreshes. Fixes #3899

…ing the whole message While a message streams, every committed chunk bumped the cell's revision and re-rendered the entire growing content: a full markdown parse plus a full word-wrap/span build, O(N) per chunk and O(N²/chunk) over the life of the stream. During an active stream chunks commit on nearly every 24ms loop tick, so long answers got visibly slower to render as they grew. The Ctrl+T overlay doubled the cost at a second width. The parser makes prefix stability provable: classification is per line with a single carried bool (the code-fence state), so appended bytes can never re-classify an earlier complete line — only the partial trailing line is volatile. Rendering is per block except consecutive table rows, whose column widths derive from the whole group, so a trailing table run stays volatile until a non-table block closes it. render_markdown_tagged_streaming keeps a small thread-local LRU of StreamRenderState slots (main transcript + overlay widths + a thinking body can interleave): per call it verifies the committed prefix with one memcmp, parses only newly completed lines, freezes their rendered lines, and re-renders just the volatile tail. Parse and render go through the same helpers as the full path (parse_line_into, render_blocks_tagged), and a debug_assert differential guard compares against the full re-parse on every call in debug builds, so the whole existing suite doubles as a byte-identity test. Streaming assistant cells route through the new path via render_assistant_message*; the final render after streaming ends takes the plain full-parse path, so finished messages are a full re-parse by construction. New tests: per-chunk byte-identity across chunk sizes 1/3/7 and widths 24/40/80 over headings/lists/fences/tables/links/CJK, an O(delta) parsed-line-count guard, non-extension fallback, two-width interleaving, trailing-table volatility, and char-by-char unclosed fences. Fixes #3897

An adversarial review of the five perf commits confirmed four real regressions in the two fixes that moved work off the UI thread; this closes all of them. File tree (#3900): a finished background expand walk never triggered a redraw — with the loop idle, an expanded directory looked empty until the next unrelated input event. The event loop now drains file-tree walks (initial build and expands) each iteration via poll_background() and sets needs_redraw when results were applied, matching the @mention poll wiring. Mention completion (#3899), three fixes: - Enter could submit a half-typed @mention during the first walk's in-flight window (popup routing keyed only on non-empty entries; the pre-async walk finished inside the key event, making that impossible). Enter is now held with a "Scanning files for @…" status while a walk for the live token is pending and the popup has nothing to show; Esc hides the pending popup; forced Ctrl+Enter submit keeps priority. - The stale popup fallback was unbounded: cached results for an UNRELATED earlier mention token could be shown and Enter/Tab would splice the wrong file over the live token. The fallback now requires same token lineage (live partial extends the cached one or vice versa) in addition to same context. - Superseded in-flight walks were dropped and respawned per keystroke; since a blocking walk cannot be cancelled, fast typing stacked concurrent full-workspace walks. The in-flight walk is now retained until it lands (its stale result is discarded), so real walks are serialized — at most one running — and the fresh needle's walk starts on the next call after landing. Tests: pending-expand drain requests exactly one repaint; fallback lineage positive/negative; in-flight walk retained across a needle change; mention_walk_pending true only for the live, un-hidden token. Follow-up to #3899/#3900 (PR #3902).

Resolves the conflicts with the #3861 constitution-first-setup merge: - @mention completion: main landed the #3757 candidate cache (one needle-independent walk serves every keystroke of a non-path-like mention token, ranked in memory with a 4s TTL). Kept it as the fast path, adapted to the MentionCompletionKey cache form, and kept this branch's #3899 background walk for exactly the needles the candidate cache bypasses (path-like and browser) — the two fixes compose: non-path needles rarely walk at all, and when a walk is needed it no longer blocks the UI thread. - app.rs: kept both structs/fields (MentionCompletionPending from #3899, MentionCandidateCache from #3757). - sidebar.rs: kept both tests added at the same location (this branch's tree-sort exactly-once test and main's terminal-row collapse test). cargo test -p codewhale-tui --locked after the merge: 5715 passed, 1 failed (sandbox::tests::test_parity_linux_landlock_available — pre-existing container-environment failure, kernel lacks Landlock).

Hmbown · 2026-07-02T16:14:34Z

Review — perf hot paths (verifying output-equivalence, not just speed)

Read-only review focused on the thing that matters for a perf change to render code: does each optimized path produce byte-identical output, and does every new cache invalidate correctly? Traced each fix against its issue.

Overall: merge-ready, with two tracked follow-ups. 4/5 fixes are correct with strong equivalence tests; the one real gap is bounded and has an Esc escape hatch.

Per-fix verdicts

perf(tui): avoid deep-cloning every visible transcript cell each frame in the collapse path #3896 transcript deep-clone → CORRECT. Slow path now borrows &HistoryCell and pre-materializes only the synthetic tool-run summaries; the per-cell reuse key (revision + index + layout/folded change) is unchanged, so collapse/expand/stream invalidation is unaffected. New .expect("summary cell materialized") push sites are provably safe (only called when run.start == idx && collapsed_run_starts.contains(idx)). Equivalence + reuse + frame-stability tests present.
perf(tui): streaming re-parses the whole growing message every chunk (O(N²) markdown) #3897 streaming O(N²) markdown → CORRECT (highest blast radius). Single shared parse_line_into classifier ⇒ classification identical by construction; only cross-line state is in_code_block; the tail parses against a scratch fence state; stable_end backs up over trailing table groups so column widths always see the whole group. A debug_assert!(rendered_lines_eq(full_reparse, streaming)) differential guard + width×chunk sweep tests pin it. Caveat: the guard is #[cfg(debug_assertions)] (compiled out in release), and this rewrites the render path for every streaming message — the most-repainted surface — so any future parser change that breaks prefix-stability would be a silent release-only visual regression. Keep the guard debug-gated and scrutinize future edits here. (Also: parse is now O(delta), but the frozen-lines clone() per committed chunk keeps an O(N) memcpy — still a big win, not fully linear.)
perf(tui): Tasks sidebar computes its rows twice per frame (redundant clone+sort) #3898 sidebar rows twice/frame → CORRECT. task_panel_row_sets computes once; rows + hover-texts share it, so they can't disagree across a mid-frame TTL boundary. Dedup/recent-gating preserved verbatim; the tree-sort move-materialization takes each slot exactly once (tested incl. a 2-node cycle).
perf(tui): path-like @mention completion re-walks the filesystem on every keystroke #3899 path-like @mention fs walk → RISKY (bounded). The core anti-race property holds: a completed walk is stored under its own needle key and only returned when it equals the live needle, so a slow old walk can't masquerade as the current one. But stale_mention_fallback_entries keeps the previous keystroke's entries visible for prefix-lineage tokens with the menu "open", so a fast Enter before the new walk lands can apply a completion that doesn't match the live token. Repro: @docs/a (shows docs/apple.md) → type b → Enter immediately → docs/apple.md inserted against @docs/ab. Pre-async, Enter always saw exact-needle results. Bounded (always a real file, prefix-lineage only) but a genuine equivalence gap. Minor secondary: if the walk thread panics, mention_completion_pending never clears and Enter is swallowed for that token until Esc.
perf(tui): file-tree expand does a synchronous recursive read_dir on the UI thread #3900 file-tree sync read_dir → CORRECT. Expand walk offloaded to spawn_blocking_supervised; per-dir seq drops superseded results; ancestor-collapsed-mid-walk and double-splice are guarded; cursor/scroll math adjusted on splice in/out. incremental_expand_matches_full_rebuild asserts entry-by-entry parity. Only visible change: ▼ shows for one poll cycle before children appear (progressive reveal) — imperceptible on local FS.

Suggested follow-ups (neither blocks the perf win)

perf(tui): path-like @mention completion re-walks the filesystem on every keystroke #3899 (worth doing before merge if you want strict equivalence): filter the stale fallback to entries still matching the live needle, or don't let Enter apply the fallback while mention_completion_pending is set (hold it like the empty-menu case).
perf(tui): path-like @mention completion re-walks the filesystem on every keystroke #3899 robustness: clear mention_completion_pending if a walk fails to produce a result, so a panicked walk can't soft-lock Enter for that token.

Test coverage is genuinely strong (not asserted-by-comment) — the one untested window is the #3899 fast-Enter-applies-stale case above.

(Detailed read; happy to open a small follow-up PR for the #3899 items if useful.)

1. Strict equivalence for the stale popup fallback: entries are now filtered to those still matching the LIVE needle (case-insensitive substring, mirroring the walk's own match rule), so a fast Enter after a keystroke can only apply an entry the fresh walk would also have returned — the reviewer's `docs/apple.md` vs `@docs/ab` repro is pinned by a test. When the filter empties the list the popup reads as pending and the existing Enter-hold takes over. 2. A panicking walk can no longer soft-lock Enter for its token: the walker closure now routes its result through a completion guard whose Drop writes an empty result on unwind, so mention_completion_pending always drains. Requested in review on PR #3902.

Hmbown · 2026-07-02T16:24:21Z

Both #3899 follow-ups are in as ebf2313 — no separate PR needed:

Fast-Enter-applies-stale (your docs/apple.md vs @docs/ab repro): took the stricter of your two options — the stale fallback now filters entries to those still matching the live needle (case-insensitive substring, same rule the walk itself uses), so anything shown or applied is a subset of what the fresh walk would return. When the filter empties the list, the popup reads as pending and the existing Enter-hold takes over. The exact repro is pinned by mention_fallback_drops_entries_that_no_longer_match_the_live_needle.
Panicked-walk soft-lock: the walker now routes its result through a completion guard whose Drop writes an empty result on unwind, so mention_completion_pending always drains and Enter releases. Pinned by mention_walk_completion_fills_cell_even_on_unwind.

On the #3897 caveat: agreed and leaving the differential guard debug-gated as you suggest — every cargo test run exercises it, and the two O(N) residuals you noted (frozen-lines clone per chunk, plus the message-layer decoration pass) are called out in the commit message as the known follow-up if profiling ever warrants it.

Post-push suite: 5717 passed / 1 failed (test_parity_linux_landlock_available, container-environment only).

Generated by Claude Code

claude added 5 commits July 2, 2026 14:17

claude added 2 commits July 2, 2026 15:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(tui): fix the five render/input hot paths (#3896–#3900)#3902

perf(tui): fix the five render/input hot paths (#3896–#3900)#3902
Hmbown wants to merge 8 commits into
mainfrom
claude/performance-improvement-issues-bjknbf

Hmbown commented Jul 2, 2026 •

edited

Loading

Uh oh!

Hmbown commented Jul 2, 2026

Uh oh!

Hmbown commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Hmbown commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Checklist

Uh oh!

Hmbown commented Jul 2, 2026

Review — perf hot paths (verifying output-equivalence, not just speed)

Per-fix verdicts

Suggested follow-ups (neither blocks the perf win)

Uh oh!

Hmbown commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Hmbown commented Jul 2, 2026 •

edited

Loading