Skip to content

perf(tui): fix the five render/input hot paths (#3896–#3900)#3902

Draft
Hmbown wants to merge 8 commits into
mainfrom
claude/performance-improvement-issues-bjknbf
Draft

perf(tui): fix the five render/input hot paths (#3896–#3900)#3902
Hmbown wants to merge 8 commits into
mainfrom
claude/performance-improvement-issues-bjknbf

Conversation

@Hmbown

@Hmbown Hmbown commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes all five open performance-labeled issues, one commit per issue, plus a follow-up commit closing four regressions found by an adversarial multi-agent review of the branch, plus a merge of main (v0.8.67).

Adversarial review round (34 reviewer/verifier agents over the five commits — sidebar/transcript/markdown came back clean): confirmed and fixed four async-window regressions in commit 80e058a — file-tree expand results now trigger a redraw when they land; Enter can no longer submit a half-typed mention during the first walk's in-flight window; the stale popup fallback is bounded to the same token lineage (no wrong-file splice); superseded walks are serialized instead of stacking concurrently.

Merge with v0.8.67 (4504153): main's #3757 candidate cache (one needle-independent walk serves every keystroke of a non-path-like mention token, ranked in memory) is kept as the fast path, adapted to the key-based cache form; this branch's #3899 background walk now covers exactly the needles that cache bypasses (path-like and browser). The two fixes compose: non-path needles rarely walk at all, and when a walk is needed it never blocks the UI thread.

Testing

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features
  • cargo test -p codewhale-tui --locked (post-merge) — 5715 passed / 1 failed: sandbox::tests::test_parity_linux_landlock_available, which asserts the host kernel exposes Landlock and fails in this container on any branch (environment-only, untouched by this PR)
  • cargo test --workspace --all-features

Targeted suites post-merge: mention 50, sidebar:: 80, file_tree 8, transcript:: 31, widgets:: 188, markdown_render 38, history 148 — all green. Each commit message describes its new tests.

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant
  • Verified TUI behavior manually if UI changes
  • Harvested/co-authored credit uses a GitHub numeric noreply address (n/a — no harvested work)

🤖 Generated with Claude Code

https://claude.ai/code/session_01Ffi9Tm4TfXhK5ngpA68xw3

claude added 5 commits July 2, 2026 14:17
task_panel_rows() and task_panel_hover_texts() each independently called
active_tool_rows(), reasoning_task_rows(), background_task_rows() (a
clone+sort of app.task_panel), and recent_tool_rows() on every frame the
Tasks panel was visible — doubling the row-building work per frame.

Introduce TaskPanelRowSets, computed once in render_sidebar_tasks and
passed by reference to both consumers. The focus-conditional live-tool
dedup argument to background_task_rows (Tasks focus dedups, Auto mode
does not) is preserved and now evaluated in one place. As a side
benefit, both consumers now see one snapshot, so hover rows can no
longer disagree with rendered rows across an Instant::elapsed TTL
boundary within a single frame.

Also stop double-cloning every SidebarAgentRow in
sort_sidebar_agent_rows_as_tree: emit an index order first, then
materialize by move (the `seen` set guarantees each index is emitted
exactly once), noted as a related win in the issue.

New tests pin the focus-conditional background dedup, rows/hover
snapshot alignment, and exactly-once emission of agent rows (including
the rootless parent-cycle orphan sweep).

Fixes #3898
Toggling a directory in the file-tree pane called rebuild(), which
re-walked the workspace root plus every expanded subtree with
synchronous read_dir on the UI thread — freezing the UI for the
duration of the walk on large directories. Only the initial build was
offloaded (#399 S3).

Collapse now needs no I/O at all: the directory's visible descendants
are spliced out of the flat entry list in memory. Descendant expansion
state stays in expanded_dirs, so re-expanding the parent restores it,
exactly as the full rebuild did.

Expand marks the entry expanded immediately (the ▼ acknowledges the
keypress), then walks only the newly expanded subtree on a background
thread via the same spawn_blocking_supervised + poll-cell pattern as
the initial build, splicing the children in when the walk lands.
Per-directory sequence numbers plus a collapse-clears-pending rule
guarantee a stale walk can never splice into a re-toggled or collapsed
node. Without a tokio runtime (plain unit tests) both the initial
build and expand walk run synchronously, preserving test semantics.

One behavioral delta, deliberate: the old full rebuild re-read every
expanded directory on each toggle, so unrelated filesystem changes
were incidentally picked up; the splice path only reads the newly
expanded subtree.

Tests cover splice-vs-full-rebuild parity across expansion orders,
I/O-free collapse, descendant-state restoration, stale-result
discarding, cursor stability across splices, and the async
end-to-end path.

Fixes #3900
… path

With tool-run auto-collapse active (the default once any run of >=3
successful low-risk tools exists), ChatWidget::new rebuilt a
Vec<HistoryCell> of every non-collapsed cell on every frame —
deep-cloning the entire visible transcript (assistant/user bodies,
tool outputs, sub-agent cards) at frame rate even when nothing had
changed. The per-cell revision cache underneath never touches an
unchanged cell, so the clones were pure waste.

The filtered path now borrows cells instead of cloning them: the few
synthetic tool-run summary cells are materialized into a stable local
Vec first, then the filter builds a Vec<&HistoryCell> over history,
active entries, and those summaries. TranscriptViewCache gains an
ensure_filtered(&[&HistoryCell], ...) entry point; ensure_split and
the new variant both delegate to one private iterator-driven helper,
so the render/reuse/invalidation logic stays a single code path and
the fast path plus all existing callers are untouched.

Filtered sequence, revision values, collapsed_cell_map, and
folded-thinking resolution are unchanged, so output is identical —
now with zero cell clones on every frame, changed or not.

New tests: ensure_filtered/ensure_split output parity, per-cell reuse
through ensure_filtered, frame-stability of the collapse path at the
ChatWidget level, and a collapsed run spanning the history/active
boundary.

Fixes #3896
The completion cache keys on the exact partial, so every keystroke
that changed the needle missed and ran a fresh synchronous workspace
walk on the UI thread — a gitignore-aware walk plus, for path-like
needles ('.', '/', '\\'), the gitignore-disabled local-reference walk
capped at 4096 paths (the same walk behind the #1921 WSL2 freezes).
Typing a path-like @mention in a large repo stalled the input loop on
every keystroke.

The walk now runs on a background thread (spawn_blocking_supervised +
shared-cell poll, same shape as the file-tree build and
workspace-context refresh): the cache key is factored into
MentionCompletionKey, at most one walk is in flight, and the latest
keystroke wins — a completed walk for an older needle is kept only as
the transient popup fallback under its true key, never returned as
the live needle's result. While a walk is in flight the popup keeps
showing the previous keystroke's entries (same session context only)
so it never closes mid-word and Enter keeps routing to the popup
instead of submitting the message. poll_mention_completion drains
finished walks once per event-loop iteration, sets needs_redraw only
when a result actually landed, and drops orphaned walks when the
cursor leaves the mention token.

Without a tokio runtime the walk runs synchronously, so completion
results for a given input are byte-identical to before and the
existing plain #[test] suite (including the cursor-move cache-reuse
and freshness pins) passes unchanged. Tab completion
(try_autocomplete_file_mention) stays synchronous: a deliberate
one-shot action, unlike per-keystroke popup refreshes.

Fixes #3899
…ing the whole message

While a message streams, every committed chunk bumped the cell's
revision and re-rendered the entire growing content: a full markdown
parse plus a full word-wrap/span build, O(N) per chunk and O(N²/chunk)
over the life of the stream. During an active stream chunks commit on
nearly every 24ms loop tick, so long answers got visibly slower to
render as they grew. The Ctrl+T overlay doubled the cost at a second
width.

The parser makes prefix stability provable: classification is per line
with a single carried bool (the code-fence state), so appended bytes
can never re-classify an earlier complete line — only the partial
trailing line is volatile. Rendering is per block except consecutive
table rows, whose column widths derive from the whole group, so a
trailing table run stays volatile until a non-table block closes it.

render_markdown_tagged_streaming keeps a small thread-local LRU of
StreamRenderState slots (main transcript + overlay widths + a thinking
body can interleave): per call it verifies the committed prefix with
one memcmp, parses only newly completed lines, freezes their rendered
lines, and re-renders just the volatile tail. Parse and render go
through the same helpers as the full path (parse_line_into,
render_blocks_tagged), and a debug_assert differential guard compares
against the full re-parse on every call in debug builds, so the whole
existing suite doubles as a byte-identity test. Streaming assistant
cells route through the new path via render_assistant_message*; the
final render after streaming ends takes the plain full-parse path, so
finished messages are a full re-parse by construction.

New tests: per-chunk byte-identity across chunk sizes 1/3/7 and widths
24/40/80 over headings/lists/fences/tables/links/CJK, an O(delta)
parsed-line-count guard, non-extension fallback, two-width
interleaving, trailing-table volatility, and char-by-char unclosed
fences.

Fixes #3897
claude added 2 commits July 2, 2026 15:26
An adversarial review of the five perf commits confirmed four real
regressions in the two fixes that moved work off the UI thread; this
closes all of them.

File tree (#3900): a finished background expand walk never triggered
a redraw — with the loop idle, an expanded directory looked empty
until the next unrelated input event. The event loop now drains
file-tree walks (initial build and expands) each iteration via
poll_background() and sets needs_redraw when results were applied,
matching the @mention poll wiring.

Mention completion (#3899), three fixes:
- Enter could submit a half-typed @mention during the first walk's
  in-flight window (popup routing keyed only on non-empty entries;
  the pre-async walk finished inside the key event, making that
  impossible). Enter is now held with a "Scanning files for @…"
  status while a walk for the live token is pending and the popup has
  nothing to show; Esc hides the pending popup; forced Ctrl+Enter
  submit keeps priority.
- The stale popup fallback was unbounded: cached results for an
  UNRELATED earlier mention token could be shown and Enter/Tab would
  splice the wrong file over the live token. The fallback now
  requires same token lineage (live partial extends the cached one or
  vice versa) in addition to same context.
- Superseded in-flight walks were dropped and respawned per
  keystroke; since a blocking walk cannot be cancelled, fast typing
  stacked concurrent full-workspace walks. The in-flight walk is now
  retained until it lands (its stale result is discarded), so real
  walks are serialized — at most one running — and the fresh needle's
  walk starts on the next call after landing.

Tests: pending-expand drain requests exactly one repaint; fallback
lineage positive/negative; in-flight walk retained across a needle
change; mention_walk_pending true only for the live, un-hidden token.

Follow-up to #3899/#3900 (PR #3902).
Resolves the conflicts with the #3861 constitution-first-setup merge:

- @mention completion: main landed the #3757 candidate cache (one
  needle-independent walk serves every keystroke of a non-path-like
  mention token, ranked in memory with a 4s TTL). Kept it as the fast
  path, adapted to the MentionCompletionKey cache form, and kept this
  branch's #3899 background walk for exactly the needles the candidate
  cache bypasses (path-like and browser) — the two fixes compose:
  non-path needles rarely walk at all, and when a walk is needed it no
  longer blocks the UI thread.
- app.rs: kept both structs/fields (MentionCompletionPending from
  #3899, MentionCandidateCache from #3757).
- sidebar.rs: kept both tests added at the same location (this
  branch's tree-sort exactly-once test and main's terminal-row
  collapse test).

cargo test -p codewhale-tui --locked after the merge: 5715 passed,
1 failed (sandbox::tests::test_parity_linux_landlock_available —
pre-existing container-environment failure, kernel lacks Landlock).
@Hmbown

Hmbown commented Jul 2, 2026

Copy link
Copy Markdown
Owner Author

Review — perf hot paths (verifying output-equivalence, not just speed)

Read-only review focused on the thing that matters for a perf change to render code: does each optimized path produce byte-identical output, and does every new cache invalidate correctly? Traced each fix against its issue.

Overall: merge-ready, with two tracked follow-ups. 4/5 fixes are correct with strong equivalence tests; the one real gap is bounded and has an Esc escape hatch.

Per-fix verdicts

  • perf(tui): avoid deep-cloning every visible transcript cell each frame in the collapse path #3896 transcript deep-clone → CORRECT. Slow path now borrows &HistoryCell and pre-materializes only the synthetic tool-run summaries; the per-cell reuse key (revision + index + layout/folded change) is unchanged, so collapse/expand/stream invalidation is unaffected. New .expect("summary cell materialized") push sites are provably safe (only called when run.start == idx && collapsed_run_starts.contains(idx)). Equivalence + reuse + frame-stability tests present.
  • perf(tui): streaming re-parses the whole growing message every chunk (O(N²) markdown) #3897 streaming O(N²) markdown → CORRECT (highest blast radius). Single shared parse_line_into classifier ⇒ classification identical by construction; only cross-line state is in_code_block; the tail parses against a scratch fence state; stable_end backs up over trailing table groups so column widths always see the whole group. A debug_assert!(rendered_lines_eq(full_reparse, streaming)) differential guard + width×chunk sweep tests pin it. Caveat: the guard is #[cfg(debug_assertions)] (compiled out in release), and this rewrites the render path for every streaming message — the most-repainted surface — so any future parser change that breaks prefix-stability would be a silent release-only visual regression. Keep the guard debug-gated and scrutinize future edits here. (Also: parse is now O(delta), but the frozen-lines clone() per committed chunk keeps an O(N) memcpy — still a big win, not fully linear.)
  • perf(tui): Tasks sidebar computes its rows twice per frame (redundant clone+sort) #3898 sidebar rows twice/frame → CORRECT. task_panel_row_sets computes once; rows + hover-texts share it, so they can't disagree across a mid-frame TTL boundary. Dedup/recent-gating preserved verbatim; the tree-sort move-materialization takes each slot exactly once (tested incl. a 2-node cycle).
  • perf(tui): path-like @mention completion re-walks the filesystem on every keystroke #3899 path-like @mention fs walk → RISKY (bounded). The core anti-race property holds: a completed walk is stored under its own needle key and only returned when it equals the live needle, so a slow old walk can't masquerade as the current one. But stale_mention_fallback_entries keeps the previous keystroke's entries visible for prefix-lineage tokens with the menu "open", so a fast Enter before the new walk lands can apply a completion that doesn't match the live token. Repro: @docs/a (shows docs/apple.md) → type b → Enter immediately → docs/apple.md inserted against @docs/ab. Pre-async, Enter always saw exact-needle results. Bounded (always a real file, prefix-lineage only) but a genuine equivalence gap. Minor secondary: if the walk thread panics, mention_completion_pending never clears and Enter is swallowed for that token until Esc.
  • perf(tui): file-tree expand does a synchronous recursive read_dir on the UI thread #3900 file-tree sync read_dir → CORRECT. Expand walk offloaded to spawn_blocking_supervised; per-dir seq drops superseded results; ancestor-collapsed-mid-walk and double-splice are guarded; cursor/scroll math adjusted on splice in/out. incremental_expand_matches_full_rebuild asserts entry-by-entry parity. Only visible change: ▼ shows for one poll cycle before children appear (progressive reveal) — imperceptible on local FS.

Suggested follow-ups (neither blocks the perf win)

  1. perf(tui): path-like @mention completion re-walks the filesystem on every keystroke #3899 (worth doing before merge if you want strict equivalence): filter the stale fallback to entries still matching the live needle, or don't let Enter apply the fallback while mention_completion_pending is set (hold it like the empty-menu case).
  2. perf(tui): path-like @mention completion re-walks the filesystem on every keystroke #3899 robustness: clear mention_completion_pending if a walk fails to produce a result, so a panicked walk can't soft-lock Enter for that token.

Test coverage is genuinely strong (not asserted-by-comment) — the one untested window is the #3899 fast-Enter-applies-stale case above.

(Detailed read; happy to open a small follow-up PR for the #3899 items if useful.)

1. Strict equivalence for the stale popup fallback: entries are now
   filtered to those still matching the LIVE needle (case-insensitive
   substring, mirroring the walk's own match rule), so a fast Enter
   after a keystroke can only apply an entry the fresh walk would also
   have returned — the reviewer's `docs/apple.md` vs `@docs/ab` repro
   is pinned by a test. When the filter empties the list the popup
   reads as pending and the existing Enter-hold takes over.

2. A panicking walk can no longer soft-lock Enter for its token:
   the walker closure now routes its result through a completion
   guard whose Drop writes an empty result on unwind, so
   mention_completion_pending always drains.

Requested in review on PR #3902.

Hmbown commented Jul 2, 2026

Copy link
Copy Markdown
Owner Author

Both #3899 follow-ups are in as ebf2313 — no separate PR needed:

  1. Fast-Enter-applies-stale (your docs/apple.md vs @docs/ab repro): took the stricter of your two options — the stale fallback now filters entries to those still matching the live needle (case-insensitive substring, same rule the walk itself uses), so anything shown or applied is a subset of what the fresh walk would return. When the filter empties the list, the popup reads as pending and the existing Enter-hold takes over. The exact repro is pinned by mention_fallback_drops_entries_that_no_longer_match_the_live_needle.
  2. Panicked-walk soft-lock: the walker now routes its result through a completion guard whose Drop writes an empty result on unwind, so mention_completion_pending always drains and Enter releases. Pinned by mention_walk_completion_fills_cell_even_on_unwind.

On the #3897 caveat: agreed and leaving the differential guard debug-gated as you suggest — every cargo test run exercises it, and the two O(N) residuals you noted (frozen-lines clone per chunk, plus the message-layer decoration pass) are called out in the commit message as the known follow-up if profiling ever warrants it.

Post-push suite: 5717 passed / 1 failed (test_parity_linux_landlock_available, container-environment only).


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants