Skip to content

fix(subagent): keep Codex OAuth fanout actionable#3974

Merged
Hmbown merged 4 commits into
mainfrom
codex/0867-codex-subagent-oauth
Jul 3, 2026
Merged

fix(subagent): keep Codex OAuth fanout actionable#3974
Hmbown merged 4 commits into
mainfrom
codex/0867-codex-subagent-oauth

Conversation

@Hmbown

@Hmbown Hmbown commented Jul 3, 2026

Copy link
Copy Markdown
Owner

Summary

  • make codewhale auth list report an active OpenAI Codex OAuth file instead of marking openai-codex missing when auth status can see the login
  • encode/decode Responses API function tool names with the existing reversible API-name mapping, fixing Codex child requests that inherited dotted/dashed tool names rejected as tools[n].name pattern violations
  • treat structured LlmError::RateLimited / 429-style Codex child failures as transient sub-agent provider interruptions so exhausted retries preserve a continuation checkpoint instead of becoming opaque terminal failures
  • treat an explicit CODEWHALE_HOME as a first-run isolation boundary for config, settings, TUI theme prefs, and doctor legacy-state diagnostics, while preserving normal HOME-based .deepseek migration for existing users
  • add 0.8.67 release-note coverage for the Codex OAuth/sub-agent diagnostics and fresh-home setup isolation slices

Refs #3884.

Verification

  • cargo fmt --check
  • git diff --check
  • cargo test -p codewhale-config --locked
  • cargo test -p codewhale-cli auth_ --locked
  • cargo test -p codewhale-tui --bin codewhale-tui --locked responses_ -- --nocapture
  • cargo test -p codewhale-tui --bin codewhale-tui --locked transient_provider_classifier -- --nocapture
  • cargo test -p codewhale-tui --bin codewhale-tui --locked subagent_failure_message_preserves_error_chain -- --nocapture
  • cargo test -p codewhale-tui --bin codewhale-tui --locked setup_ -- --nocapture
  • cargo test -p codewhale-tui --bin codewhale-tui --locked settings_load_migrates_legacy_deepseek_home_into_codewhale_home_without_explicit_home -- --nocapture
  • cargo test -p codewhale-tui --bin codewhale-tui --locked settings_load_migrates_platform_legacy_fallback_into_codewhale_home_without_explicit_home -- --nocapture
  • cargo test -p codewhale-tui --bin codewhale-tui --locked settings_load_ignores_legacy_files_when_codewhale_home_is_explicit -- --nocapture
  • cargo test -p codewhale-tui --bin codewhale-tui --locked doctor_state_roots_ignore_ambient_legacy_home_when_codewhale_home_is_explicit -- --nocapture

Manual smoke

The locally installed 0.8.67 candidate initially reproduced the real child failure after the opaque-message fix:

[invalid_request] Responses API request failed: Invalid request (400): Invalid 'tools[83].name': string does not match pattern...

That points to Responses tool-name encoding, not missing Codex OAuth. This PR patches that bridge.

After patching the bridge, an installed 0.8.67 smoke launched a Codex OAuth child agent and the child completed with PONG instead of failing with Responses API failed.

Copilot AI review requested due to automatic review settings July 3, 2026 17:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 5m 3s —— View job


Review: fix(subagent): keep Codex OAuth fanout actionable

  • Gather context (PR body, diff, custom review instructions)
  • Review crates/cli/src/lib.rs auth list/status changes
  • Review crates/tui/src/tools/subagent/mod.rs transient error classification
  • Review crates/tui/src/tools/subagent/tests.rs new test
  • Review CHANGELOG entries
  • Post final review

Note: cargo/git fetch are not in my allowed tools in this run, so I reviewed statically (reading the code paths and cross-referencing existing tests) rather than executing the verification commands listed in the PR body. If you want me to actually run them, add Bash(cargo test:*) / Bash(git fetch:*) to --allowedTools.

Summary

Small, well-scoped fix. Both changes are correct and consistent with existing patterns in the codebase.

1. auth list Codex OAuth reporting (crates/cli/src/lib.rs:1143-1177)
The new auth_list_lines mirrors the OpenaiCodex branching already used in auth_status_all_providers (line ~1097) and auth_status_lines_for_provider (line ~1190): env → oauth file → missing, bypassing the config/keyring checks that don't apply to Codex OAuth. This is now the third near-identical copy of that same three-way branch across the file. Not introduced by this PR (the first two already existed), but since a third copy was just added, it's a reasonable point to extract a shared helper, e.g. fn active_source_for(provider, env, oauth_file, file, keyring) -> &'static str, if you want to head off a fourth copy diverging later. Not blocking.

The new test (auth_list_treats_openai_codex_oauth_file_as_active, line ~3607) correctly clears OPENAI_CODEX_ACCESS_TOKEN/CODEX_ACCESS_TOKEN via the existing ScopedEnvVar/env_lock helpers so the OAuth-file path is actually exercised instead of the env path, and asserts the secret token content isn't leaked into the output. Good coverage.

2. Transient Codex rate-limit classification (crates/tui/src/tools/subagent/mod.rs:4580-4611)
Adding a structured error.downcast_ref::<LlmError>() check for LlmError::RateLimited ahead of the substring heuristics is the right fix — it's more robust than string matching and doesn't depend on the exact wording of Display. Confirmed this correctly traverses .context(...)-wrapped errors (anyhow's context-chain downcast recurses into the wrapped Error's vtable when the outer context type doesn't match), so transient_provider_classifier_matches_structured_rate_limit's use of .context("Responses API request failed") around the LlmError::RateLimited is a valid way to exercise this.

Minor/non-blocking: the added string needles "rate limited", "rate_limit", "rate_limited", "too many requests", "429" are now redundant with the new structured check for any error that's actually an LlmError::RateLimited — they only add value for non-LlmError transient rate-limit signals (e.g. raw HTTP client errors that never got wrapped into LlmError). That's plausibly intentional as a defense-in-depth fallback, so I'm not asking for removal, just flagging that "429" as a bare substring is looser than the rest of the list and could theoretically false-positive on an unrelated number embedded in an error message/path. Same caveat already applies to the pre-existing "502"/"503"/"504" entries, so this isn't a new class of risk.

3. Provider/model-route safety (EPIC #2608 invariant): N/A — this PR doesn't touch model/provider selection or infer anything from model-string prefixes.

4. CHANGELOG entries: Accurate, reference #3884, correctly placed under [Unreleased]Fixed in both CHANGELOG.md and crates/tui/CHANGELOG.md.

5. Security: No secret-handling concerns; the new test explicitly asserts the OAuth file's token contents aren't leaked into auth list output.

No blocking issues found. This is safe to land as-is.

Treat structured Codex rate-limit failures as transient sub-agent provider interruptions so exhausted retries preserve a continuation checkpoint instead of recording an opaque terminal failure.

Also align auth list with auth status for the OpenAI Codex OAuth file, so release/setup diagnostics do not claim the provider is missing while the detailed status sees the Codex login.

Refs #3884

Signed-off-by: Hunter B <hmbown@gmail.com>
@Hmbown Hmbown force-pushed the codex/0867-codex-subagent-oauth branch from 422a597 to f2c1fb5 Compare July 3, 2026 17:14
@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 1m 52s —— View job


Reviewing updated PR #3974

  • Gather context (diff vs origin/main, prior review comments)
  • Review crates/cli/src/lib.rs auth list/status changes
  • Review crates/tui/src/client/responses.rs changes (new file in diff)
  • Review crates/tui/src/tools/subagent/mod.rs transient error classification
  • Review crates/tui/src/tools/subagent/tests.rs test coverage
  • Review CHANGELOG entries
  • Post final review

Summary

This diff is the same single commit reviewed earlier, but this run's diff also includes crates/tui/src/client/responses.rs (+41/-5), which the earlier review pass didn't mention. That file's change is a real fix and belongs with the rest of the commit, so I re-reviewed the whole thing.

1. auth list Codex OAuth reporting (crates/cli/src/lib.rs) and transient rate-limit classification (crates/tui/src/tools/subagent/mod.rs) — unchanged from the prior pass; still correct. auth_list_lines mirrors the existing OpenaiCodex branch used by auth_status_all_providers/auth_status_lines_for_provider, and the new test correctly exercises the OAuth-file path via ScopedEnvVar and asserts no secret leakage. The structured LlmError::RateLimited downcast in is_transient_subagent_provider_error is correctly placed ahead of the substring heuristics and is exercised through an anyhow .context(...) wrapper in the new test, confirming the downcast traverses the context chain.

2. Responses API tool-name encoding (crates/tui/src/client/responses.rs) — this is the part not covered before, and it's a genuine, well-targeted fix:

  • to_api_tool_name/from_api_tool_name already existed in crates/tui/src/client.rs and were already used consistently on the Chat Completions path (client/chat.rs), but the Responses/Codex path (client/responses.rs) was sending/receiving raw tool names. Any tool name with characters outside [A-Za-z0-9_-] (e.g. the built-in web.run, or dotted MCP tool names) would previously go over the wire unescaped to Codex's Responses API — this directly explains why fixing it belongs with "keep Codex OAuth fanout actionable," since a subagent tool call with a dotted name could fail before ever reaching the rate-limit/retry logic this PR also touches.
  • Verified the fix is applied at all three write sites (tool_to_responses_function, convert_messages_to_responses_input's function_call serialization, and the streaming ContentBlockStart::ToolUse decode) and that the non-streaming handle_responses_message path reuses handle_responses_stream's already-decoded name (responses.rs:404-414), so there's no second path that needs its own decode call.
  • New tests (responses_input_encodes_tool_call_names, plus updating the schema-sanitization test to use web.run) exercise the encode direction; decode-direction correctness already has coverage in client.rs's existing round-trip tests. Good, non-redundant coverage.

3. Provider/model-route safety (EPIC #2608): N/A — no model/provider inference from prefixes or prompt text.

4. CHANGELOG entries: Accurate, reference #3884, correctly placed under [Unreleased]Fixed.

5. Security: No secret-handling concerns.

No blocking issues found. Safe to land.

Treat an explicit CODEWHALE_HOME as a fresh-launch isolation boundary for config, settings, TUI theme prefs, and doctor legacy-state diagnostics. Normal HOME-based legacy migration still falls back to ~/.deepseek for existing users, but release smoke tests and scripted first-run launches no longer inherit unrelated ambient legacy files.

Tests: cargo fmt --check; git diff --check; cargo test -p codewhale-config --locked; cargo test -p codewhale-tui --bin codewhale-tui --locked setup_ -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked responses_ -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked transient_provider_classifier -- --nocapture
Signed-off-by: Hunter B <hmbown@gmail.com>
@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 2m 8s —— View job


Reviewing updated PR #3974 (new commit: fix(setup): isolate explicit CodeWhale homes)

  • Gather context (diff vs origin/main, prior review comments)
  • Review crates/config/src/lib.rs + crates/config/src/tests.rs changes
  • Review crates/tui/src/config_persistence.rs changes
  • Review crates/tui/src/main.rs changes
  • Review crates/tui/src/settings.rs changes
  • Confirm no regressions to previously-reviewed files (cli/lib.rs, responses.rs, subagent/mod.rs — unchanged in this commit)
  • Post final review

Note: cargo is not in my allowed tools in this run, so I couldn't execute cargo test -p codewhale-config --locked / the setup_/doctor_ test targets listed in the PR body. I reviewed statically by reading the resolver and test code paths. If you want the verification commands actually run, add Bash(cargo test:*) to --allowedTools.

Summary

This commit adds a new isolation boundary: when CODEWHALE_HOME is explicitly set, codewhale_home_is_explicit() (new, crates/config/src/lib.rs:3299-3305) short-circuits every read-path resolver so it never falls back to (or migrates from) ambient ~/.deepseek legacy state. The change is applied consistently across all four affected resolvers:

  • default_config_path / migrate_config_if_needed (crates/config/src/lib.rs:3726, :3758)
  • config_toml_path (crates/tui/src/config_persistence.rs:589)
  • Settings::settings_path_candidates (crates/tui/src/settings.rs:1227)
  • TuiPrefs::path (crates/tui/src/settings.rs:118)
  • doctor_state_roots (crates/tui/src/main.rs:3633), consumed by both the human-readable and JSON doctor reports

Each new short-circuit is paired with a targeted regression test that plants an ambient legacy file, sets CODEWHALE_HOME explicitly, and asserts the legacy file is neither read nor migrated. The two existing settings-migration tests were correctly renamed (..._without_explicit_home) and updated to remove_var("CODEWHALE_HOME") instead of setting it to the same primary path — this is a real fix, since under the old test setup CODEWHALE_HOME being set unconditionally would have made the new codewhale_home_is_explicit() check short-circuit those tests once the check landed, silently breaking the migration path they're meant to exercise.

I verified resolve_settings_path_from_candidates/resolve_tui_prefs_path_from_candidates (the shared helpers consuming these candidates) already treat None candidates as "skip," so returning (primary, None, None) / omitting legacy_home when explicit needs no further changes there — confirmed correct by reading crates/tui/src/settings.rs:194-213 and :1182-1207.

Minor/non-blocking — duplicated check in crates/config/src/lib.rs: resolve_state_dir (line 3362) and ensure_state_dir_with_migration (line 3434) both still inline codewhale_home_env_override().is_some() rather than calling the new codewhale_home_is_explicit() helper this commit introduces for the exact same check. Not a bug (both express the same boolean), but now there are two spellings of "is this an explicit CodeWhale home" in the same file; worth swapping those two call sites to the new public helper for consistency now that it exists, so a future change to the check's semantics doesn't need to remember both spots.

doctor_state_roots legacy-root semantics: worth double-checking intent — when CODEWHALE_HOME is explicit, legacy_home becomes code_home.join(LEGACY_APP_DIR) (i.e. $CODEWHALE_HOME/.deepseek) rather than None/absent. This means the doctor report can still flag a nested .deepseek dir inside the explicit home as "legacy state," which looks intentional (matches the new test doctor_state_roots_ignore_ambient_legacy_home_when_codewhale_home_is_explicit) and is consistent with treating the explicit home as a self-contained root rather than fully disabling legacy diagnostics.

Provider/model-route safety (EPIC #2608): N/A — no model/provider inference from prefixes or prompt text; this is purely home-directory resolution.

Tests: Good coverage — one test per resolver isolating the explicit-home behavior, plus the corrected pre-existing migration tests.

CHANGELOG: Accurate, correctly appended under [Unreleased]Fixed in both CHANGELOG.md and crates/tui/CHANGELOG.md.

Security: No secret-handling concerns.

No blocking issues found. Safe to land as-is; the resolve_state_dir/ensure_state_dir_with_migration duplication note above is optional cleanup.

The explicit-home regression test in config_persistence set CODEWHALE_HOME but the local EnvGuard only restored HOME, USERPROFILE, CODEWHALE_CONFIG_PATH, and DEEPSEEK_CONFIG_PATH. Under full-suite parallelism that leaked the explicit home into neighboring path tests on macOS/Windows. Track CODEWHALE_HOME in the guard so the isolation regression remains hermetic.

Tests: cargo fmt --check; cargo test -p codewhale-tui --bin codewhale-tui --locked config_toml_path_ -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked persist_hotbar_bindings_writes_primary_config_path_for_fresh_installs -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked
Signed-off-by: Hunter B <hmbown@gmail.com>
@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 1m 36s —— View job


Reviewing updated PR #3974 (new commit: test(setup): restore CODEWHALE_HOME in persistence tests)

  • Gather context (diff vs origin/main, prior review comments)
  • Review new commit ae516fe (crates/tui/src/config_persistence.rs)
  • Confirm no regressions to previously-reviewed files (cli/lib.rs, responses.rs, subagent/mod.rs, config/lib.rs+tests.rs, tui/main.rs, tui/settings.rs — unchanged since last pass)
  • Post final review

Note: cargo is not in my allowed tools in this run, so I couldn't execute the config_toml_path_* / persist_hotbar_bindings_* verification commands listed in the PR body. Reviewed statically. If you want them actually run, add Bash(cargo test:*)/Bash(cargo fmt:*) to --allowedTools.

Summary

This commit is a targeted, correct fix for a real test-isolation gap that the prior commit (fix(setup): isolate explicit CodeWhale homes) introduced.

config_persistence.rs's local EnvGuard (crates/tui/src/config_persistence.rs:632-735) predates this PR and only ever tracked HOME/USERPROFILE/CODEWHALE_CONFIG_PATH/DEEPSEEK_CONFIG_PATH. When the prior commit added config_toml_path_ignores_legacy_config_when_codewhale_home_is_explicit (line ~800), that test sets CODEWHALE_HOME directly via env::set_var after EnvGuard::new, but the guard's Drop never restored/cleared it — so once that test ran, CODEWHALE_HOME stayed set to the test's temp explicit_home path for the remainder of the process. This commit fixes that by:

  • capturing codewhale_home_prev in EnvGuard::new and explicitly remove_var-ing it during setup (so every test using this guard starts from a clean slate regardless of what a previous test in the same serialized group left behind) — config_persistence.rs:649,657
  • restoring the previous value (or removing it if there wasn't one) in Dropconfig_persistence.rs:699-709

I checked whether this is genuinely load-bearing rather than defense-in-depth: lock_test_env() (crate::test_support::lock_test_env) is a single process-wide mutex used by dozens of test modules across crates/tui/src (settings.rs, main.rs, config_ui.rs, commands/groups/**, etc.), not just this file. So a leaked CODEWHALE_HOME from this module's tests genuinely can bleed into any other module's tests that also take that lock (which is most of them) and don't independently set/clear CODEWHALE_HOME themselves — this matches the commit message's "leaked into neighboring path tests on macOS/Windows" description exactly.

I also spot-checked that the other explicit-CODEWHALE_HOME tests added earlier in this PR (crates/config/src/tests.rs, crates/tui/src/settings.rs, crates/tui/src/main.rs) already restore CODEWHALE_HOME correctly via their own inline capture/restore or the shared EnvVarRestore/EnvVarGuard helpers — so this config_persistence.rs guard was the one place with the gap, and this commit closes it without needing changes elsewhere.

Provider/model-route safety (EPIC #2608): N/A — pure test-harness env-var hygiene, no model/provider inference.

Tests: This commit is the test-hygiene fix; no new test cases needed since it repairs isolation for an existing test rather than adding new behavior.

Security: No secret-handling concerns; unsafe blocks are pre-existing pattern in this test module (env mutation under a global mutex), each still carries the same "test-only environment mutation guarded by a global mutex" comment as its neighbors.

No blocking issues found. Safe to land.

Signed-off-by: Hmbown <101357273+Hmbown@users.noreply.github.com>
@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown

Claude finished @Hmbown's task in 1m 5s —— View job


Reviewing updated PR #3974 (new commit: test(settings): avoid platform low-motion coupling)

  • Gather context (diff vs origin/main, prior review comments)
  • Review new commit 7b7c4b5 (crates/tui/src/settings.rs)
  • Confirm no regressions to previously-reviewed files (cli/lib.rs, responses.rs, subagent/mod.rs, config/lib.rs+tests.rs, tui/main.rs, tui/config_persistence.rs — unchanged since last pass)
  • Post final review

Note: cargo is not in my allowed tools in this run, so I couldn't execute settings_load_ignores_legacy_files_when_codewhale_home_is_explicit to confirm it passes. Reviewed statically. If you want it actually run, add Bash(cargo test:*) to --allowedTools.

Summary

This commit is a targeted test-flakiness fix for settings_load_ignores_legacy_files_when_codewhale_home_is_explicit (crates/tui/src/settings.rs:2834), and it's correct.

The test previously wrote low_motion = true into an ambient legacy .deepseek/settings.toml and asserted !loaded.low_motion to prove the explicit-CODEWHALE_HOME isolation logic (added earlier in this PR) doesn't inherit legacy state. The problem: Settings::apply_env_overrides (settings.rs:537-599) unconditionally forces low_motion = true whenever it detects SSH_CLIENT/SSH_TTY, TMUX/STY, TERM_PROGRAM of vscode/ghostty/Termius, or a legacy Windows console host — several of which are commonly present in CI runners (SSH-based runners, tmux-wrapped test harnesses, etc.). That means the assertion could pass or fail depending on the ambient test-runner environment, independent of whether the legacy-file-isolation logic under test actually worked — a false negative (or false positive) risk on some platforms/CI configs, matching the commit's "avoid platform low-motion coupling" description.

The fix swaps the probed field for three settings that have no such env-driven overlay — theme, composer_density, sidebar_width_percent. I confirmed via grep that none of these three appear anywhere in apply_env_overrides or any other post-load override path, so their post-load value is purely a function of what was (or wasn't) read from disk — exactly what this test needs to isolate. The asserted defaults ("system", "comfortable", 28) match Settings::default() at settings.rs:385,387,392, and the legacy file now sets non-default values ("dracula", "spacious", 42) for all three, so the test still meaningfully exercises "legacy values are visible on disk but must not be loaded."

Provider/model-route safety (EPIC #2608): N/A — pure settings-test hygiene, no model/provider inference.

Tests: This commit is the test fix; no new coverage needed since it repairs an existing test's platform-independence rather than adding new behavior.

Security: No secret-handling concerns.

No blocking issues found. Safe to land.

@Hmbown Hmbown merged commit a8cd68e into main Jul 3, 2026
18 checks passed
@Hmbown Hmbown deleted the codex/0867-codex-subagent-oauth branch July 3, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants