
feat(tui): auto-discover .codewhale/rules/ and .claude/rules/ directories as project context#3892

Merged
Hmbown merged 3 commits into
Hmbown:mainfrom
yekern:codex/rules-dir-auto-discovery
Jul 3, 2026
@yekern yekern commented Jul 2, 2026

PR: feat(tui) — auto-discover .codewhale/rules/ and .claude/rules/ directories as project context

Closes #3867

Summary

Add rules-directory auto-discovery to load_project_context(): on every session start,
CodeWhale automatically scans .codewhale/rules/ (native) and .claude/rules/ (Claude compat)
for .md files, loads them in filename order, and appends them to the project-context block
injected into the system prompt. Each rule is wrapped in a <project_rule source="…"> element.

This implements scheme D from the design anchor issue #3867. It follows the same trust
model as AGENTS.md (workspace-contained content only, no absolute-path escape) and requires
no relaxation of the #417 project-config restrictions.

Motivation

Before this PR, CodeWhale's instruction system was nearly unusable in multi-project workflows:

  1. The instructions config key has been blocked at project scope since v0.8.8 (PRIOR: Ignore dangerous project-level config keys #417), so users could
    only list rule files in ~/.codewhale/config.toml, which makes it painful to maintain
    per-project rules across many repositories.
  2. No rules-directory auto-discovery — Claude Code's .claude/rules/ auto-loads all
    .md files; CodeWhale had no equivalent and no mechanism to load multiple rule files
    without manual config.
  3. No glob support in instructions_paths(), so even instructions = [".claude/rules/*.md"]
    was impossible.

The recommended path from the #3867 design discussion was D first — rules-directory
auto-discovery sits in the same trust class as AGENTS.md, needs no #417 relaxation, and
delivers the majority of multi-project pain relief on its own. This PR implements that slice.

Design decisions

rules_block vs mixing into instructions

Rules are stored in a separate rules_block: Option<String> field on ProjectContext,
not mixed into instructions. This is essential for mono-repo support:

  • has_instructions() controls whether the parent-directory traversal searches for a root
    AGENTS.md. If rules alone set instructions, they would block parent discovery.
  • By keeping rules in rules_block, has_instructions() stays unchanged (only reflects
    main instructions), and parent traversal works correctly.
  • as_system_block() appends rules_block after instructions at render time, so both
    are present in the final system prompt.
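The separation described above can be sketched as follows. This is a minimal illustration, not the PR's code: the struct is reduced to the two fields under discussion, and `should_search_parent` is a hypothetical helper standing in for the real parent-directory traversal.

```rust
/// Simplified stand-in for `ProjectContext`; only the two fields discussed
/// above are modeled (assumption: the real struct has more).
#[derive(Debug, Default)]
pub struct ProjectContext {
    /// Main instructions (AGENTS.md and similar files).
    pub instructions: Option<String>,
    /// Assembled `<project_rule>` elements, kept apart from `instructions`.
    pub rules_block: Option<String>,
}

impl ProjectContext {
    /// Reflects main instructions only, so rules never affect traversal.
    pub fn has_instructions(&self) -> bool {
        self.instructions
            .as_deref()
            .is_some_and(|text| !text.trim().is_empty())
    }
}

/// Hypothetical traversal decision: keep walking up toward a root AGENTS.md
/// whenever the current directory produced no main instructions.
pub fn should_search_parent(ctx: &ProjectContext) -> bool {
    !ctx.has_instructions()
}
```

With this shape, a sub-project that ships only rule files still lets the search continue upward to the mono-repo root.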

Security model

Same trust class as AGENTS.md:

  • Workspace-subtree only: rules live in .codewhale/rules/ or .claude/rules/ within
    the project. No absolute-path escape.
  • Symlink refusal: load_context_file() (shared with AGENTS.md) rejects symlinked files,
    matching the existing precedent in read_project_config_file.
  • Capped at 50 files per directory (MAX_RULES_FILES) to prevent abuse.
  • 100 KB per file (MAX_CONTEXT_SIZE) inherited from the context loader.
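The per-file checks listed above might look roughly like the sketch below. It assumes a signature of `load_context_file(&Path) -> Option<String>` and reads "100 KB" as 100 × 1024 bytes; neither detail is confirmed by the PR text.

```rust
use std::fs;
use std::path::Path;

/// Assumed to mean 100 KiB; the PR description only says "100 KB".
const MAX_CONTEXT_SIZE: u64 = 100 * 1024;

/// Returns the file content, or `None` when the path is a symlink, is not a
/// regular file, exceeds the size cap, cannot be read, or is empty.
fn load_context_file(path: &Path) -> Option<String> {
    // symlink_metadata does not follow the final path component, so a
    // symlinked file is reported as a symlink rather than as its target.
    let meta = fs::symlink_metadata(path).ok()?;
    if meta.file_type().is_symlink() || !meta.is_file() {
        return None;
    }
    if meta.len() > MAX_CONTEXT_SIZE {
        return None;
    }
    let content = fs::read_to_string(path).ok()?;
    if content.trim().is_empty() {
        return None;
    }
    Some(content)
}
```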

No #417 relaxation

merge_project_config's rejection of project-scope instructions is left unchanged.
Scheme D is orthogonal to #417 — it doesn't touch the config key at all.

Changes

crates/tui/src/project_context.rs (+~190 lines)

New constants:

  • RULES_DIRS = [".codewhale/rules", ".claude/rules"] — directories scanned in order
  • MAX_RULES_FILES = 50 — per-directory file cap

New field on ProjectContext:

  • rules_block: Option<String> — holds the assembled rules XML, separate from instructions

New function load_rules_from_dir():

  • Scans a rules directory for *.md files
  • Sorts by filename for deterministic order
  • Reuses load_context_file() for size checking + symlink safety + empty-file rejection
  • Returns Vec<(PathBuf, String)> — silently returns empty on missing/unreadable directories
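A minimal sketch of the scan described in the bullets above. The parameter list `(workspace, dir)` is an assumption, the `load_context_file` here is a simplified stand-in that only rejects empty files, and applying the cap before loading (rather than after) is also a guess.

```rust
use std::fs;
use std::path::{Path, PathBuf};

const MAX_RULES_FILES: usize = 50;

/// Stand-in for the shared loader; the real one also enforces the size cap
/// and refuses symlinks. Here it only rejects empty or unreadable files.
fn load_context_file(path: &Path) -> Option<String> {
    let content = fs::read_to_string(path).ok()?;
    (!content.trim().is_empty()).then_some(content)
}

/// Loads `*.md` files from `workspace/dir` in filename order.
fn load_rules_from_dir(workspace: &Path, dir: &str) -> Vec<(PathBuf, String)> {
    let Ok(read_dir) = fs::read_dir(workspace.join(dir)) else {
        return Vec::new(); // missing or unreadable directory: no rules
    };
    let mut paths: Vec<PathBuf> = read_dir
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.extension().is_some_and(|ext| ext == "md"))
        .collect();
    // Sort by filename so the prompt is identical across runs.
    paths.sort_by(|a, b| a.file_name().cmp(&b.file_name()));
    paths.truncate(MAX_RULES_FILES);
    paths
        .into_iter()
        .filter_map(|path| load_context_file(&path).map(|text| (path, text)))
        .collect()
}
```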

Modified load_project_context():

  • After loading PROJECT_CONTEXT_FILES (AGENTS.md etc.), iterates RULES_DIRS and calls
    load_rules_from_dir()
  • Wraps each rule file in <project_rule source="…">…</project_rule>
  • Stores assembled rules in ctx.rules_block (not ctx.instructions, preserving parent traversal)
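The wrapping step could be sketched like this. `assemble_rules_block` is a hypothetical helper name, and the exact whitespace inside each element is an assumption; only the `<project_rule source="…">` element shape comes from the PR description.

```rust
use std::path::{Path, PathBuf};

/// Renders loaded rules as `<project_rule source="…">` elements, with the
/// source shown relative to the workspace. Returns `None` when there are no
/// rules, so the caller can leave `rules_block` unset.
fn assemble_rules_block(workspace: &Path, rules: &[(PathBuf, String)]) -> Option<String> {
    if rules.is_empty() {
        return None;
    }
    let elements: Vec<String> = rules
        .iter()
        .map(|(path, text)| {
            // Fall back to the full path if it is not under the workspace.
            let source = path.strip_prefix(workspace).unwrap_or(path.as_path());
            format!(
                "<project_rule source=\"{}\">\n{}\n</project_rule>",
                source.display(),
                text.trim_end()
            )
        })
        .collect();
    Some(elements.join("\n"))
}
```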

Modified as_system_block():

  • Appends rules_block inside the project-context block when instructions exist
  • Emits rules_block standalone when no main instructions are present
  • Emits rules_block after constitution when constitution exists but instructions don't
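The three rendering cases above reduce to one ordering rule: constitution, then instructions, then rules, skipping whatever is absent. The sketch below shows only that ordering. The `constitution` field and the blank-line separator are assumptions, and the real function presumably wraps the parts in more structure.

```rust
/// Reduced model of the context; `constitution` is an assumed field name.
#[derive(Default)]
struct ProjectContext {
    constitution: Option<String>,
    instructions: Option<String>,
    rules_block: Option<String>,
}

impl ProjectContext {
    /// Joins whichever parts are present, always in the order
    /// constitution, instructions, rules. Returns `None` when all are absent.
    fn as_system_block(&self) -> Option<String> {
        let mut parts: Vec<&str> = Vec::new();
        for part in [&self.constitution, &self.instructions, &self.rules_block] {
            if let Some(text) = part.as_deref() {
                parts.push(text);
            }
        }
        if parts.is_empty() {
            None
        } else {
            Some(parts.join("\n\n"))
        }
    }
}
```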

Modified project_context_cache_candidate_paths():

  • Scans RULES_DIRS for *.md files and adds them to the cache-key candidate list
  • Ensures rules changes invalidate the project-context cache (editing a rule file,
    adding/removing rule files all produce a different cache key)
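To illustrate why listing the rule files as cache-key candidates yields invalidation, here is a sketch that hashes each candidate path together with its bytes. `rules_candidate_paths` and `context_signature` are hypothetical names, and the real cache may derive its key differently (for example from metadata rather than content).

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

const RULES_DIRS: [&str; 2] = [".codewhale/rules", ".claude/rules"];

/// Lists the `*.md` files under each rules directory, sorted per directory.
fn rules_candidate_paths(workspace: &Path) -> Vec<PathBuf> {
    let mut candidates = Vec::new();
    for dir in RULES_DIRS {
        let Ok(read_dir) = fs::read_dir(workspace.join(dir)) else {
            continue; // directory missing or unreadable
        };
        let mut paths: Vec<PathBuf> = read_dir
            .filter_map(|entry| entry.ok())
            .map(|entry| entry.path())
            .filter(|path| path.extension().is_some_and(|ext| ext == "md"))
            .collect();
        paths.sort();
        candidates.extend(paths);
    }
    candidates
}

/// Hypothetical signature: hashes every candidate path with its content, so
/// edits, additions, and removals all produce a different value.
fn context_signature(workspace: &Path) -> u64 {
    let mut hasher = DefaultHasher::new();
    for path in rules_candidate_paths(workspace) {
        path.hash(&mut hasher);
        fs::read(&path).unwrap_or_default().hash(&mut hasher);
    }
    hasher.finish()
}
```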

9 new tests:

| Test | What it covers |
| --- | --- |
| rules_from_codewhale_dir_are_loaded_as_project_context | Basic discovery + <project_rule> wrapper |
| rules_are_loaded_in_filename_order | Deterministic filename sort (aaa < mmm < zzz) |
| rules_from_claude_dir_are_compat_loaded | .claude/rules/ compatibility |
| rules_directory_missing_does_not_crash | Graceful handling of missing directories |
| rules_coexist_with_agents_md | AGENTS.md + rules coexist, AGENTS.md precedes rules |
| non_md_files_in_rules_dir_are_ignored | Only *.md files are loaded |
| rules_cap_truncates_excess_files | MAX_RULES_FILES=50 enforced |
| rules_rejects_symlinked_files | Symlinked rule files are refused (unix only) |
| rules_from_both_dirs_are_loaded_together | Dual directory support + correct priority order |

crates/tui/src/context_report.rs (+18 lines)

  • /context report now includes rules_block content when rules are present
  • When only rules exist (no main instructions), they appear as a separate "Project rules" entry

crates/tui/src/project_context_cache.rs (+28 lines, 2 tests)

  • signature_changes_when_rules_file_changes — verifies content change triggers cache invalidation
  • signature_changes_when_rules_file_is_added_or_removed — verifies file addition/removal triggers invalidation

Verification

| Check | Result |
| --- | --- |
| cargo fmt --all -- --check | clean |
| cargo clippy -p codewhale-tui (our files only) | clean |
| cargo test -p codewhale-tui --bin codewhale-tui -- project_context | 56 passed, 0 failed |
| cargo test -p codewhale-tui --bin codewhale-tui -- project_context_cache | 7 passed, 0 failed |
| cargo test -p codewhale-tui --bin codewhale-tui -- context_report | 9 passed, 0 failed |

System prompt structure (with rules)

┌─ System Prompt ──────────────────────────────────────────────┐
│ [mode prompt + constitution]                                  │
│                                                                │
│ <project_instructions source="AGENTS.md">                      │
│   ...AGENTS.md content...                                     │
│ </project_instructions>                                       │
│                                                                │
│ <project_rule source=".codewhale/rules/coding-style.md">      │
│   ...rule content...                                          │
│ </project_rule>                                               │
│ <project_rule source=".codewhale/rules/testing.md">           │
│   ...rule content...                                          │
│ </project_rule>                                               │
│                                                                │
│ ── volatile boundary ──                                       │
│ ## Environment …                                              │
│ <instructions source="~/global.md">…</instructions>           │
└────────────────────────────────────────────────────────────────┘

Audit summary

A comprehensive cross-system audit (2 rounds, 5 dimensions) was performed to ensure no
regressions or unexpected interactions:

| Audit scope | Verdict | Details |
| --- | --- | --- |
| Prompt byte-stability | ✅ Safe | Rules in static layer (same as AGENTS.md). KV cache busts on rule changes (by design). |
| All prompt construction paths | ✅ Covered | TUI, engine init, refresh_system_prompt, build_system_prompt all go through as_system_block(). |
| Sub-agent / Fleet | 🟡 Pre-existing | Model-visible agent tool ✔️ inherits rules via fork_context. Background /agent path ❌ uses static prompt; same pre-existing limitation as AGENTS.md. |
| WhaleFlow | ✅ No interaction | Independent crate, no project-context references. |
| Project-context cache | ✅ Fixed | Cache key now includes rules directory files. Tested for content change + file addition/removal. |
| Parent-directory AGENTS.md | ✅ Preserved | rules_block separated from instructions; has_instructions() unchanged. |
| #417 project-config | ✅ Unchanged | merge_project_config's instructions rejection untouched. |

What this PR does NOT do (deferred to future milestones)

  • Glob support in instructions_paths() (scheme C)
  • Path-restricted relaxation of project-scope instructions (scheme B)
  • Conditional rule loading with YAML frontmatter / paths matching (scheme E)
  • Trust gating for project-scope instructions (scheme A)

These are tracked in #3867 as separate workstreams.

Migration path

  • New projects: create .codewhale/rules/ (or .claude/rules/) and drop .md files.
    No config changes needed — rules are auto-discovered on next session start.
  • Existing .claude/rules/ users: rules are picked up automatically — zero migration cost.
  • Existing global instructions users: both channels are additive (project rules + global
    instructions coexist in the system prompt), so no conflict.

@yekern yekern requested a review from Hmbown as a code owner July 2, 2026 09:45
github-actions Bot commented Jul 2, 2026

Thanks @yekern for taking the time to contribute.

This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant recurring PR access by commenting /lgtm on a pull request.

@LeoLin990405 LeoLin990405 left a comment

Thanks for the ping — really nice work, and cleanly scoped exactly as discussed: no #417 relaxation, merge_project_config untouched, and keeping rules_block separate from instructions so has_instructions() isn't poisoned is the right call for parent-directory AGENTS.md traversal in mono-repos. The cache-invalidation catch (adding the rules *.md to project_context_cache_candidate_paths) is a good find, the 50-file cap + deterministic filename order are sensible, and reusing load_context_file gets you the size check + per-file symlink safety for free.

One security gap worth closing before this lands, since it's exactly the "escape the workspace subtree" class Hunter flagged:

A symlinked rules directory escapes the workspace. rules_rejects_symlinked_files covers a symlinked .md file, but nothing checks whether .codewhale/rules / .claude/rules is itself a symlink. load_rules_from_dir (and the cache-path enumerator) call fs::read_dir(workspace.join(dir)) directly, which follows a directory symlink. The files behind it are real, so the per-file is_symlink() check in load_context_file passes them through:

$ ln -s /some/outside/dir .codewhale/rules     # real .md files live in /some/outside/dir
# symlink_metadata(".codewhale/rules/secret.md") → is_symlink=false, is_file=true
# → load_context_file reads /some/outside/dir/secret.md and injects it into project context

Confirmed locally: a repo shipping .codewhale/rules -> /some/outside/dir gets that directory's *.md read into the prompt at load_project_context time — before any command approval, including in read-only/plan mode. It's .md-only, so it's information disclosure rather than arbitrary read, but it still reads files the repo doesn't own, which is the #417 concern.

Suggested guard — refuse a symlinked rules dir (mirrors the existing file-level Refusing symlinked context file precedent):

// A repo could point .codewhale/rules at a path outside the workspace;
// refuse a symlinked rules directory so real .md files behind it aren't read.
if fs::symlink_metadata(&rules_dir)
    .map(|m| m.file_type().is_symlink())
    .unwrap_or(false)
{
    tracing::warn!(target: "project_context", dir = %rules_dir.display(), "Refusing symlinked rules directory");
    return entries;
}

Two follow-ups: the same guard needs to go in project_context_cache_candidate_paths (it re-scans the directory independently), and a rules_rejects_symlinked_directory test would lock it in. Since the directory scan is now duplicated in both places, it might be worth a small shared rules_md_files(workspace, dir) helper so the symlink guard can't drift between the load path and the cache path.
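One possible shape for the shared helper suggested above, as a sketch only. The name `rules_md_files(workspace, dir)` comes from the review comment, but the return type is an assumption, and `eprintln!` stands in for the `tracing::warn!` call so the example needs only the standard library.

```rust
use std::fs;
use std::path::{Path, PathBuf};

const MAX_RULES_FILES: usize = 50;

/// Enumerates `*.md` files in `workspace/dir`, refusing a symlinked rules
/// directory so the load path and the cache path share one guard.
fn rules_md_files(workspace: &Path, dir: &str) -> Vec<PathBuf> {
    let rules_dir = workspace.join(dir);
    // symlink_metadata inspects the link itself instead of following it.
    match fs::symlink_metadata(&rules_dir) {
        Ok(meta) if meta.file_type().is_symlink() => {
            eprintln!("Refusing symlinked rules directory: {}", rules_dir.display());
            return Vec::new();
        }
        Ok(meta) if meta.is_dir() => {}
        _ => return Vec::new(), // missing, unreadable, or not a directory
    }
    let Ok(read_dir) = fs::read_dir(&rules_dir) else {
        return Vec::new();
    };
    let mut paths: Vec<PathBuf> = read_dir
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.extension().is_some_and(|ext| ext == "md"))
        .collect();
    paths.sort_by(|a, b| a.file_name().cmp(&b.file_name()));
    paths.truncate(MAX_RULES_FILES);
    paths
}
```

This checks only the final path component. A symlinked parent such as .codewhale would still be followed, so a stricter variant might check each component or compare canonicalized paths against the workspace root.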

Everything else looks solid. 🐳

@aidaiprivate-source

Copy link
Copy Markdown

PR: feat(tui) — auto-discover .codewhale/rules/ and .claude/rules/ directories as project context

Closes #3867

Summary

Add rules-directory auto-discovery to load_project_context(): on every session start,
CodeWhale automatically scans .codewhale/rules/ (native) and .claude/rules/ (Claude compat)
for .md files, loads them in filename order, and appends them to the project-context block
injected into the system prompt. Each rule is wrapped in a <project_rule source="…"> element.

This completes solution D from the design anchor issue #3867 — the same trust model as
AGENTS.md (workspace-contained content only, no absolute-path escape), with no #417
project-config relaxation required.

Motivation

Before this PR, CodeWhale's instruction system was nearly unusable in multi-project workflows:

  1. instructions config key blocked at project scope since v0.8.8 (PRIOR: Ignore dangerous project-level config keys #417) — users could
    only list rule files in ~/.codewhale/config.toml, making it painful to maintain
    per-project rules across many repositories.
  2. No rules-directory auto-discovery — Claude Code's .claude/rules/ auto-loads all
    .md files; CodeWhale had no equivalent and no mechanism to load multiple rule files
    without manual config.
  3. No glob support in instructions_paths(), so even instructions = [".claude/rules/*.md"]
    was impossible.

The recommended path from the #3867 design discussion was D first — rules-directory
auto-discovery sits in the same trust class as AGENTS.md, needs no #417 relaxation, and
delivers the majority of multi-project pain relief on its own. This PR implements that slice.

Design decisions

rules_block vs mixing into instructions

Rules are stored in a separate rules_block: Option<String> field on ProjectContext,
not mixed into instructions. This is essential for mono-repo support:

  • has_instructions() controls whether the parent-directory traversal searches for a root
    AGENTS.md. If rules alone set instructions, they would block parent discovery.
  • By keeping rules in rules_block, has_instructions() stays unchanged (only reflects
    main instructions), and parent traversal works correctly.
  • as_system_block() appends rules_block after instructions at render time, so both
    are present in the final system prompt.

Security model

Same trust class as AGENTS.md:

  • Workspace-subtree only — rules live in .codewhale/rules/ or .claude/rules/ within
    the project. No absolute-path escape.
  • Symlink refusalload_context_file() (shared with AGENTS.md) rejects symlinked files,
    matching the existing precedent in read_project_config_file.
  • Capped at 50 files per directory (MAX_RULES_FILES) to prevent abuse.
  • 100 KB per file (MAX_CONTEXT_SIZE) inherited from the context loader.

No #417 relaxation

merge_project_config's rejection of project-scope instructions is left unchanged.
Scheme D is orthogonal to #417 — it doesn't touch the config key at all.

Changes

crates/tui/src/project_context.rs (+~190 lines)

New constants:

  • RULES_DIRS = [".codewhale/rules", ".claude/rules"] — directories scanned in order
  • MAX_RULES_FILES = 50 — per-directory file cap

New field on ProjectContext:

  • rules_block: Option<String> — holds the assembled rules XML, separate from instructions

New function load_rules_from_dir():

  • Scans a rules directory for *.md files
  • Sorts by filename for deterministic order
  • Reuses load_context_file() for size checking + symlink safety + empty-file rejection
  • Returns Vec<(PathBuf, String)> — silently returns empty on missing/unreadable directories

Modified load_project_context():

  • After loading PROJECT_CONTEXT_FILES (AGENTS.md etc.), iterates RULES_DIRS and calls
    load_rules_from_dir()
  • Wraps each rule file in <project_rule source="…">…</project_rule>
  • Stores assembled rules in ctx.rules_block (not ctx.instructions, preserving parent traversal)

Modified as_system_block():

  • Appends rules_block inside the project-context block when instructions exist
  • Emits rules_block standalone when no main instructions are present
  • Emits rules_block after constitution when constitution exists but instructions don't

Modified project_context_cache_candidate_paths():

  • Scans RULES_DIRS for *.md files and adds them to the cache-key candidate list
  • Ensures rules changes invalidate the project-context cache (editing a rule file,
    adding/removing rule files all produce a different cache key)

9 new tests:

Test What it covers
rules_from_codewhale_dir_are_loaded_as_project_context Basic discovery + <project_rule> wrapper
rules_are_loaded_in_filename_order Deterministic filename sort (aaa < mmm < zzz)
rules_from_claude_dir_are_compat_loaded .claude/rules/ compatibility
rules_directory_missing_does_not_crash Graceful handling of missing directories
rules_coexist_with_agents_md AGENTS.md + rules coexist, AGENTS.md precedes rules
non_md_files_in_rules_dir_are_ignored Only *.md files are loaded
rules_cap_truncates_excess_files MAX_RULES_FILES=50 enforced
rules_rejects_symlinked_files Symlinked rule files are refused (unix only)
rules_from_both_dirs_are_loaded_together Dual directory support + correct priority order

crates/tui/src/context_report.rs (+18 lines)

  • /context report now includes rules_block content when rules are present
  • When only rules exist (no main instructions), they appear as a separate "Project rules" entry

crates/tui/src/project_context_cache.rs (+28 lines, 2 tests)

  • signature_changes_when_rules_file_changes — verifies content change triggers cache invalidation
  • signature_changes_when_rules_file_is_added_or_removed — verifies file addition/removal triggers invalidation

Verification

Check Result
cargo fmt --all -- --check clean
cargo clippy -p codewhale-tui (our files only) clean
cargo test -p codewhale-tui --bin codewhale-tui -- project_context 56 passed, 0 failed
cargo test -p codewhale-tui --bin codewhale-tui -- project_context_cache 7 passed, 0 failed
cargo test -p codewhale-tui --bin codewhale-tui -- context_report 9 passed, 0 failed

System prompt structure (with rules)

┌─ System Prompt ──────────────────────────────────────────────┐
│ [mode prompt + constitution]                                  │
│                                                                │
│ <project_instructions source="AGENTS.md">                      │
│   ...AGENTS.md content...                                     │
│ </project_instructions>                                       │
│                                                                │
│ <project_rule source=".codewhale/rules/coding-style.md">      │
│   ...rule content...                                          │
│ </project_rule>                                               │
│ <project_rule source=".codewhale/rules/testing.md">           │
│   ...rule content...                                          │
│ </project_rule>                                               │
│                                                                │
│ ── volatile boundary ──                                       │
│ ## Environment …                                              │
│ <instructions source="~/global.md">…</instructions>           │
└────────────────────────────────────────────────────────────────┘

Audit summary

A comprehensive cross-system audit (2 rounds, 5 dimensions) was performed to ensure no
regressions or unexpected interactions:

Audit scope Verdict Details
Prompt byte-stability ✅ Safe Rules in static layer (same as AGENTS.md). KV cache busts on rule changes — by design.
All prompt construction paths ✅ Covered TUI, engine init, refresh_system_prompt, build_system_prompt all go through as_system_block().
Sub-agent / Fleet 🟡 Pre-existing Model-visible agent tool ✔️ inherits rules via fork_context. Background /agent path ❌ uses static prompt — same pre-existing limitation as AGENTS.md.
WhaleFlow ✅ No interaction Independent crate, no project-context references.
Project-context cache ✅ Fixed Cache key now includes rules directory files. Tested for content change + file addition/removal.
Parent-directory AGENTS.md ✅ Preserved rules_block separated from instructionshas_instructions() unchanged.
#417 project-config ✅ Unchanged merge_project_config's instructions rejection untouched.

What this PR does NOT do (deferred to future milestones)

  • Glob support in instructions_paths() (scheme C)
  • Path restriction for project-scope instructions relaxation (scheme B)
  • Conditional rule loading with YAML frontmatter / paths matching (scheme E)
  • Trust gating for project-scope instructions (scheme A)

These are tracked in #3867 as separate workstreams.

Migration path

  • New projects: create .codewhale/rules/ (or .claude/rules/) and drop .md files.
    No config changes needed — rules are auto-discovered on next session start.
  • Existing .claude/rules/ users: rules are picked up automatically — zero migration cost.
  • Existing global instructions users: both channels are additive (project rules + global
    instructions coexist in the system prompt), so no conflict.

PR:feat(tui) — 自动发现 .codewhale/rules/.claude/rules/ 目录作为项目上下文

Closes #3867

概述

load_project_context() 新增 rules 目录自动发现:每次会话启动时,CodeWhale
自动扫描 .codewhale/rules/(原生)和 .claude/rules/(Claude 兼容)目录下的 .md
文件,按文件名排序加载,追加到注入 system prompt 的项目上下文块中。每条规则包裹在
<project_rule source="…"> 元素中。

这是设计锚点 issue #3867方案 D 的实现——与 AGENTS.md 相同的安全模型(仅限
工作区内容,无绝对路径逃逸),不需要 relax #417 项目级配置限制。

动机

此 PR 之前,CodeWhale 在多项目场景下的规则系统几乎不可用:

  1. instructions 配置项被项目级禁止(自 v0.8.8 PRIOR: Ignore dangerous project-level config keys #417)——用户只能在
    ~/.codewhale/config.toml 中列举规则文件,跨多个仓库维护极其痛苦。
  2. 无 rules 目录自动发现——Claude Code 的 .claude/rules/ 自动加载所有 .md
    文件;CodeWhale 没有对应机制,且无法批量加载多文件规则。
  3. instructions_paths() 不支持 glob,即使写 instructions = [".claude/rules/*.md"]
    也是无效的。

#3867 设计讨论的推荐路径是 D 优先——rules 目录自动发现与 AGENTS.md 同安全等级,
无需改动 #417,且能独立解决多项目痛点的大部分。本 PR 实现该方案。

设计决策

rules_block 分离 vs 混入 instructions

Rules 存储在 ProjectContext独立字段 rules_block: Option<String> 中,不混入
instructions。这对 mono-repo 场景至关重要:

  • has_instructions() 控制是否向上搜索父目录的 AGENTS.md。若 rules 单独设置了
    instructions,会阻止父目录发现。
  • 将 rules 保持在 rules_block 中,has_instructions() 保持不变(仅反映主指令),
    父目录遍历正常工作。
  • as_system_block() 在渲染时将 rules_block 追在 instructions 之后,两者都出现在
    最终 system prompt 中。

安全模型

AGENTS.md 同等级:

  • 仅限工作区子树——rules 位于项目内的 .codewhale/rules/.claude/rules/
    无绝对路径逃逸。
  • 拒绝软链接——load_context_file()(与 AGENTS.md 共享)拒绝软链接文件,与
    read_project_config_file 中的现有先例一致。
  • 每目录上限 50 文件MAX_RULES_FILES)防止滥用。
  • 每文件 100 KBMAX_CONTEXT_SIZE)继承自上下文加载器。

不触碰 #417

merge_project_config 对项目级 instructions 的拒绝保持原样。方案 D 与 #417
完全正交——不涉及配置项。

改动

crates/tui/src/project_context.rs(+~190 行)

新增常量:

  • RULES_DIRS = [".codewhale/rules", ".claude/rules"] — 按顺序扫描的目录
  • MAX_RULES_FILES = 50 — 每目录文件上限

ProjectContext 新增字段:

  • rules_block: Option<String> — 存放组装好的 rules XML,与 instructions 分离

新增函数 load_rules_from_dir()

  • 扫描 rules 目录中的 *.md 文件
  • 按文件名排序,保证确定性顺序
  • 复用 load_context_file() 做大小检查 + 软链接安全 + 空文件拒绝
  • 返回 Vec<(PathBuf, String)> — 目录缺失或不可读时静默返回空 vector

修改 load_project_context()

  • 加载 PROJECT_CONTEXT_FILES(AGENTS.md 等)后,遍历 RULES_DIRS 调用
    load_rules_from_dir()
  • 将每条规则包裹在 <project_rule source="…">…</project_rule>
  • 组装结果存入 ctx.rules_block(而非 ctx.instructions,保留父目录遍历)

修改 as_system_block()

  • instructions 存在时,将 rules_block 追在项目上下文块中
  • 无主指令时,独立输出 rules_block
  • constitution 存在但 instructions 不存在时,constitution 后输出 rules_block

修改 project_context_cache_candidate_paths()

  • 扫描 RULES_DIRS 中的 *.md 文件,加入缓存 key 候选列表
  • 确保 rules 变更触发项目上下文缓存失效(编辑规则文件、新增/删除规则文件均产生不同缓存 key)

9 个新测试:

测试 覆盖
rules_from_codewhale_dir_are_loaded_as_project_context 基础发现 + <project_rule> 包裹
rules_are_loaded_in_filename_order 确定性文件名排序(aaa < mmm < zzz)
rules_from_claude_dir_are_compat_loaded .claude/rules/ 兼容
rules_directory_missing_does_not_crash 目录缺失不崩溃
rules_coexist_with_agents_md AGENTS.md + rules 共存,AGENTS.md 在前
non_md_files_in_rules_dir_are_ignored 仅加载 *.md
rules_cap_truncates_excess_files MAX_RULES_FILES=50 强制
rules_rejects_symlinked_files 拒绝软链接规则文件(仅 unix)
rules_from_both_dirs_are_loaded_together 双目录共存 + 正确优先级

crates/tui/src/context_report.rs(+18 行)

  • /context report 现在在 rules 存在时包含 rules_block 内容
  • 仅 rules 存在(无主指令)时显示为独立的"Project rules"条目

crates/tui/src/project_context_cache.rs(+28 行,2 个测试)

  • signature_changes_when_rules_file_changes — 验证内容变更触发缓存失效
  • signature_changes_when_rules_file_is_added_or_removed — 验证文件增删触发失效

验证

检查项 结果
cargo fmt --all -- --check clean
cargo clippy -p codewhale-tui(仅本次改动文件) clean
cargo test -p codewhale-tui --bin codewhale-tui -- project_context 56 passed, 0 failed
cargo test -p codewhale-tui --bin codewhale-tui -- project_context_cache 7 passed, 0 failed
cargo test -p codewhale-tui --bin codewhale-tui -- context_report 9 passed, 0 failed

System prompt structure (with rules)

┌─ System Prompt ────────────────────────────────────────────────────┐
│ [mode prompt + constitution]                                       │
│                                                                    │
│ <project_instructions source="AGENTS.md">                          │
│   ...AGENTS.md contents...                                         │
│ </project_instructions>                                            │
│                                                                    │
│ <project_rule source=".codewhale/rules/coding-style.md">           │
│   ...rule contents...                                              │
│ </project_rule>                                                    │
│ <project_rule source=".codewhale/rules/testing.md">                │
│   ...rule contents...                                              │
│ </project_rule>                                                    │
│                                                                    │
│ ── volatile boundary ──                                            │
│ ## Environment …                                                   │
│ <instructions source="~/global.md">…</instructions>                │
└────────────────────────────────────────────────────────────────────┘

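Audit summary

A full cross-system audit (2 rounds, 5 dimensions) was carried out to confirm there are no regressions or unintended interactions:

| Audit scope | Verdict | Details |
| --- | --- | --- |
| Prompt byte stability | ✅ Safe | Rules live in the static layer (same as AGENTS.md). The KV cache refreshes when rules change, by design. |
| All prompt construction paths | ✅ Fully covered | TUI, engine init, refresh_system_prompt, and build_system_prompt all go through as_system_block() |
| Sub-agents / Fleet | 🟡 Pre-existing limitation | The model-visible agent tool ✔️ inherits rules via fork_context. The background /agent path ❌ uses a static prompt, the same pre-existing limitation as AGENTS.md. |
| WhaleFlow | ✅ No interaction | Separate crate with no reference to project context. |
| Project context cache | ✅ Fixed | The cache key now includes the rules-directory files. Verified for content changes and for file additions and removals. |
| Parent-directory AGENTS.md | ✅ Preserved | rules_block is kept separate from instructions, so has_instructions() is unchanged. |
| #417 project config | ✅ Untouched | The rejection of instructions in merge_project_config remains unchanged. |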

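Not included in this PR (deferred to later milestones)

  • Glob support for instructions_paths() (solution C)
  • Relaxed path restrictions for project-level instructions (solution B)
  • On-demand loading via YAML frontmatter / paths matching (solution E)
  • Trust gating for project-level instructions (solution A)

These are tracked in #3867 as separate workstreams.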

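Migration path

  • New projects: create .codewhale/rules/ (or .claude/rules/) and drop .md files into it.
    No config change is needed; the rules are discovered automatically at the next session start.
  • Existing .claude/rules/ users: the rules take effect as-is, with zero migration cost.
  • Existing global instructions users: the two channels are additive (project rules and global
    instructions coexist in the system prompt), with no conflict.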

@yekern

yekern commented Jul 2, 2026

Contributor Author

@LeoLin990405 Good catch — just pushed a commit adding the symlink-directory guard.

Two places patched: load_rules_from_dir() (the load path) and project_context_cache_candidate_paths() (the cache-key path), both with the same fs::symlink_metadata(…).is_symlink() check. Added rules_rejects_symlinked_directory test to lock it in. All green (59 tests, 0 failures).
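
The guard described above can be sketched as a small predicate. The function name `is_safe_rules_dir` is hypothetical; `fs::symlink_metadata` is the standard-library call the comment refers to, and it inspects the path itself rather than following a symlink.

```rust
use std::fs;
use std::path::Path;

/// Returns true only for a real directory. A symlink pointing at a directory
/// is refused, because files behind it would pass per-file symlink checks
/// while living outside the workspace.
fn is_safe_rules_dir(dir: &Path) -> bool {
    match fs::symlink_metadata(dir) {
        // symlink_metadata does not follow the link, so a symlinked
        // directory reports is_symlink() == true here.
        Ok(meta) => !meta.file_type().is_symlink() && meta.is_dir(),
        // Missing or unreadable: treat as "no rules directory".
        Err(_) => false,
    }
}
```

Running the same predicate in both the load path and the cache-key path keeps the two views of the rules directory consistent.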

Appreciate the thorough review. 🐳

@Hmbown

Hmbown commented Jul 2, 2026

Owner

Review — solid feature; needs a rebase onto the merged v0.8.67 context work

This is a nice addition and it directly addresses #3867 (project-scope instructions being overly denied). The design is thoughtful and the security model is handled well:

  • Symlink containment is correct. Refusing a symlinked rules directory (fs::symlink_metadata(...).is_symlink()) is the right call — the comment nails why a per-file is_symlink check alone wouldn't suffice (the real .md files behind a symlinked dir would pass the per-file check and be read from outside the workspace). Each file also goes through load_context_file (size + symlink safety). 👍
  • Bounded: MAX_RULES_FILES = 50 per dir with a truncation warning; deterministic filename ordering.
  • rules_block kept separate from instructions so rules alone don't suppress parent-directory AGENTS.md discovery via has_instructions() — good, subtle call.
  • .claude/rules/ compat alongside .codewhale/rules/ is a sensible bridge.

Blocker: rebase required

This now conflicts with main (2 markers in project_context.rs) — the merged #3861 v0.8.67 work reworked that file (constitution loading + repo-law protected_invariants). A rebase onto current main is needed before it can land, and CI here only ran partially (SKIPPED,SUCCESS), so a full run post-rebase is worth confirming.

The interaction is only textual, not behavioral: this feature reads rules as project context, which is orthogonal to the repo-law write enforcement that also lives in project_context.rs now — they don't fight, they just touch the same file.

One minor suggestion (non-blocking)

Per-file content is capped (MAX_CONTEXT_SIZE, 100 KB), but 50 files × 100 KB ≈ 5 MB of rules could be injected in the pathological case. Consider a total byte budget across the assembled rules_block (truncate with a note once the cumulative size crosses a threshold) so a large rules dir can't dominate the context window. Compaction would eventually handle it, but bounding at assembly time is cheaper.

Net: sound and safe design; rebase + full CI, optional total-size cap. Happy to help with the rebase against the new project_context.rs if useful.

(Reviewed against the current diff; not a merge/approve — for @Hmbown's decision.)

yekern added 3 commits July 3, 2026 08:58
…ries as project context

Add rules-directory auto-discovery as solution D from Hmbown#3867.

- Scans .codewhale/rules/ (native) and .claude/rules/ (Claude compat) for *.md files
- Loads in filename order, wraps each in <project_rule source="…"> elements
- Separates rules into rules_block field to avoid blocking parent AGENTS.md traversal
- Reuses load_context_file() for size checking + symlink safety (MAX_CONTEXT_SIZE 100KB)
- Caps at MAX_RULES_FILES=50 per directory to prevent abuse
- Adds rules files to project_context_cache_candidate_paths for proper cache invalidation
- Updates /context report to surface rules

Files: project_context.rs (+~190), context_report.rs (+18), project_context_cache.rs (+28)
Tests: 11 new (9 rules + 2 cache), 67 passed, clippy clean

A symlinked rules directory (e.g. .codewhale/rules -> /outside) would
allow real .md files behind it to pass per-file symlink checks and be
read from outside the workspace subtree — same escape class as Hmbown#417.

Adds fs::symlink_metadata guard in load_rules_from_dir() and
project_context_cache_candidate_paths() to skip symlinked directories.
New test rules_rejects_symlinked_directory locks in the guard.

Reported-by: @LeoLin990405

50 files × 100KB could reach ~5MB. Caps cumulative rules_block at
MAX_RULES_BLOCK_BYTES=500KB with truncation marker to prevent a large
rules directory from dominating the context window.

Suggested-by: review on Hmbown#3892
yekern force-pushed the codex/rules-dir-auto-discovery branch from fbd881a to 20926cf on July 3, 2026 01:03
@yekern

yekern commented Jul 3, 2026

Contributor Author

@Hmbown Rebased onto latest main (v0.8.67 constitution work from #3861). Two review-driven additions:

  • Symlinked rules directory guard (caught by @LeoLin990405): load_rules_from_dir() and project_context_cache_candidate_paths() now refuse symlinked directories with the same fs::symlink_metadata(…).is_symlink() check used for files. Added rules_rejects_symlinked_directory test.

  • Total byte budget: MAX_RULES_BLOCK_BYTES = 500 KB caps the assembled rules_block to prevent a pathological 50 × 100 KB scenario from dominating the context window. Truncation includes an explicit marker.
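
A rough sketch of the cumulative byte budget. `MAX_RULES_BLOCK_BYTES` and its 500 KB value come from the comment above. The function name, the marker text, and the choice to drop whole files rather than cut one mid-way are assumptions.

```rust
/// Budget quoted in the PR comment.
const MAX_RULES_BLOCK_BYTES: usize = 500 * 1024;

/// Concatenate already-wrapped rules until the next one would exceed `budget`,
/// then append a marker stating how many files were left out.
/// Dropping whole files avoids splitting a UTF-8 sequence or an XML element.
fn assemble_with_budget(rules: &[String], budget: usize) -> String {
    let mut out = String::new();
    for (index, rule) in rules.iter().enumerate() {
        if out.len() + rule.len() > budget {
            // Hypothetical marker format; the PR only says a marker exists.
            out.push_str(&format!(
                "<!-- rules truncated: {} file(s) omitted -->\n",
                rules.len() - index
            ));
            break;
        }
        out.push_str(rule);
    }
    out
}

/// Convenience wrapper using the budget from the PR.
fn assemble_rules_block(rules: &[String]) -> String {
    assemble_with_budget(rules, MAX_RULES_BLOCK_BYTES)
}
```

Because rules arrive in filename order, the files that get dropped are always the last ones alphabetically, which keeps truncation predictable.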

All green: 62 tests pass (including 10 new rules tests + 2 cache tests), fmt clean, clippy clean. Conflict was textual only — the ignored_project_whale_warnings line from #3861 sits right above our rules loading block.

@Hmbown Hmbown merged commit 296f050 into Hmbown:main Jul 3, 2026
13 checks passed

Development

Successfully merging this pull request may close these issues.

Project-scope instructions are overly denied — need glob + rules directory auto-discovery

4 participants