Summary
index_repository(mode="moderate") produces a graph that's missing entire subtrees with no warning or error. Running the same call with mode="full" (otherwise identical inputs) restores the dropped content. The drop is silent — both modes report status: "indexed" with non-zero node/edge counts.
Reproduction
Repository under test: the src/ha_mcp/ sub-tree of homeassistant-ai/ha-mcp@82050348, a Python project. The indexed sub-tree contains 87 Python files across ~11 sub-directories (notably a tools/ directory with ~47 files).
Steps:
1. delete_project(project="<P>")
2. index_repository(repo_path="<absolute-path-to-src/ha_mcp>", mode="moderate") → returns {status: "indexed", nodes: 1294, edges: 3060}
3. get_architecture(project="<P>", aspects=["file_tree"]) → tools/ directory entirely absent from the returned file_tree (only auth/, client/, policy/, transforms/, utils/, dashboard_screenshot/, plus top-level files appear).
4. delete_project(project="<P>")
5. index_repository(repo_path="<same path>", mode="full") → returns {status: "indexed", nodes: 2372, edges: 7570}
6. get_architecture(project="<P>", aspects=["file_tree"]) → tools/ present with all 47 files, file_tree complete.
Net delta: moderate produced ~54 % of the nodes and ~40 % of the edges that full produced, silently — both calls reported successful indexing.
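For anyone re-checking the delta, here is a minimal sketch that recomputes the percentages from the raw counts returned by the two index_repository calls. The helper name coverage is illustrative only and not part of the tool.

```python
# Node/edge counts exactly as reported by the two index_repository calls.
moderate = {"nodes": 1294, "edges": 3060}
full = {"nodes": 2372, "edges": 7570}


def coverage(partial: dict, complete: dict) -> dict:
    """Return partial/complete as a percentage (one decimal) for each key."""
    return {key: round(100 * partial[key] / complete[key], 1) for key in partial}


ratios = coverage(moderate, full)
print(f"nodes: {ratios['nodes']} % of full, edges: {ratios['edges']} % of full")
```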
Expected
moderate mode is documented as filtering files (noise reduction), not as a structural drop of major source directories. A 47-file production source directory should not be silently filtered out. At minimum the response should expose which paths/files were excluded (count + sample), so callers can detect the drop.
Actual
tools/ is entirely missing from the graph after the moderate indexing. Subsequent queries like search_graph(file_pattern="tools/<anything>.py") return total: 0, indistinguishable from "this file doesn't exist in the repo", even though the file is plainly present in repo_path.
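Until the server reports exclusions, a caller can at least tell the two cases apart on its own side. The sketch below is an assumption about how a client might do that, not existing tool behaviour: classify_zero_hit is a hypothetical helper, total stands for the total field from a search_graph response, and tools/example.py is a made-up file name used only in the demo.

```python
import tempfile
from pathlib import Path


def classify_zero_hit(repo_path: str, rel_path: str, total: int) -> str:
    """Label a search result so a zero count is not mistaken for absence."""
    if total > 0:
        return "indexed"
    # Zero hits: check the filesystem before concluding the file does not exist.
    if (Path(repo_path) / rel_path).is_file():
        return "on disk but not indexed"
    return "not in repo"


# Demo against a throwaway directory standing in for repo_path.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "tools").mkdir()
    (Path(repo) / "tools" / "example.py").write_text("")
    on_disk = classify_zero_hit(repo, "tools/example.py", total=0)
    absent = classify_zero_hit(repo, "tools/missing.py", total=0)

print(on_disk, "|", absent)
```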
Why it matters
Reviewers or agents relying on moderate-indexed graph queries get false-negative answers for symbols in the dropped subtrees (zero callers, zero definitions, zero hits) with no signal that the answer is unreliable. For a PR-review workflow this means a changed file in tools/ would appear "uncalled / safe to change" when in reality the graph simply does not contain it.
.cbmignore for the test repo only contained common Python noise (.venv/, __pycache__/, dist/, build/, node_modules/, *.egg-info/, .git/) — nothing that explains dropping tools/.
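To back up the claim that the ignore file is not responsible, here is a rough check of the listed patterns against a path under tools/. It assumes .cbmignore follows gitignore-like semantics, which I have not verified; the matcher is a deliberately simplified stand-in, and tools/example.py is a hypothetical file name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns as listed in the test repo's .cbmignore.
IGNORE_PATTERNS = [".venv/", "__pycache__/", "dist/", "build/",
                   "node_modules/", "*.egg-info/", ".git/"]


def is_ignored(rel_path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Simplified gitignore-style check (assumed semantics, not the tool's own)."""
    parts = PurePosixPath(rel_path).parts
    dirs, name = parts[:-1], parts[-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches any directory component of the path.
            if any(fnmatchcase(d, pattern[:-1]) for d in dirs):
                return True
        elif fnmatchcase(name, pattern) or any(fnmatchcase(d, pattern) for d in dirs):
            return True
    return False


tools_ignored = is_ignored("tools/example.py")
cache_ignored = is_ignored("client/__pycache__/example.pyc")
print(tools_ignored, cache_ignored)
```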
Environment
- codebase-memory-mcp --version → 0.7.0 (latest at filing; release 2026-05-30)
- Linux 6.12.85-haos (Home Assistant OS, Alpine-based container)
- stdio MCP transport via Claude Code
- Indexed repo_path: /data/home/projects/claude-code-ha/ha-mcp/src/ha_mcp (87 Python files per get_architecture in full mode)
- Same path under mode="moderate": 54 of those 87 files present (the missing 33 are exactly src/ha_mcp/tools/)
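The per-directory comparison behind those file counts can be reproduced with something like the sketch below. It assumes the two file_tree results have already been flattened into lists of relative paths; the flattening depends on the get_architecture response shape and is not shown. missing_by_top_dir is a hypothetical helper, and the inputs are toy data rather than the real trees.

```python
from collections import Counter


def missing_by_top_dir(full_files, moderate_files) -> Counter:
    """Count files only in the full index, grouped by first path component."""
    missing = set(full_files) - set(moderate_files)
    # Top-level files (no slash) are grouped under ".".
    return Counter(p.split("/", 1)[0] if "/" in p else "." for p in missing)


# Toy inputs standing in for the flattened file_tree of each mode.
full_files = ["__init__.py", "client/a.py", "tools/x.py", "tools/y.py"]
moderate_files = ["__init__.py", "client/a.py"]

dropped = missing_by_top_dir(full_files, moderate_files)
print(dict(dropped))
```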
Workaround
Use mode="full" exclusively on this codebase. The wall-time difference is negligible (~3 s in either mode), so moderate loses ~46 % of nodes for no observable speed advantage.
Suggested fix direction
Whatever heuristic moderate uses to filter (file count threshold? size threshold? path-pattern match?), make the dropped path list visible in the index_repository response (e.g. excluded_paths: ["tools/"] or excluded_count: 33), so the caller can decide whether to retry with full. Silent drops are the worst failure mode.
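A sketch of how a caller might consume such a response, to make the proposal concrete. The excluded_paths and excluded_count fields are the hypothetical additions suggested above and do not exist in 0.7.0; should_retry_with_full and the watched-prefix idea are likewise illustrative assumptions.

```python
def should_retry_with_full(response: dict, watched_prefixes: list) -> bool:
    """Return True if any excluded path overlaps a directory the caller cares about."""
    excluded = response.get("excluded_paths", [])  # hypothetical field
    return any(
        path.startswith(prefix) or prefix.startswith(path)
        for path in excluded
        for prefix in watched_prefixes
    )


# Hypothetical moderate-mode response carrying the proposed fields.
response = {
    "status": "indexed",
    "nodes": 1294,
    "edges": 3060,
    "excluded_paths": ["tools/"],
    "excluded_count": 33,
}

retry = should_retry_with_full(response, watched_prefixes=["tools/"])
print("retry with mode='full':", retry)
```

With fields like these, a response that omits excluded_paths would still be handled (treated as nothing excluded), so older servers would not break a client written this way.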