index_repository(mode="moderate") silently drops entire subtrees from the indexed graph #411

@Patch76

Summary

index_repository(mode="moderate") produces a graph that's missing entire subtrees with no warning or error. Running the same call with mode="full" (otherwise identical inputs) restores the dropped content. The drop is silent — both modes report status: "indexed" with non-zero node/edge counts.

Reproduction

Repository under test: the src/ha_mcp/ subtree of homeassistant-ai/ha-mcp@82050348, a Python project. The subtree contains 87 Python files across ~11 sub-directories (notably a tools/ directory with ~47 files).

Steps:

  1. delete_project(project="<P>")
  2. index_repository(repo_path="<absolute-path-to-src/ha_mcp>", mode="moderate") → returns {status: "indexed", nodes: 1294, edges: 3060}
  3. get_architecture(project="<P>", aspects=["file_tree"]) → tools/ directory entirely absent from the returned file_tree (only auth/, client/, policy/, transforms/, utils/, dashboard_screenshot/, plus top-level files appear).
  4. delete_project(project="<P>")
  5. index_repository(repo_path="<same path>", mode="full") → returns {status: "indexed", nodes: 2372, edges: 7570}
  6. get_architecture(project="<P>", aspects=["file_tree"]) → tools/ present with all 47 files, file_tree complete.
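The comparison in steps 3 and 6 can be automated. The following is a minimal sketch, assuming the two file_tree results have already been flattened into lists of POSIX-style paths relative to repo_path; the actual shape of the get_architecture response is not shown in this report, and the function name and sample file names are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def dropped_directories(full_paths, moderate_paths):
    """Map each top-level directory that is wholly absent from the moderate
    index to the number of files it holds in the full index."""

    def top_dir(path):
        parts = PurePosixPath(path).parts
        # Top-level files have no parent directory to report.
        return parts[0] if len(parts) > 1 else None

    full_counts = Counter(d for d in map(top_dir, full_paths) if d is not None)
    moderate_dirs = {top_dir(p) for p in moderate_paths}
    return {d: n for d, n in full_counts.items() if d not in moderate_dirs}


# Hypothetical file names, for illustration only.
full_tree = ["server.py", "auth/oauth.py", "tools/a.py", "tools/b.py"]
moderate_tree = ["server.py", "auth/oauth.py"]

print(dropped_directories(full_tree, moderate_tree))  # {'tools': 2}
```

A non-empty result is the signal that moderate removed a whole directory rather than individual noisy files.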

Net delta: moderate produced ~55 % of the nodes and ~40 % of the edges that full produced, and both calls reported successful indexing.
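For reference, the percentages follow directly from the counts returned in steps 2 and 5. This short sketch only redoes that arithmetic; the helper name is made up for illustration.

```python
# Counts copied from the two index_repository responses in the steps above.
moderate = {"nodes": 1294, "edges": 3060}
full = {"nodes": 2372, "edges": 7570}


def retained_share(partial, complete):
    """Return, per key, the percentage of `complete` that `partial` retained."""
    return {key: round(100 * partial[key] / complete[key], 1) for key in complete}


shares = retained_share(moderate, full)
print(shares)
```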

Expected

moderate mode is documented as filtering files (noise reduction), not as a structural drop of major source directories. A 47-file production source directory should not be silently filtered out. At minimum the response should expose which paths/files were excluded (count + sample), so callers can detect the drop.

Actual

tools/ is entirely missing from the graph after indexing with mode="moderate". Subsequent queries such as search_graph(file_pattern="tools/<anything>.py") return total: 0, which is indistinguishable from "this file doesn't exist in the repo", even though the file is plainly present under repo_path.
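Until the index reports its exclusions, a caller can disambiguate a zero-hit answer by checking the filesystem itself. This is a sketch of such a client-side guard, not part of codebase-memory-mcp; it assumes the caller already holds the total value from a search_graph response, and the function name and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def classify_zero_hit(repo_path, relative_path, graph_total):
    """Tell apart 'missing from the index' and 'missing from the repo'."""
    if graph_total > 0:
        return "indexed"
    if (Path(repo_path) / relative_path).is_file():
        # The file exists, so total: 0 reflects a gap in the index.
        return "on disk but not indexed"
    return "not in repo"


# Demo against a throwaway directory; tools/example.py is a made-up name.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "tools").mkdir()
    (Path(repo) / "tools" / "example.py").write_text("")
    dropped = classify_zero_hit(repo, "tools/example.py", graph_total=0)
    absent = classify_zero_hit(repo, "tools/missing.py", graph_total=0)

print(dropped)  # on disk but not indexed
print(absent)   # not in repo
```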

Why it matters

Reviewers or agents relying on moderate-indexed graph queries get false-negative answers for symbols in the dropped subtrees (zero callers, zero definitions, zero hits) with no signal that the answer is unreliable. For a PR-review workflow this means a changed file in tools/ would appear "uncalled / safe to change" when in reality the graph simply does not contain it.

.cbmignore for the test repo only contained common Python noise (.venv/, __pycache__/, dist/, build/, node_modules/, *.egg-info/, .git/) — nothing that explains dropping tools/.

Environment

  • codebase-memory-mcp --version → 0.7.0 (latest at filing; release 2026-05-30)
  • Linux 6.12.85-haos (Home Assistant OS, Alpine-based container)
  • stdio MCP transport via Claude Code
  • Indexed repo_path: /data/home/projects/claude-code-ha/ha-mcp/src/ha_mcp (87 Python files per get_architecture in full mode)
  • Same path under mode="moderate": 54 of those 87 files present (the missing 33 are exactly src/ha_mcp/tools/)

Workaround

Use mode="full" exclusively on this codebase. Walltime delta is small (~3 s either way) but moderate loses ~45 % of nodes for no observable speed advantage.

Suggested fix direction

Whatever heuristic moderate uses to filter (file count threshold? size threshold? path-pattern match?), make the dropped path list visible in the index_repository response (e.g. excluded_paths: ["tools/"] or excluded_count: 33), so the caller can decide whether to retry with full. Silent drops are the worst failure mode.
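To illustrate how a caller might consume such a response, here is a rough sketch. The excluded_paths and excluded_count fields are the proposal from this issue and do not exist in 0.7.0; the helper name is hypothetical.

```python
def needs_full_reindex(response):
    """Return True when the (proposed) exclusion fields report dropped paths."""
    excluded = response.get("excluded_paths", [])
    # Fall back to the length of the sample list if no count is given.
    count = response.get("excluded_count", len(excluded))
    return count > 0


# Response as returned today: no exclusion information at all.
current = {"status": "indexed", "nodes": 1294, "edges": 3060}

# Same response extended with the fields proposed in this issue.
proposed = {**current, "excluded_paths": ["tools/"], "excluded_count": 33}

print(needs_full_reindex(current))   # False: nothing tells the caller to retry
print(needs_full_reindex(proposed))  # True: caller can retry with mode="full"
```

With today's response the check cannot fire, which is the gap this issue describes.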
