feat(git): persist per-commit change-risk history #317
Merged
Add a per-commit `git_commits` table capturing one row per commit in the indexed window: author, timestamp, subject, Kamei change features (lines/files/dirs/subsystems touched, churn entropy, fix-flag) and a calibrated just-in-time change-risk score/level. The rows are collected during the existing repo-wide commit-index walk via an optional sink (no extra git pass) and scored by the existing linear, interpretable change-risk model — pure arithmetic, no LLM or blame at runtime. Author experience, the one change-risk feature that costs a subprocess in the live `repowise risk` path, is reconstructed in-memory as a cumulative per-author commit tally, so the whole pass stays zero-extra-git (~14 ms for 273 commits on this repo). Wiring is additive and non-breaking: rows ride on `GitIndexSummary` and are upserted alongside file-level git metadata. Includes the model, an Alembic migration, bulk-upsert/delete/paginated CRUD (sortable by risk or date), and unit + integration coverage.
swati510 approved these changes on May 30, 2026
What
Adds a per-commit `git_commits` table so the indexer stops throwing away commit-level history. Each row records one commit in the indexed window:

- author, timestamp, and subject
- Kamei change features: lines, files, dirs, and subsystems touched, churn entropy, and a fix-flag
- a calibrated just-in-time change-risk score and level (low/moderate/high)

Previously the git layer persisted only per-file aggregates; commit-level risk was computable live (`repowise risk`) but never stored. This is the data foundation a commits / change-risk surface and change-over-time trends can build on.
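To make the footprint features concrete, here is a minimal sketch of how they could be derived from one commit's per-file changed-line counts. The function names, the dict layout, and the choice of "first path component" as the subsystem are assumptions for illustration, not the PR's code; the entropy shown is plain Shannon entropy in bits, and the real model may normalize it differently.

```python
import math
from pathlib import PurePosixPath


def churn_entropy(lines_per_file: dict[str, int]) -> float:
    """Shannon entropy (bits) of how changed lines spread across files."""
    total = sum(lines_per_file.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for changed in lines_per_file.values():
        if changed > 0:  # files with no changed lines contribute nothing
            share = changed / total
            entropy -= share * math.log2(share)
    return entropy


def change_features(lines_per_file: dict[str, int]) -> dict[str, float]:
    """Hypothetical footprint features for one commit (names are illustrative)."""
    paths = [PurePosixPath(p) for p in lines_per_file]
    dirs = {str(p.parent) for p in paths}
    # Assumption: a "subsystem" is the top-level directory; root files share "".
    subsystems = {p.parts[0] if len(p.parts) > 1 else "" for p in paths}
    return {
        "lines_touched": sum(lines_per_file.values()),
        "files_touched": len(paths),
        "dirs_touched": len(dirs),
        "subsystems_touched": len(subsystems),
        "churn_entropy": churn_entropy(lines_per_file),
    }
```

A commit that puts all its churn in one file has entropy 0, while one that splits it evenly across two files has entropy 1 bit, so higher values indicate more scattered changes.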
How
- Rows are collected during the existing repo-wide commit-index walk via an optional sink, so there is no extra git pass. The walk already parses every commit; it now also yields each commit's full file footprint.
- Rows are scored by the existing linear, interpretable change-risk model: pure arithmetic on already-parsed diff data, no LLM and no blame at runtime. Author experience (the one feature that needs a subprocess in the live path) is reconstructed in-memory as a cumulative per-author commit count, keeping the pass fully git-free beyond the one walk.
- Wiring is additive and non-breaking: rows ride on the `GitIndexSummary` index summary and are upserted next to the file-level metadata. No behaviour change for callers that don't read them; the rows are empty in rename-tracking mode (which uses the per-file walk).

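The optional-sink and author-experience ideas can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `ParsedCommit`, `walk_commits`, and the sink signature are hypothetical names, and the sketch assumes experience means the number of earlier commits by the same author inside the indexed window.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ParsedCommit:
    """Hypothetical stand-in for what the walk already parses per commit."""
    sha: str
    author: str
    timestamp: int  # seconds since epoch


def walk_commits(
    commits: Iterable[ParsedCommit],
    sink: Optional[Callable[[ParsedCommit, int], None]] = None,
) -> int:
    """Walk commits once. If a sink is given, hand it each commit together
    with the author's prior commit count. Returns the number walked."""
    prior_commits: dict[str, int] = defaultdict(int)
    walked = 0
    # Oldest first, so the tally only reflects commits that came earlier.
    for commit in sorted(commits, key=lambda c: c.timestamp):
        if sink is not None:
            sink(commit, prior_commits[commit.author])
        prior_commits[commit.author] += 1
        walked += 1
    return walked
```

Callers that pass no sink get the same walk as before, which is what keeps the wiring additive. A caller that wants rows might collect them like this:

```python
history = [
    ParsedCommit("c3", "ann", 30),
    ParsedCommit("c1", "ann", 10),
    ParsedCommit("c2", "bob", 20),
]
rows = []
walk_commits(history, sink=lambda c, exp: rows.append((c.sha, exp)))
```

One consequence of this approach is that the tally only sees commits inside the indexed window, so it can undercount experience for authors with older history.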
Includes the ORM model, an Alembic migration, and bulk-upsert / delete /
paginated CRUD (sortable by risk or date).
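As a rough illustration of the idempotent upsert and the sortable, paginated listing, the sketch below uses the standard-library `sqlite3` module in place of the project's ORM layer so that it stays self-contained. The column names other than `change_risk_score`, and the helper names, are assumptions; the `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical, trimmed-down schema; the real table carries more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS git_commits (
    sha TEXT PRIMARY KEY,
    author TEXT NOT NULL,
    committed_at INTEGER NOT NULL,
    subject TEXT NOT NULL,
    change_risk_score REAL NOT NULL
)
"""

# Whitelist of sort keys, so caller input is never interpolated into SQL.
SORT_COLUMNS = {"risk": "change_risk_score", "date": "committed_at"}


def upsert_commits(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert rows, or overwrite them when the sha already exists."""
    conn.executemany(
        """
        INSERT INTO git_commits
            (sha, author, committed_at, subject, change_risk_score)
        VALUES (:sha, :author, :committed_at, :subject, :change_risk_score)
        ON CONFLICT(sha) DO UPDATE SET
            author = excluded.author,
            committed_at = excluded.committed_at,
            subject = excluded.subject,
            change_risk_score = excluded.change_risk_score
        """,
        rows,
    )
    conn.commit()


def list_commits(conn, sort="risk", limit=50, offset=0):
    """Return one page of (sha, score), highest risk or newest first."""
    column = SORT_COLUMNS[sort]
    cur = conn.execute(
        f"SELECT sha, change_risk_score FROM git_commits "
        f"ORDER BY {column} DESC, sha ASC LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cur.fetchall()
```

Keying the upsert on the commit sha is what makes re-indexing safe to repeat: running the same batch twice leaves the row count unchanged.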
Validation
- Unit and integration coverage: a CRUD round-trip (risk/date sort, pagination, sha-prefix lookup, idempotent upsert, delete) and a real-repo run asserting rows are produced with the expected shape. Model roster and schema-reconciliation tests updated.
- The pass takes ~14 ms for 273 commits on this repo, adding nothing measurable to the walk. No re-index is required; the table populates on the next index.
- `ruff check` + `ruff format` clean on all touched files.

Notes
`change_risk_score` is the raw model output. Its ranking (review-priority order) is the intended use; absolute level bucketing can
skew high on repos whose typical commit is large, so a UI surfacing it
should normalize per-repo rather than show an absolute badge.
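
One way a UI could normalize per-repo is to show each commit's percentile rank among that repo's own scores. The sketch below is only one possible approach and is not part of this PR; `percentile_ranks` is a hypothetical helper.

```python
from bisect import bisect_right


def percentile_ranks(scores: dict[str, float]) -> dict[str, float]:
    """Map each sha to the fraction of this repo's commits scoring at or
    below it, so the result depends on rank rather than absolute value."""
    if not scores:
        return {}
    ordered = sorted(scores.values())
    total = len(ordered)
    return {sha: bisect_right(ordered, s) / total for sha, s in scores.items()}
```

On a repo where every raw score is high (say 0.90 to 0.99), the ranks still spread across the full range up to 1.0, which keeps the review-priority order visible without implying that every commit is dangerous.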