feat(git): persist per-commit change-risk history#317

Merged
RaghavChamadiya merged 1 commit into main from feat/persist-git-commits on May 30, 2026

@RaghavChamadiya
Member

What

Adds a per-commit git_commits table so the indexer stops throwing away
commit-level history. Each row records one commit in the indexed window:

  • author, timestamp, subject
  • Kamei change features — lines added/deleted, files / directories /
    subsystems touched, churn entropy, and a fix-flag
  • a calibrated change-risk score + level (low/moderate/high)
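
To make the feature list concrete, here is a minimal sketch of how the Kamei-style size and diffusion features could be derived from one commit's file footprint. Everything named here (`FileChange`, `commit_features`, the dictionary keys, the keyword pattern behind the fix-flag, and treating the first path segment as the subsystem) is an assumption for illustration, not the code in this PR.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical keyword heuristic for the fix-flag; the real rule may differ.
FIX_PATTERN = re.compile(r"\b(fix|bug|defect|patch)\w*", re.IGNORECASE)


@dataclass(frozen=True)
class FileChange:
    path: str
    added: int
    deleted: int


def churn_entropy(changes):
    """Shannon entropy (in bits) of how modified lines spread across files."""
    totals = [c.added + c.deleted for c in changes]
    total = sum(totals)
    if total == 0:
        return 0.0
    shares = [t / total for t in totals if t > 0]
    return sum(-p * math.log2(p) for p in shares)


def commit_features(subject, changes):
    """Build size and diffusion features from one commit's file footprint."""
    # Directory = everything before the last "/"; subsystem = first path segment.
    dirs = {c.path.rsplit("/", 1)[0] if "/" in c.path else "" for c in changes}
    subsystems = {c.path.split("/", 1)[0] if "/" in c.path else "" for c in changes}
    return {
        "lines_added": sum(c.added for c in changes),
        "lines_deleted": sum(c.deleted for c in changes),
        "files_touched": len(changes),
        "dirs_touched": len(dirs),
        "subsystems_touched": len(subsystems),
        "churn_entropy": churn_entropy(changes),
        "is_fix": bool(FIX_PATTERN.search(subject)),
    }
```

Entropy is 0 when all churn lands in one file and grows as the change spreads evenly across more files, which is why it serves as a diffusion signal.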

Previously the git layer persisted only per-file aggregates; commit-level
risk was computable live (repowise risk) but never stored. This table is the
data foundation that a commits / change-risk surface and change-over-time
trends can build on.

How

  • The rows are gathered during the existing repo-wide commit-index walk
    via an optional sink, so there is no extra git pass. The walk already
    parses every commit; it now also yields each commit's full file footprint.
  • Scoring reuses the existing linear, interpretable change-risk model —
    pure arithmetic on already-parsed diff data, no LLM and no blame at
    runtime. Author experience (the one feature that needs a subprocess in
    the live path) is reconstructed in-memory as a cumulative per-author
    commit count, keeping the pass fully git-free beyond the one walk.
  • Wiring is additive and non-breaking: the rows ride on the existing git
    index summary and are upserted next to the file-level metadata. Behaviour
    is unchanged for callers that don't read them; the row set is empty in
    rename-tracking mode (which uses the per-file walk instead).
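
The optional-sink idea and the in-memory author-experience tally described above might look roughly like this. It is a sketch under stated assumptions: `walk_commits` stands in for the existing commit-index walk, `CommitRowCollector` and the dictionary keys are hypothetical names, and commits are assumed to arrive oldest first so that the tally counts only earlier commits.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional


def walk_commits(
    commits: Iterable[dict],
    sink: Optional[Callable[[dict], None]] = None,
) -> int:
    """Stand-in for the existing walk: visit each parsed commit exactly once."""
    count = 0
    for commit in commits:
        count += 1  # the walk's existing per-commit work would happen here
        if sink is not None:
            sink(commit)  # extra consumer, no second pass over git
    return count


class CommitRowCollector:
    """Hypothetical sink that builds one row per commit."""

    def __init__(self):
        self.rows = []
        self._seen = defaultdict(int)  # author -> commits seen so far

    def __call__(self, commit):
        author = commit["author"]
        self.rows.append(
            {
                "sha": commit["sha"],
                "author": author,
                # Experience = number of earlier commits by this author
                # inside the indexed window (assumes oldest-first order).
                "author_experience": self._seen[author],
            }
        )
        self._seen[author] += 1
```

Callers that pass no sink get the same walk as before, which is what keeps the change additive. If the real walk yields commits newest first, the tally would need to run over the reversed sequence instead.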

Includes the ORM model, an Alembic migration, and bulk-upsert / delete /
paginated CRUD (sortable by risk or date).
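
As an illustration of the idempotent bulk upsert and the sortable, paginated listing, the following sketch uses the standard-library `sqlite3` module in place of the project's ORM layer. The column names other than `change_risk_score`, the helper names, and the reduced schema are assumptions; the actual model and migration in this PR are not reproduced here.

```python
import sqlite3

# Reduced, hypothetical schema; the real table carries more feature columns.
SCHEMA = """
CREATE TABLE git_commits (
    sha TEXT PRIMARY KEY,
    author TEXT NOT NULL,
    committed_at INTEGER NOT NULL,
    subject TEXT NOT NULL,
    change_risk_score REAL NOT NULL
)
"""

UPSERT = """
INSERT INTO git_commits (sha, author, committed_at, subject, change_risk_score)
VALUES (:sha, :author, :committed_at, :subject, :change_risk_score)
ON CONFLICT(sha) DO UPDATE SET
    author = excluded.author,
    committed_at = excluded.committed_at,
    subject = excluded.subject,
    change_risk_score = excluded.change_risk_score
"""

# Whitelist of sort keys so no caller input is interpolated into SQL.
SORT_COLUMNS = {"risk": "change_risk_score", "date": "committed_at"}


def bulk_upsert(conn, rows):
    """Insert or update all rows in one transaction; safe to re-run."""
    with conn:
        conn.executemany(UPSERT, rows)


def list_commits(conn, sort="risk", limit=50, offset=0):
    """Return one page of (sha, score), highest risk or newest first."""
    column = SORT_COLUMNS[sort]
    sql = (
        "SELECT sha, change_risk_score FROM git_commits "
        f"ORDER BY {column} DESC, sha LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (limit, offset)).fetchall()


def find_by_sha_prefix(conn, prefix):
    """Look up commits whose sha starts with the given hex prefix."""
    sql = "SELECT sha FROM git_commits WHERE sha LIKE ? ORDER BY sha"
    return [r[0] for r in conn.execute(sql, (prefix + "%",))]
```

Keying the upsert on the commit sha is what makes a repeated index run a no-op for unchanged commits. The secondary `sha` sort key keeps page boundaries stable when scores tie.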

Validation

  • New unit + integration tests: the pure feature/risk builder, a CRUD
    round-trip (risk/date sort, pagination, sha-prefix lookup, idempotent
    upsert, delete), and a real-repo run asserting rows are produced with the
    expected shape. Model roster and schema-reconciliation tests updated.
  • Measured marginal cost ≈ 14 ms for 273 commits; the collection adds
    nothing measurable to the walk. No forced re-index is required: the table
    populates the next time the repository is indexed.
  • ruff check + ruff format clean on all touched files.

Notes

  • The persisted change_risk_score is the raw model output. Its ranking
    (review-priority order) is the intended use; absolute level bucketing can
    skew high on repos whose typical commit is large, so a UI surfacing it
    should normalize per-repo rather than show an absolute badge.
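
One way a UI could normalize per-repo is to convert each raw score into its percentile rank among that repository's own commits and bucket on the rank instead of the raw value. This is a sketch of that idea only; the function names and the 0.5 / 0.8 thresholds are assumptions, not values from this PR.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(scores):
    """Map each raw score to its mid-rank percentile (0..1) within the repo."""
    ordered = sorted(scores)
    n = len(ordered)
    ranks = []
    for s in scores:
        below = bisect_left(ordered, s)           # strictly lower scores
        equal = bisect_right(ordered, s) - below  # ties, including s itself
        ranks.append((below + 0.5 * equal) / n)
    return ranks


def relative_level(rank, moderate_at=0.5, high_at=0.8):
    """Bucket a percentile rank; thresholds are hypothetical."""
    if rank >= high_at:
        return "high"
    if rank >= moderate_at:
        return "moderate"
    return "low"
```

Ranking preserves the review-priority order of the raw scores, so a repository whose typical commit is large still gets a spread of levels rather than a wall of "high" badges.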

Add a per-commit `git_commits` table capturing one row per commit in the
indexed window: author, timestamp, subject, Kamei change features
(lines/files/dirs/subsystems touched, churn entropy, fix-flag) and a
calibrated just-in-time change-risk score/level.

The rows are collected during the existing repo-wide commit-index walk via
an optional sink (no extra git pass) and scored by the existing linear,
interpretable change-risk model — pure arithmetic, no LLM or blame at
runtime. Author experience, the one change-risk feature that costs a
subprocess in the live `repowise risk` path, is reconstructed in-memory as a
cumulative per-author commit tally, so the whole pass stays zero-extra-git
(~14 ms for 273 commits on this repo).

Wiring is additive and non-breaking: rows ride on `GitIndexSummary` and are
upserted alongside file-level git metadata. Includes the model, an Alembic
migration, bulk-upsert/delete/paginated CRUD (sortable by risk or date), and
unit + integration coverage.
@RaghavChamadiya RaghavChamadiya requested a review from swati510 as a code owner May 30, 2026 06:04
@RaghavChamadiya RaghavChamadiya merged commit 5be3cda into main May 30, 2026
5 checks passed
@RaghavChamadiya RaghavChamadiya deleted the feat/persist-git-commits branch May 30, 2026 06:06

2 participants