feat(health): scale coverage deductions by uncovered fraction #314
Merged
Add a continuous coverage biomarker, coverage_gradient, that deducts health in direct proportion to a file's uncovered fraction (4.0 * (1 - line_coverage_pct/100), capped) for files with known coverage, and stays silent when no coverage was ingested (absent is never imputed as uncovered).

The two existing coverage biomarkers only fire below hard thresholds (~40-60% line coverage), so on well-tested codebases, where most files sit at 85-99%, the score was effectively blind to coverage even though the uncovered fraction still carries defect signal. The gradient fires across the whole 0-100% range and recovers the magnitude the binary gates discard.

Mechanism: a new optional `deduction` override on BiomarkerResult lets a finding carry a continuous magnitude that replaces the discrete severity -> deduction table in score_file; the value is still weighted and category-capped, so per-finding health_impact stays linear and attributable. The gradient lives in its own capped category (test_coverage_gradient, -2.0) so the additive signal neither squeezes nor is squeezed by the binary gates.

Calibrated offline against the defect corpus: +0.043 corpus AUC [95% CI +0.023, +0.061] on the covered subset, Popt-neutral, exactly zero on repos without ingested coverage. Validated by re-aggregating the cached corpus findings through the shipped detector + score_file (reproduced +0.041 [+0.023, +0.059]). Zero added walk cost. Snapshot, biomarker tests, and docs updated in step.
swati510
approved these changes
May 29, 2026
Adds a continuous coverage biomarker, `coverage_gradient`, that deducts health in direct proportion to a file's uncovered fraction (4.0 × (1 − line_coverage_pct/100), capped) for files with known coverage, and stays silent when no coverage was ingested (absent is never imputed as uncovered).

Why
The two existing coverage biomarkers only fire below hard thresholds (~40–60% line coverage), so on well-tested codebases, where most files sit at 85–99%, the score was effectively blind to coverage even though the uncovered fraction still carries defect signal. The gradient fires across the whole 0–100% range and recovers the magnitude the binary gates discard.
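
As a rough illustration of the formula in the summary, the gradient could be computed as below. This is a minimal sketch and not the shipped detector: the function name, the clamping of the input to 0–100, and the choice of 2.0 as the cap (taken here from the magnitude of the `test_coverage_gradient` category cap) are assumptions.

```python
from typing import Optional

GRADIENT_SCALE = 4.0  # multiplier stated in the PR description
GRADIENT_CAP = 2.0    # assumption: mirrors the -2.0 category cap


def coverage_gradient_deduction(
    line_coverage_pct: Optional[float],
    cap: float = GRADIENT_CAP,
) -> Optional[float]:
    """Return a deduction magnitude, or None when coverage is unknown."""
    if line_coverage_pct is None:
        # Absent coverage is never imputed as uncovered: emit no finding.
        return None
    # Clamp defensively so malformed input cannot yield a negative deduction
    # (the clamp is an assumption, not something the PR states).
    pct = min(max(line_coverage_pct, 0.0), 100.0)
    uncovered_fraction = 1.0 - pct / 100.0
    return min(GRADIENT_SCALE * uncovered_fraction, cap)


if __name__ == "__main__":
    for pct in (None, 99.0, 90.0, 85.0, 50.0, 0.0):
        print(pct, coverage_gradient_deduction(pct))
```

Under these assumptions a file at 90% line coverage receives a deduction of about 0.4, which is the range where the threshold-based biomarkers report nothing.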
How
A new optional `deduction` override on `BiomarkerResult` lets a finding carry a continuous magnitude that replaces the discrete severity → deduction table in `score_file`; the value is still weighted and category-capped, so per-finding `health_impact` stays linear and attributable. The gradient lives in its own capped category (`test_coverage_gradient`, −2.0) so the additive signal neither squeezes nor is squeezed by the binary gates.

Validation
Calibrated offline against the defect corpus and validated by re-aggregating the cached corpus findings through the shipped detector + `score_file`: +0.041 corpus AUC [95% CI +0.023, +0.059] on the covered subset, Popt-neutral, and exactly zero on repos without ingested coverage, so the change is purely additive. Zero added walk cost (arithmetic on already-parsed coverage).

Snapshot, biomarker tests, and docs updated in the same PR (biomarker count 24 → 25). 219 health tests pass; ruff clean.
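
The `deduction` override described under How might look roughly like the sketch below. Only the names `BiomarkerResult`, `deduction`, `score_file`, and `test_coverage_gradient` and the 2.0 cap magnitude come from this PR; the severity table values, the `test_coverage` cap, the `weight` field, the base score of 10.0, and the order of weighting before capping are hypothetical stand-ins for the real implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical severity table; the real values are not given in the PR text.
SEVERITY_DEDUCTION: Dict[str, float] = {"low": 0.5, "medium": 1.0, "high": 2.0}
# Only the gradient cap (2.0) is from the PR; the other entry is illustrative.
CATEGORY_CAP: Dict[str, float] = {
    "test_coverage": 3.0,
    "test_coverage_gradient": 2.0,
}


@dataclass
class BiomarkerResult:
    name: str
    category: str
    severity: str
    weight: float = 1.0                # assumed per-finding weight
    deduction: Optional[float] = None  # continuous override, if set


def score_file(findings: List[BiomarkerResult], base: float = 10.0) -> float:
    """Sum weighted deductions per category, cap each category, subtract."""
    per_category: Dict[str, float] = {}
    for f in findings:
        # The override replaces the table lookup; weighting still applies.
        raw = f.deduction if f.deduction is not None else SEVERITY_DEDUCTION[f.severity]
        per_category[f.category] = per_category.get(f.category, 0.0) + raw * f.weight
    # Categories without a configured cap are left uncapped in this sketch.
    total = sum(min(v, CATEGORY_CAP.get(c, v)) for c, v in per_category.items())
    return max(base - total, 0.0)
```

Because the gradient finding sits in its own category, its capped contribution is added independently of whatever the threshold-based findings deduct, which is the "neither squeezes nor is squeezed" property described above.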