Skip to content

Clarify ICSD24 oxidation-state filtering thresholds (fixes #627)#630

Merged
KingaMas merged 1 commit into
masterfrom
fix/issue-627-icsd24-oxidation-state-consistency
May 28, 2026
Merged

Clarify ICSD24 oxidation-state filtering thresholds (fixes #627)#630
KingaMas merged 1 commit into
masterfrom
fix/issue-627-icsd24-oxidation-state-consistency

Conversation

@KingaMas

@KingaMas KingaMas commented May 28, 2026

Copy link
Copy Markdown
Member

Fixes #627.

@dylancjohn flagged three places where the ICSD24 oxidation-state set is exposed inconsistently. This PR addresses all three.

Changes

  • smact/data_loader.pylookup_element_oxidation_states_icsd24 docstring said ≥5 reports, but the underlying file (oxidation_states_icsd24_filtered.txt) actually uses a strict >5 cut-off. Verified empirically: every species with results_count == 5 (P⁻⁴, Cl⁴, Ti¹, Co⁵, Se³, Sb¹, Sm⁴, Tb¹, Ra², Pu⁵, Cm⁴) is excluded from the filtered file. Corrected the docstring and added a pointer to ICSD24FilterConfig.

  • smact/__init__.py — Expanded the Element.oxidation_states docstring to spell out that the attribute is the static >5-reports file, and to flag that smact_validity applies a different (consensus=3) cut-off by default via ICSD24FilterConfig. Points users at ICSD24OxStatesFilter for a configurable threshold.

  • smact/screening.py — Aligned ICSD24FilterConfig.commonality default from "medium" to "low", matching ICSD24OxStatesFilter.filter(). Two public APIs should not silently apply different defaults for the same parameter. Also refreshed the dataclass docstring and the smact_validity docstring that still quoted the old default.

  • docs/examples/property_prediction.ipynb — One comment updated to reflect the new default.

Behaviour change ⚠

smact_validity callers who relied on the implicit "medium" filter will now get a more permissive result (no proportion filter; only consensus=3 is applied). Callers that pass icsd_filter=ICSD24FilterConfig(commonality="medium") explicitly are unaffected.

Test plan

  • `ruff check` / `ruff format` / `codespell` clean on the touched files (ran locally via `uv run`)
  • Full `pytest smact/tests` — couldn't run locally on this HPC node; relying on CI. Manually audited the negative-validity assertions (`Fe3O4`, `LiF2`, `NeF2` in `test_core.py`) — all hold by integer-charge arithmetic regardless of the looser filter, so no flips expected.
  • CI lint job covers pyright, markdownlint, prettier, nbstripout, blacken-docs

Summary by CodeRabbit

  • Documentation
    • Enhanced explanatory notes in the property prediction workflow with detailed descriptions of filter configuration and selection methods
    • Clarified how oxidation states data is sourced and explained ICSD24 filtering criteria throughout library documentation
    • Modified default screening filter's commonality setting from 'medium' to 'low' for improved default filtering behaviour

Review Change Stack

Aligns documentation and defaults across the three places that expose the
ICSD24 oxidation-state set so the same element no longer silently returns
different lists depending on the code path:

- data_loader.lookup_element_oxidation_states_icsd24: docstring said
  "≥5 reports" but the underlying file actually applies a strict ">5"
  cut-off (species with results_count == 5 are excluded). Corrected the
  docstring and added a pointer to ICSD24FilterConfig so users see the
  static-vs-dynamic threshold difference.

- Element.oxidation_states (__init__.py): expanded the docstring to spell
  out that the attribute is the static >5-reports file, and to note that
  smact_validity applies a different (consensus=3) cut-off by default via
  ICSD24FilterConfig. Points users to ICSD24OxStatesFilter for a
  configurable threshold.

- ICSD24FilterConfig.commonality (screening.py): default changed from
  "medium" to "low" to match ICSD24OxStatesFilter.filter(). Two public
  APIs should not silently apply different defaults for the same
  parameter. Updated the dataclass docstring with what each level means
  and refreshed the smact_validity docstring that quoted the old default.

- docs/examples/property_prediction.ipynb: comment updated to reflect
  the new default.

Behaviour-changing: smact_validity callers that relied on the implicit
"medium" filter will now get a more permissive result. Worth a CHANGELOG
entry on release.

Fixes #627.
@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

This pull request aligns inconsistent ICSD24 oxidation-state filtering defaults by changing ICSD24FilterConfig.commonality from "medium" to "low" and updating documentation across the library and examples to clarify the static file threshold (>5 reports) versus dynamic filtering defaults.

Changes

ICSD24 Filtering Default Alignment

Layer / File(s) Summary
ICSD24FilterConfig default and smact_validity documentation
smact/screening.py
ICSD24FilterConfig.commonality default parameter changes from "medium" to "low". The smact_validity function docstring is updated to document that new default.
Static and dynamic filtering threshold clarification
smact/__init__.py, smact/data_loader.py
Element.oxidation_states docstring is expanded to explain it loads from a static ICSD24-filtered file (>5 reports) and how this may differ from dynamically applied ICSD24FilterConfig filters. lookup_element_oxidation_states_icsd24 docstring is corrected to state ">5 reports" (≥6) threshold and clarifies the distinction from the dynamic consensus=3 default.
Example notebook commonality update
docs/examples/property_prediction.ipynb
The property prediction example notebook comment is updated to reflect the new commonality='low' default in the screening workflow.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • WMD-group/SMACT#432: Modifies commonality handling for SMACT oxidation-state validity filtering in smact/screening.py by changing the default ICSD24FilterConfig.commonality to "low".
  • WMD-group/SMACT#282: Touches smact/screening.py's oxidation-state logic for smact_validity and directly relates to the ICSD24 consensus filter changes.
  • WMD-group/SMACT#436: Updates smact/screening.py ICSD24 filtering behaviour and documentation by changing ICSD24FilterConfig.commonality defaults to "low".

Suggested labels

docs, bug, breaking

Suggested reviewers

  • AntObi
  • hspark1212

🐰 A rabbit hops through filtering thresholds,
"Low" beats "medium" on this fine day,
Static and dynamic now speak truth, no riddles,
Documentation fixed—hopping away! 🌱

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Clarify ICSD24 oxidation-state filtering thresholds (fixes #627)' clearly and specifically describes the main change—documenting and aligning ICSD24 filtering inconsistencies across multiple files.
Linked Issues check ✅ Passed The PR directly addresses all three core objectives from issue #627: documenting which threshold Element.oxidation_states uses, resolving the >5 vs ≥5 discrepancy, and aligning commonality defaults between ICSD24FilterConfig and ICSD24OxStatesFilter.
Out of Scope Changes check ✅ Passed All changes are directly scoped to resolving issue #627; no unrelated modifications to core logic, dependencies, or unrelated features are present.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description comprehensively documents all changes, linked issue, behaviour implications, and testing approach.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue-627-icsd24-oxidation-state-consistency

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.31%. Comparing base (72799fe) to head (9a06cec).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #630   +/-   ##
=======================================
  Coverage   89.31%   89.31%           
=======================================
  Files          49       49           
  Lines        4988     4988           
=======================================
  Hits         4455     4455           
  Misses        533      533           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Actionable comments posted: 0

@KingaMas KingaMas merged commit bc862e2 into master May 28, 2026
16 checks passed
@KingaMas KingaMas deleted the fix/issue-627-icsd24-oxidation-state-consistency branch May 28, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent ICSD24 oxidation state filtering thresholds

1 participant