feat: implement deep dependency tracking in scientific manifests #246
Open
google-labs-jules[bot] wants to merge 1 commit into
This commit modifies `software_info` to capture all active packages from `sys.modules`, recording their versions and source origins via `importlib.metadata`.
Overview
This PR extends the manifest generation utility to include a comprehensive snapshot of the scientific computing environment. Previously, manifests recorded only the versions of Python and eegprep, which was insufficient for diagnosing numerical drift caused by differences in critical dependencies such as NumPy or SciPy.
Rationale
Scientific reproducibility depends heavily on the specific versions and sources of underlying libraries. Identical pipeline code can produce different results if one environment uses a stable PyPI release while another uses a custom Git-based build or a local development version.
By capturing "deep" environment metadata, we allow researchers to perform high-fidelity audits of their analysis runs, so that variations in results can be traced to specific environmental differences rather than being mistaken for flaws in the analysis logic.
Key Changes
- Updated `src/eegprep/cli/core.py` to inspect `sys.modules` at runtime, so that only packages actually imported during execution are recorded.
- Used `importlib.metadata` to extract not just version strings but also each package's origin (e.g., a PyPI registry release vs. a direct URL or Git install).
- Integrated the `software_info` block into the centralized manifest generation logic.
Technical Decisions
- `importlib.metadata` over `pkg_resources`: We chose the standard library's metadata API to avoid the performance overhead and deprecation issues associated with `pkg_resources` from `setuptools`.
Success Criteria Verification
- The output manifest contains a `software_info` block.
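A check for this criterion could look like the following sketch. The manifest format is not shown in the PR, so this assumes a JSON manifest with `software_info` as a top-level key; `write_manifest`, `has_software_info`, and the `python_version` field are hypothetical, and the package entry is placeholder data.

```python
import json
import platform
import tempfile
from pathlib import Path


def write_manifest(path: Path, software_info: dict) -> None:
    """Write a minimal JSON manifest (illustrative field names only)."""
    manifest = {
        "python_version": platform.python_version(),
        "software_info": software_info,
    }
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def has_software_info(path: Path) -> bool:
    """Return True if the manifest contains a non-empty software_info mapping."""
    manifest = json.loads(path.read_text())
    block = manifest.get("software_info")
    return isinstance(block, dict) and bool(block)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest_path = Path(tmp) / "manifest.json"
        # Placeholder entry; a real run would pass the collected package data.
        write_manifest(manifest_path, {"example-pkg": {"version": "0.0.0", "source": "index"}})
        print(has_software_info(manifest_path))  # prints True
```

Treating an empty mapping as a failure is a deliberate choice here: a present but empty `software_info` block would suggest that package discovery silently found nothing.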