Add math backend and threading observability for numerical parity #242

Open
google-labs-jules[bot] wants to merge 1 commit into develop from jules/backend-math-observability-js0-3e8f2201-fb05-4d2a-978d-acc92b885f29

@google-labs-jules

Overview

This Pull Request introduces backend observability features to address numerical drift issues in iterative signal processing algorithms like ICA and ASR. By capturing and reporting the specific BLAS/LAPACK implementations and threading configurations used during execution, we enable researchers to audit environment consistency and explain discrepancies across different computing infrastructures.

Rationale

EEGPrep users have reported numerical drift exceeding the 1e-5 uV parity target when moving workloads between local machines (e.g., Apple Accelerate) and HPC clusters (e.g., Intel MKL). These libraries can differ in kernel selection, operation ordering, and threading, so floating-point results may differ in the last bits, and iterative algorithms can amplify those differences. Identifying the specific backend is therefore critical for scientific reproducibility.
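
To illustrate why the backend matters: floating-point addition is not associative, so two libraries that accumulate the same values in a different order can return slightly different results. A minimal Python illustration of the general effect (not EEGPrep code):

```python
# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)      # False
print(abs(left - right))  # a difference on the order of 1e-16
```

A single discrepancy of this size is far below the parity target, but iterative algorithms repeat such operations many times, so small differences can accumulate.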

Following the principle of "observability without interference," these changes focus on providing diagnostic transparency. We chose to report the environment state rather than attempt to force specific library loading to avoid breaking system-level optimizations or requiring elevated privileges.

Key Changes

  • Manifest Metadata Generation: Added a dedicated math_backend_info section to all pipeline and report manifests. This utilizes threadpoolctl to introspect the runtime environment and record library names, versions, and thread limits.
  • New CLI Command: Introduced eegprep software_info, which provides a human-readable summary of:
    • CPU architecture and OS version.
    • Active math backends (OpenBLAS, MKL, etc.).
    • Threading layers (OpenMP, TBB) and core availability.
  • Diagnostic Conflict Detection: Implemented a warning system that detects and flags instances where multiple conflicting math libraries are loaded in the same process memory space, which is a common cause of instability and non-deterministic results.
  • Enhanced Documentation: Updated the User Guide to include a mapping of known BLAS implementations to their expected precision characteristics relative to our 1e-5 uV target.
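
As a rough sketch of the introspection and conflict check described in the list above, the following uses `threadpoolctl.threadpool_info()`, which returns one dictionary per detected library. The function name `collect_math_backend_info`, the output layout, and the definition of "conflict" as more than one distinct BLAS implementation are assumptions made for illustration; the PR does not specify them.

```python
import json
import platform

try:
    from threadpoolctl import threadpool_info
except ImportError:
    # Fallback so the sketch still runs when threadpoolctl is not installed.
    def threadpool_info():
        return []


def collect_math_backend_info(pools=None):
    """Return a JSON-serializable, read-only summary of loaded math libraries.

    The output layout is hypothetical, not the schema used by the PR.
    """
    if pools is None:
        pools = threadpool_info()

    # Keep only a fixed set of fields; missing fields become None.
    keys = ("user_api", "internal_api", "prefix", "version",
            "num_threads", "threading_layer", "filepath")
    libraries = [{k: p.get(k) for k in keys} for p in pools]

    # Assumption: a "conflict" means more than one distinct BLAS implementation.
    blas = sorted({lib["internal_api"] for lib in libraries
                   if lib["user_api"] == "blas" and lib["internal_api"]})

    return {
        "platform": {
            "machine": platform.machine(),
            "system": platform.system(),
            "release": platform.release(),
        },
        "libraries": libraries,
        "blas_implementations": blas,
        "multiple_blas_loaded": len(blas) > 1,
    }


if __name__ == "__main__":
    print(json.dumps(collect_math_backend_info(), indent=2))
```

Note that threadpoolctl only reports libraries already loaded in the current process, so the list may be empty unless a package such as NumPy has been imported first.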

Technical Decisions

  • threadpoolctl Integration: We selected threadpoolctl as the primary inspection tool because it provides a reliable, cross-platform way to detect low-level library configurations without the overhead of manual environment variable parsing.
  • Zero-Impact Guardrails: The collection logic is strictly read-only and adds less than 500 ms to pipeline startup, a one-time cost that does not affect throughput in high-volume processing scenarios.
  • Backward Compatibility: The manifest schema changes are additive; existing BIDS metadata viewers will continue to function without modification.
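
One way to picture the additive schema change is a helper that attaches the new section without touching existing keys. This is a sketch under assumptions: the helper name `with_backend_info` and the example manifest fields are hypothetical, and only the `math_backend_info` key comes from this PR.

```python
import copy


def with_backend_info(manifest, backend_info, key="math_backend_info"):
    """Return a copy of the manifest with one extra top-level section.

    Existing keys are left untouched, so readers that ignore unknown
    keys see the same content as before.
    """
    if key in manifest:
        raise KeyError(f"manifest already contains {key!r}")
    updated = copy.deepcopy(manifest)
    updated[key] = backend_info
    return updated


# Hypothetical manifest content, for illustration only.
old = {"pipeline": "example", "steps": ["asr", "ica"]}
new = with_backend_info(old, {"libraries": []})
print(sorted(new))  # ['math_backend_info', 'pipeline', 'steps']
```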

Verification Results

  • Numerical Parity: Confirmed that adding this logging logic results in zero changes to ICA and ASR output signals compared to current baselines.
  • Auditability: Verified that manifests correctly identify the difference between environments using OpenBLAS vs. Intel MKL.
  • Tests: Automated test suite updated to validate the presence of math_backend_info in all generated JSON outputs.
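
The presence check in the last item could look roughly like the sketch below, which scans a directory of JSON outputs and lists those lacking the section. The helper name and the directory layout are assumptions; the actual test suite in this PR may be organized differently.

```python
import json
from pathlib import Path


def manifests_missing_backend_info(directory, key="math_backend_info"):
    """Return the JSON files under `directory` that lack the given top-level key."""
    missing = []
    for path in sorted(Path(directory).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # A manifest that is not a JSON object cannot carry the section.
        if not isinstance(data, dict) or key not in data:
            missing.append(path)
    return missing
```

A test could then assert that the returned list is empty for the pipeline's output directory.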
