⚡ Bolt: optimize Fisher-Yates shuffles in ransac.py by suraj-ranganath · Pull Request #259 · sccn/eegprep

suraj-ranganath · 2026-06-27T02:31:47Z

💡 What: Optimized rand_permutation and rand_sample in src/eegprep/plugins/clean_rawdata/private/ransac.py by vectorizing random number generation and using faster scalar rounding logic.

🎯 Why: These functions are core utilities for ICA and RANSAC, which are often the primary performance bottlenecks in EEG preprocessing pipelines.

📊 Impact: Measurable ~1.4x speedup (40-45% improvement) in shuffle operations while maintaining bit-exact reproducibility.

🔬 Measurement: Verified with a benchmark script measuring 100k-item permutations over multiple iterations and confirmed correctness with the full suite of RNG parity and ICA tests.

PR created automatically by Jules for task 13309992754005938132 started by @suraj-ranganath

This optimization improves the performance of `rand_permutation` and `rand_sample` by approximately 40-45%. Key changes: - Vectorized `stream.rand()` calls to pre-generate all necessary random numbers in a single call, reducing Python-to-C overhead. - Replaced the generic `round_mat` utility with `math.floor(x + 0.5)` for faster scalar "round half up" logic in the tight shuffle loop. - Maintained exact numerical parity with MATLAB's RNG consumption pattern, which is critical for ICA and RANSAC consistency. - Updated `.jules/bolt.md` performance journal. Co-authored-by: suraj-ranganath <14310165+suraj-ranganath@users.noreply.github.com>

google-labs-jules · 2026-06-27T02:31:48Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

claude · 2026-06-27T02:32:01Z

Claude finished @suraj-ranganath's task in 2m 13s —— View job

PR Review: Optimize Fisher-Yates shuffles in `ransac.py`

Read the diff against origin/develop
Verify RNG-consumption parity (vectorized rand() vs. scalar loop)
Verify rounding parity (math.floor(x+0.5) vs. round_mat)
Check edge cases & repo conventions

Verdict: 👍 Looks correct, with one minor edge-case note

The optimization is sound and the bit-exact reproducibility claim holds up under analysis.

✅ Correctness verified

RNG consumption order is preserved. For numpy's legacy RandomState (MT19937), stream.rand(m) produces the identical sequence, in the same order, as m successive scalar stream.rand() calls — that ordering guarantee is the whole point of RandomState. So pre-generating the array does not change which values are drawn or the resulting stream state.
Index mapping is right. In rand_permutation, the loop runs k = n-1 … 1 and reads random_values[n - 1 - k], i.e. indices 0 … n-2 in order — matching the original per-iteration draw order. rand_sample reads random_values[k] for k = 0 … m-1. Both consume the pre-generated array in draw order. ✔️
Rounding is bit-exact for these inputs. round_mat(x) for decimals=0 is copysign(floor(abs(x) + 0.5), x) (src/eegprep/functions/miscfunc/misc.py:301). All arguments here are non-negative ((remaining-1) * rand, k * rand, with rand ∈ [0,1)), so math.floor(x + 0.5) is exactly equivalent. Note: this equivalence relies on inputs being non-negative — if either function were ever reused with negative arguments, the two would diverge (round_mat rounds half away from zero, floor(x+0.5) rounds half up). Worth a one-line comment to that effect, but not a bug here.

Also note math.floor() returns a Python int in Py3, so dropping the int(...) wrapper is fine.

⚠️ Minor: new edge-case behavior for `n = 0` in `rand_permutation`

random_values = stream.rand(n - 1)   # ransac.py:97

When n == 0, this becomes stream.rand(-1), which raises ValueError: negative dimensions are not allowed. The previous code degraded gracefully — range(-1, 0, -1) is empty and it returned np.arange(0). A permutation of zero elements is unrealistic in this code path, so this is low-risk, but it is a behavioral change. A guard (stream.rand(max(n - 1, 0)) or an early if n <= 1: return result) would restore the prior robustness. rand_sample with m == 0 is fine since rand(0) returns an empty array.

💭 Nits

.jules/bolt.md — this adds a tool-specific dev-notes file to the repo. If .jules/ isn't already an established tracked convention here, consider leaving it out of the committed diff to avoid repo clutter (the rationale is already captured in the docstrings + PR description).
The temporary array allocates 8·(n-1) bytes (~8 MB at n=1M). Negligible and short-lived — acceptable tradeoff for the speedup.
Docstring Performance updates are accurate. No user-facing docs needed (internal private helper, behavior unchanged).

Testing

Parity is covered by tests/test_parity_rng.py (test_rand_sample_mechanism, test_round_mat_parity) and tests/test_utils_ransac.py. I couldn't execute the suite in this environment (sandbox approval), but these are the tests to confirm green — particularly the determinism/reproducibility cases in test_utils_ransac.py.
· branch bolt/optimize-ransac-shuffle-13309992754005938132

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡ Bolt: optimize Fisher-Yates shuffles in ransac.py#259

⚡ Bolt: optimize Fisher-Yates shuffles in ransac.py#259
suraj-ranganath wants to merge 1 commit into
developfrom
bolt/optimize-ransac-shuffle-13309992754005938132

suraj-ranganath commented Jun 27, 2026

Uh oh!

google-labs-jules Bot commented Jun 27, 2026

Uh oh!

claude Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

suraj-ranganath commented Jun 27, 2026

Uh oh!

google-labs-jules Bot commented Jun 27, 2026

Uh oh!

claude Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Optimize Fisher-Yates shuffles in ransac.py

Verdict: 👍 Looks correct, with one minor edge-case note

✅ Correctness verified

⚠️ Minor: new edge-case behavior for n = 0 in rand_permutation

💭 Nits

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 27, 2026 •

edited

Loading

PR Review: Optimize Fisher-Yates shuffles in `ransac.py`

⚠️ Minor: new edge-case behavior for `n = 0` in `rand_permutation`