feat(blockchain): cap aggregation job children to 2 (binary merge) by MegaRedHand · Pull Request #482 · lambdaclass/ethlambda

MegaRedHand · 2026-06-30T20:43:35Z

Summary

select_proofs_greedily (committee-signature aggregation worker, interval 2) picked every existing proof that added new validator coverage, with no upper bound on how many became children of one aggregation job.
Caps the combined new+known selection at 2 children per job, forcing pairwise/binary merges instead of N-way merges.
Proofs left uncovered by the cap aren't dropped — they remain in the new/known payload buffers and get merged in a subsequent aggregation round.

Why

Uncapped child counts have previously driven debug builds to stack-overflow in leanVM's rec_aggregation, and an unbounded N-way merge approaches leanVM's own hard cap (MAX_RECURSIONS = 16), beyond which a group fails to aggregate outright. Bounding to 2 keeps recursive verification cost per round small and predictable.

Scoped to the interval-2 aggregation worker (crates/blockchain/src/aggregation.rs) only; block-building compaction and reaggregate.rs have their own greedy-selection sites left untouched.

Test plan

cargo test -p ethlambda-blockchain --lib — 42 passed (3 new tests covering: cap within one set, cap across new+known combined, early-stop when no new coverage)
cargo fmt --all -- --check
cargo clippy -p ethlambda-blockchain --all-targets -- -D warnings

select_proofs_greedily picked as many existing proofs as fit the greedy coverage criterion, with no upper bound. Uncapped child counts have previously driven debug builds to stack-overflow in leanVM's rec_aggregation, and an unbounded N-way merge isn't far from leanVM's own hard cap (MAX_RECURSIONS = 16), at which point a group would fail to aggregate outright. Cap each aggregation job at 2 children, forcing pairwise (binary) merges. Proofs left uncovered by the cap stay in the new/known buffers and get merged in a later aggregation round instead of being dropped.

github-actions · 2026-06-30T20:44:59Z

🤖 Kimi Code Review

Overall Assessment: This is a well-executed safety fix that prevents stack overflow in recursive verification by capping aggregation arity to binary merges. The implementation is correct and the test coverage is thorough.

Minor Suggestions

crates/blockchain/src/aggregation.rs

Lines 324-330: Consider adding a debug_assert!(MAX_AGGREGATION_CHILDREN >= 2, "binary merge requires at least 2 children") near the constant definition to prevent accidental misconfiguration if this value is ever made configurable.
Lines 489-504 (test helper): The proof_for_validator helper uses vid as usize + 1 for the bit vector length. While validator indices in consensus are bounded by the active validator set size (well within usize limits), consider saturating or asserting on vid to avoid theoretical overflow on 32-bit platforms during tests:
```
let len = (vid as usize).checked_add(1).expect("validator index too large");
```
Line 350: The while condition selected.len() < MAX_AGGREGATION_CHILDREN correctly implements the cap. However, since MAX_AGGREGATION_CHILDREN is a small constant (2), you could make the bound explicit in the type system using smallvec or ArrayVec to avoid heap allocations for selected, though this is purely an optimization and not necessary for correctness.

Consensus Safety Check

Line 327: The comment notes that uncapped greedy selection previously caused stack overflows in rec_aggregation. Confirm that the binary merge constraint (max 2 children) produces a verification tree depth of O(log n), which is safe for the leanVM recursion limit. The mathematical associativity of BLS aggregation ensures that binary merging produces the same final signature as N-way merging, so this does not affect consensus validity.

Code Quality

Lines 487-537: The unit tests effectively verify the capping behavior across the three critical cases (within one set, across both sets, and early termination). The test naming is descriptive.

Conclusion: LGTM. The change correctly bounds recursion depth while preserving liveness (uncovered proofs remain in buffers for subsequent rounds) and safety (aggregation correctness is maintained).

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

github-actions · 2026-06-30T20:45:11Z

🤖 Codex Code Review

Findings

The new “later round will pick them up” assumption is not true for known-only leftovers, so the cap can strand partial merges permanently. In aggregation.rs, pure recursive merge work is only scheduled from new_payload_keys(). After a capped round, the merged proof and any leftover proof are promoted from new to known at store.rs. On the next interval, if there are no fresh gossip sigs and no fresh new proof for that data_root, snapshot_aggregation_inputs returns no job, so the remaining known proofs never get another merge pass. With 3 disjoint proofs, this regresses from “fully converge in one round” to “can stop forever at two proofs”.
The cap is applied before considering known_proofs, which can discard the highest-coverage reuse candidates and force unnecessary XMSS aggregation from scratch. In aggregation.rs, the loop fully consumes the 2-child budget from new_proofs before looking at known_proofs. A case like new = [{0}, {1}], known = [{0..63}] now selects the two tiny new proofs and ignores the large existing proof entirely. That is a performance regression, and on a tight interval-2 budget it also increases the chance of missing publication windows.

Notes

I did not see a direct memory-safety or consensus-validation bug in the narrow diff itself, but Item 1 is a real aggregation-progress regression.

I could not run cargo test here because the sandboxed environment blocked the required Rust toolchain/network setup.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt

github-actions · 2026-06-30T20:45:57Z

🤖 Claude Code Review

Good, confirms test helpers are valid. I now have enough context to write the review.

Review: cap aggregation job children to 2 (binary merge)

Overall: Sound, narrowly-scoped fix with good test coverage. The core logic in select_proofs_greedily (crates/blockchain/src/aggregation.rs:340-381) correctly caps the combined selection across new_proofs/known_proofs at MAX_AGGREGATION_CHILDREN, and the early break at line 375-377 is a legitimate micro-optimization (avoids materializing a remaining Vec for known_proofs once the cap is already hit), not dead code.

A few things worth confirming/calling out:

No buffer leak from capping — I verified that proofs left unselected this round aren't orphaned: PayloadBuffer::push (crates/storage/src/store.rs:172-218) already prunes any existing proof that becomes a strict subset of an incoming one. Since a newly-aggregated proof is always a superset of its (≤2) children's participants, those children get auto-evicted on insert. So repeated binary merges converge correctly over subsequent rounds without buffer growth — good, this wasn't obviously guaranteed from the diff alone.
Residual unbounded merges elsewhere (acknowledged scope limit, but worth tracking as a follow-up) — crates/blockchain/src/reaggregate.rs:146-175 still builds an uncapped children vec from candidate.local_partials before calling aggregate_proofs, and block-building compaction has its own greedy-selection site too. If the stack-overflow/MAX_RECURSIONS=16 risk this PR addresses is real for interval-2 aggregation, the same N-way recursive merge risk exists on the reaggregation path triggered by importing a block with many local partials. The PR description calls this out explicitly as out-of-scope, which is reasonable for a minimal fix, but it'd be good to confirm there's a tracked follow-up rather than this being forgotten.
Liveness/latency tradeoff, not a bug — capping to binary merges means full coverage of heavily-fragmented proof sets (e.g., after a partition or catch-up) now takes O(log n) interval-2 rounds instead of one. Worth confirming this doesn't risk attestations missing the interval-4 promotion window under adversarial fragmentation (many small proofs for the same data root), even though each individual round is bounded and cheap. No action needed unless devnet testing shows multi-round convergence interacting badly with the 750ms per-session deadline.
Tests — the three new unit tests correctly target the meaningful cases (intra-set cap, cross-set combined cap, early-stop on no new coverage). No notes.
Style — doc comment on MAX_AGGREGATION_CHILDREN is clear and explains the why (debug stack overflow, leanVM MAX_RECURSIONS), matching the project's convention of justifying non-obvious constants.

No correctness, security, or memory-safety issues found in the diff itself. The change is a safe, well-justified narrowing of existing behavior.

Automated review by Claude (Anthropic) · sonnet · custom prompt

greptile-apps · 2026-06-30T20:46:01Z

Greptile Summary

This PR changes interval-2 proof aggregation to use binary merges. The main changes are:

Adds a two-child cap for aggregation jobs.
Stops greedy proof selection once the combined cap is reached.
Adds unit tests for capped selection behavior.

Confidence Score: 5/5

The changed flow looks mergeable after a small cleanup to proof selection ordering.

Unselected proofs remain buffered for later aggregation rounds.
The new test helper uses symbols and call patterns already used in this crate.
Mixed new and known proof sets can choose lower-coverage children than needed when new proofs fill the cap first.

crates/blockchain/src/aggregation.rs

Important Files Changed

Filename	Overview
crates/blockchain/src/aggregation.rs	Adds the binary child cap and tests it, with one ordering issue where new proofs can fill the cap before better known proofs are considered.

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/blockchain/src/aggregation.rs:347
**Priority Bucket Hides Better Proofs**

When `new_proofs` contains two small disjoint proofs, this loop reaches the child cap before `known_proofs` is examined. A known proof for the same root that covers many more validators is skipped, so `build_job` receives a smaller `covered` set and can produce a heavier, lower-coverage aggregate even though the same two-child limit could have reused the better proof.

_{Reviews (1): Last reviewed commit: "feat(blockchain): cap aggregation job ch..." | Re-trigger Greptile}

greptile-apps · 2026-06-30T20:46:05Z

@@ -337,7 +347,7 @@ fn select_proofs_greedily(
    for proof_set in [new_proofs, known_proofs] {


Priority Bucket Hides Better Proofs

When new_proofs contains two small disjoint proofs, this loop reaches the child cap before known_proofs is examined. A known proof for the same root that covers many more validators is skipped, so build_job receives a smaller covered set and can produce a heavier, lower-coverage aggregate even though the same two-child limit could have reused the better proof.

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/blockchain/src/aggregation.rs Line: 347 Comment: **Priority Bucket Hides Better Proofs** When `new_proofs` contains two small disjoint proofs, this loop reaches the child cap before `known_proofs` is examined. A known proof for the same root that covers many more validators is skipped, so `build_job` receives a smaller `covered` set and can produce a heavier, lower-coverage aggregate even though the same two-child limit could have reused the better proof. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps Bot reviewed Jun 30, 2026

View reviewed changes

MegaRedHand added 2 commits June 30, 2026 17:46

test: remove unit tests

dcba9da

docs: improve documentation

a93dc42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(blockchain): cap aggregation job children to 2 (binary merge)#482

feat(blockchain): cap aggregation job children to 2 (binary merge)#482
MegaRedHand wants to merge 3 commits into
mainfrom
limit-aggregation-children-to-2

MegaRedHand commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

greptile-apps Bot commented Jun 30, 2026

Important Files Changed

Uh oh!

greptile-apps Bot Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		@@ -337,7 +347,7 @@ fn select_proofs_greedily(
		for proof_set in [new_proofs, known_proofs] {

Uh oh!

Conversation

MegaRedHand commented Jun 30, 2026

Summary

Why

Test plan

Uh oh!

github-actions Bot commented Jun 30, 2026

🤖 Kimi Code Review

Uh oh!

github-actions Bot commented Jun 30, 2026

🤖 Codex Code Review

Uh oh!

github-actions Bot commented Jun 30, 2026

🤖 Claude Code Review

Review: cap aggregation job children to 2 (binary merge)

Uh oh!

greptile-apps Bot commented Jun 30, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

greptile-apps Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant