Skip to content

feat: finish dkg#355

Draft
varex83 wants to merge 115 commits intomainfrom
bohdan/dkg-final
Draft

feat: finish dkg#355
varex83 wants to merge 115 commits intomainfrom
bohdan/dkg-final

Conversation

@varex83
Copy link
Copy Markdown
Collaborator

@varex83 varex83 commented Apr 27, 2026

Experimental finished version of DKG. This PR has several PRs merged into that (like SSZ serialization, sync, FROST).

Locally, it successfully completes ceremony with 2 charon and 2 pluto nodes.

The code is in very dirty state, some parts not reviewed, treat it as a base for future development / discussions.

varex83 and others added 26 commits April 22, 2026 18:00
Ports Charon's DKG ceremony to Rust, replacing the unimplemented!()
stub with a complete 7-step synchronised flow compatible with Charon.

New modules:
- frostp2p: libp2p NetworkBehaviour for /charon/dkg/frost/2.0.0/round1/p2p
  direct Shamir-share delivery between pairs of participants
- node: compound DkgBehaviour (relay + bcast + sync + parsigex + peerinfo
  + frost_p2p) and setup_node() returning all ceremony handles
- frost: run_frost_parallel() using pluto_frost::kryptology, FrostP2P
  transport wiring bcast callbacks for round-1/2 cast messages
- signing: sign_and_agg_deposit_data, sign_and_agg_validator_registrations,
  sign_and_agg_lock_hash, create_dist_validators

Updated dkg.rs: BackendError, SyncControl struct, full run_ceremony()
implementing the 7 sync steps matching Charon's dkg.go:107-477.

Co-Authored-By: varex83 <[email protected]>
@varex83
Copy link
Copy Markdown
Collaborator Author

varex83 commented Apr 27, 2026

/review-pr

@varex83
Copy link
Copy Markdown
Collaborator Author

varex83 commented Apr 28, 2026

@claude

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Claude finished @varex83's task in 20m 48s —— View job


Code Review — PR #355 feat: finish dkg

Verdict: REQUEST_CHANGES — 2 bugs · 9 major · 6 minor · 3 nits

This PR completes the DKG implementation, wiring together FROST key generation, BLS threshold signing, the parsigex exchanger, and the libp2p sync layer. The code structure is solid and the test coverage (round-trip BLS, invalid-proof rejection, share-id mismatch) is a good start. However there are 2 bugs that will cause incorrect runtime behavior and 9 major issues spanning memory safety, DoS exposure, and async correctness.


🐛 Bugs (must fix before merge)

1. crates/dkg/src/dkg.rs:368val_regs.clear() before create_dist_validators always yields MissingRegistration

val_regs is cleared before it is passed to create_dist_validators, so the function always sees an empty slice and returns MissingRegistration for every validator. In the Go implementation the registrations are cleared after construction. Move the clear() call to after create_dist_validators returns.

2. crates/dkg/src/dkg.rs:253conf.timeout is parsed but never applied

DkgConfig::timeout (defaulting to 60 s) is stored but never used to bound run_ceremony. If any peer stalls, the ceremony hangs indefinitely. Wrap run_ceremony (or the individual barrier awaits) with tokio::time::timeout(conf.timeout, ...).


⚠️ Major

3. crates/frost/src/kryptology.rs:165assert! in expand_msg_xmd panics instead of returning an error

assert!(len_in_bytes <= 255 * hash_len) aborts the process on oversized input — reachable from untrusted network data. Replace with a Result-returning guard: if len_in_bytes > 255 * hash_len { return Err(DkgError::InvalidExpandLength); }.

4. crates/frost/src/kryptology.rs:103Round1Secret.coefficients not zeroed on drop

The secret polynomial Vec<Scalar> is not zeroed. A crashed process leaves raw key-share bytes in heap memory. Wrap the field in zeroize::Zeroizing<Vec<Scalar>> or derive ZeroizeOnDrop.

5. crates/dkg/src/node.rs:178parsigex duty_gater accepts all duties — memory exhaustion

Arc::new(|_duty| true) stores every incoming partial-signature slot unconditionally. A Byzantine peer can send crafted messages with arbitrary slot values and fill sigdb without bound. Restrict to the finite set of DKG slots: SIG_LOCK=101, SIG_VALIDATOR_REG=102, SIG_DEPOSIT_DATA=200..200+num_validators.

6. crates/dkg/src/sync/protocol.rs:16MAX_MESSAGE_SIZE = 32 MB for ≤200-byte messages

DKG sync messages are well under 200 bytes. The current cap causes a 32 MB heap allocation on every receive call — a trivial per-connection DoS amplifier. Set ≤4 KB.

7. crates/dkg/src/sync/client.rs:244std::sync::RwLock in async context panics on poisoning

read_state()/write_state() use .expect("lock poisoned") on a std::sync::RwLock. A panic while the lock is held poisons it and crashes the node on next acquisition. Use tokio::sync::RwLock instead.

8. crates/dkg/src/dkg.rs:474SyncControl::start busy-polls is_connected() every 250 ms

Server::await_all_connected() already provides an efficient Notify-based barrier. Replace the spin loop with server.await_all_connected(cancellation.child_token()).await?.

9. crates/dkg/src/sync/server.rs:141await_all_at_step blocks forever if a peer connects but sends no step

The predicate state.steps.len() == all_count requires every connected peer to have sent at least one step message. A peer that connects but sends no steps causes this barrier to wait indefinitely even after cancellation-token fire.

10. crates/dkg/src/exchanger.rs:265sigdb mutex held across subscriber notifications — deadlock risk

sigdb.lock().await is held while store_internal notifies subscribers. Any subscriber that re-acquires sigdb will deadlock. Release the lock before dispatching notifications.

11. crates/dkg/src/frost.rs:403 — Trial-and-error scalar encoding

encode_secret_share tries both BE and LE serializations and accepts whichever produces a matching public key. The Go kryptology library uses scalar.Bytes() — definitively big-endian. Commit to BE and fail fast on mismatch.


🔶 Minor

12. crates/dkg/src/sync/handler.rs:401 — After validation failure, server.set_err(InvalidSyncMessage) can be overwritten by a subsequent update_step failure. Add continue after the error path.

13. crates/frost/src/frost_core.rs:70Identifier implements Eq + Ord but not Hash — cannot be used in HashMap/HashSet. Add #[derive(Hash)].

14. crates/frost/src/frost_core.rs:84evaluate_polynomial uses expect("at least one coefficient") for a programmer-error condition. Use debug_assert! instead.

15. crates/dkg/src/frostp2p.rs:145pending_send.take().unwrap_or_default() silently sends an empty frame if InEvent::Send hasn't arrived yet. Guard with if let Some(data) = self.pending_send.take().

16. crates/dkg/src/signing.rs:557 — O(n²) val_regs.iter().find(...) inside a loop over shares. Build a HashMap<PublicKey, &ValReg> before the loop.

17. crates/dkg/src/sync/server.rs:60Ordering::SeqCst on a single atomic flag; Release/Acquire suffice.


💬 Nits

  • crates/dkg/src/dkg.rs:438 — Emoji 🎉 in structured log corrupts aggregators; use plain text.
  • plan.md:1 — Planning doc should not be committed; add to .gitignore or remove.
  • crates/frost/src/frost_core.rs:74 — The reversed LE-byte Ord for big-endian ordering is non-obvious; add a comment explaining the kryptology wire format requirement.

Review generated by Claude — 2 bugs · 9 major · 6 minor · 3 nits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants