Skip to content

FROST/ROAST readiness branch#3866

Draft
mswilkison wants to merge 481 commits into
mainfrom
feat/frost-schnorr-migration-scaffold
Draft

FROST/ROAST readiness branch#3866
mswilkison wants to merge 481 commits into
mainfrom
feat/frost-schnorr-migration-scaffold

Conversation

@mswilkison

@mswilkison mswilkison commented Feb 19, 2026

Copy link
Copy Markdown
Contributor

Current State (as of 2026-05-17)

This draft PR is the umbrella readiness branch for feat/frost-schnorr-migration-scaffold.
It is being kept current with main so it can become a direct merge target if the FROST/ROAST stack is approved for activation.

It remains in draft until the remaining phase-gate, governance, and cross-repository readiness items are closed.

Canonical Status Sources

  • Cross-repo migration tracker: docs/frost-migration/external-repository-tracking.md (in tlabs-xyz/tbtc)
  • Companion tBTC umbrella draft: https://github.com/tlabs-xyz/tbtc/pull/10
  • Latest readiness audit: docs/reviews/frost-roast-production-readiness-2026-05-16.md (in tlabs-xyz/tbtc)

Latest Refresh

  • Merged current main into this branch.
  • Local verification passed for the FROST signing package and tBTC signer backend paths, with and without frost_native.
  • Local verification also passed the native TBTC signer-path tests covering the FFI signing primitive and signing executor.

Remaining Cross-Repo Closure Items

  • Wait for CI from the latest refresh to complete.
  • Capture the first post-fix funded nightly live run artifact for Phase 4.
  • Record final approver signoff in the Phase 4 decision/packet docs.
  • Execute external org archive/redirect mapping and record results.

Notes

  • Keep this PR in draft until the activation decision is explicit.
  • Treat it as the readiness branch for the integrated keep-core side of the stack, not only a historical index.

@mswilkison mswilkison changed the title Draft: Add Schnorr/FROST migration scaffold package and RFC Draft: Add Schnorr/FROST scaffold and tBTC runtime signing adapter slice Feb 20, 2026
mswilkison added a commit that referenced this pull request Feb 26, 2026
## Summary
- cut over the `frost_tbtc_signer` bootstrap path to return coarse
tbtc-signer signature output on successful `RunDKG -> StartSignRound ->
FinalizeSignRound`
- keep legacy signing fallback only for verified coarse-path failures
(bridge errors, decode failures, or structural divergence)
- wire `BuildTaprootTx` through the transitional native tbtc-signer
orchestration path
- gate `BuildTaprootTx` signing substitution on strict native-vs-legacy
transaction input/output equivalence checks
- add coarse success/fallback telemetry and observer-registration guards
- expand unit and integration coverage for coarse cutover,
retry/attempt-variation behavior, and `BuildTaprootTx` substitution
safety

## Stack Context
- base branch: `feat/frost-schnorr-migration-scaffold` (`#3866`)
- recommended review order:
  1. review `#3866` for scaffold/runtime seams
  2. review this PR as the cutover + hardening delta

## Review Guide (hot paths)
- coarse cutover + fallback semantics:
- `pkg/frost/signing/native_ffi_primitive_transitional_frost_native.go`
-
`pkg/frost/signing/native_frost_engine_tbtc_signer_registration_frost_native.go`
- `BuildTaprootTx` wiring and substitution gating:
  - `pkg/tbtc/wallet.go`
-
`pkg/tbtc/native_tbtc_signer_build_taproot_tx_frost_native_tbtc_signer.go`
  - `pkg/bitcoin/transaction_builder.go`
- coverage for tx assembly/substitution and bridge safety:
  - `pkg/tbtc/wallet_sign_transaction_build_taproot_tx_test.go`
  - `pkg/bitcoin/transaction_builder_test.go`
-
`pkg/frost/signing/native_frost_engine_tbtc_signer_registration_frost_native_test.go`

## Scope Boundaries
- in scope: bootstrap/coarse-path cutover hardening and safe
`BuildTaprootTx` integration
- out of scope: full production signer-runtime replacement and later
migration phase gates
@mswilkison mswilkison changed the title Draft: Add Schnorr/FROST scaffold and tBTC runtime signing adapter slice Draft (Umbrella): keep-core FROST/ROAST migration scaffold tracker (not for direct merge) Mar 1, 2026
@mswilkison mswilkison changed the title Draft (Umbrella): keep-core FROST/ROAST migration scaffold tracker (not for direct merge) Draft: keep-core FROST/ROAST readiness branch May 17, 2026
@mswilkison

Copy link
Copy Markdown
Contributor Author

Readiness evidence update for the tBTC Schnorr FROST/ROAST migration stack, 2026-05-20.

From a clean worktree at PR head 37b2ce78348c4ab1c4a98eda8adcf99fa3d9aa1e, the focused integration-tag package lane passed:

go test -timeout 20m -tags 'integration frost_native frost_tbtc_signer' ./pkg/frost/... ./pkg/tbtc

Observed package coverage:

  • pkg/frost
  • pkg/frost/retry
  • pkg/frost/roast
  • pkg/frost/signing
  • pkg/tbtc (221.475s)

This narrows the keep-core evidence gap for FROST/tBTC focused package behavior, but it is not a production-readiness substitute for full keep-core integration/testnet coverage. The following remain open blockers for the tBTC FROST/ROAST readiness gate:

  • full go test -tags=integration ./... or equivalent full-stack current integration evidence
  • client-integration-test
  • deployment/testnet lanes
  • funded production-like wallet/sign/deposit/redemption/fraud/rollback run
  • operator rehearsal and signoff
  • final maintainer/security/runtime/governance acceptance

The corresponding tBTC evidence docs were pushed in tlabs-xyz/tbtc#402.

@mswilkison mswilkison marked this pull request as ready for review May 22, 2026 20:07
@mswilkison mswilkison changed the title Draft: keep-core FROST/ROAST readiness branch FROST/ROAST readiness branch May 22, 2026
@mswilkison mswilkison marked this pull request as draft May 22, 2026 20:32
mswilkison added a commit that referenced this pull request May 22, 2026
… at init (#3958)

## Summary

Addresses three FFI-safety findings from an independent review of #3866:

- **H3 (init-time panic)**:
`RegisterNativeExecutionFFISigningPrimitiveForBuild` and
`registerNativeExecutionAdapterForBuild` (frost_native) panic on
registration failure. Both are invoked from
`pkg/frost/signing/native_adapter_registration.go`'s package `init()`,
so a transient registration failure crashes the binary at startup.
Downstream code (`pkg/frost/signing/backend.go`) already returns
`ErrNativeCryptographyUnavailable` when no native adapter is registered,
so the legacy execution backend remains the safe-by-default path —
panicking at init turned a recoverable degradation into an outage.

Replace panics with structured `logger.Warnf` plus a package-level
`lastRegistrationError` and `LastNativeRegistrationError()` accessor.
Callers that want to fail startup on a registration error can opt in by
checking that accessor after `RegisterNativeExecutionAdapterForBuild`;
default callers continue booting with the legacy backend, exactly as if
`frost_native` was never enabled. The existing
`TestRegisterNativeExecutionFFISigningPrimitiveForBuild_ProviderErrorPanics`
becomes `..._ProviderErrorIsRecordedNotPanicked` and asserts the new
behavior.

- **M1 (nil ptr free)**: `parseBuildTaggedTBTCSignerResult`
unconditionally deferred `C.tbtc_signer_free_buffer(result.buffer.ptr,
result.buffer.len)` even when the C wrapper's status-code -1 path
returned `result.buffer.ptr == NULL`. The C wrapper checks the
`frost_tbtc_free_buffer` symbol for NULL but does not check the buffer
pointer, so a future Rust-side change that dereferenced its ptr argument
without a NULL guard would crash. Skip the defer when `result.buffer.ptr
== nil`.

- **M6 (unbounded length)**: `unmarshalSignerMaterialFromPersistence`
accepted any uvarint length within the data buffer. A corrupted state
file or hostile peer carrying a multi-hundred-MiB envelope would
allocate that many bytes before the existing bounds check ran. Cap the
format length at 256 bytes and the payload length at 256 KiB —
comfortably above any real signer material envelope — and reject earlier
with a clear error. New regression tests
`TestUnmarshalSignerMaterialFromPersistence_RejectsOversizedFormatLength`
and `..._RejectsOversizedPayloadLength`.

## Out of scope (deferred)

The remaining placeholder-fencing findings from the same review (H1:
\`KeyGroupSource == \"legacy-wallet-pubkey\"\` fallback; H2: DKG
placeholder participant pubkeys; H4: silent key-group substitution when
source is legacy) require maintainer policy alignment on whether to gate
the \`frost_tbtc_signer\` build behind an opt-in flag or
refuse-by-default. Not included here.

Several MED findings around Bitcoin witness preservation, FROST message
channel back-pressure, and replay-error string matching also require
behavior decisions and are not included in this safety-hygiene slice.

## Verification

Local (GOCACHE under \`/private/tmp\`):

- \`go test ./pkg/frost/...\` — PASS
- \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...\` —
PASS
- \`go test ./pkg/tbtc -run
'TestUnmarshalSignerMaterial|TestMarshalSigner|TestSignerMarshalling|TestFuzzDecodeNativeSignerMaterial'\`
— PASS
- \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/tbtc -run
'TestConfigureFrostSigningBackend|TestNewNode_ConfiguresFrostSigningBackend|TestSigningExecutor_Sign|TestRegisterSignerMaterialResolverForBuild'\`
— PASS
- \`go vet ./pkg/frost/... ./pkg/tbtc\` — clean
mswilkison added a commit that referenced this pull request May 22, 2026
… message hygiene (#3959)

## Summary

Bundles four findings from the independent PR #3866 review that all sit
in the same code seam (frost_native scaffold path + receive loops).
Stacked on #3958.

### H1+H4 — scaffold key-group must be opt-in (was silently accepted)

\`signer_material_resolver_build_frost_native_tbtc_signer.go\` built
signer material with \`KeyGroupSource: \"legacy-wallet-pubkey\"\` (a
sha256 placeholder, not a DKG output) and the FFI primitive in
\`native_ffi_primitive_transitional_frost_native.go\` silently
substituted the Rust signer's RunDKG key group when the source was that
placeholder. Production deployments with placeholder material would have
signed through whatever key group the Rust side returned without
operator-facing signal.

Add a refuse-by-default opt-in:
\`KEEP_CORE_FROST_TBTC_SIGNER_ACCEPT_SCAFFOLD_KEY_GROUP=1\`. The new
\`signing.AcceptScaffoldKeyGroupEnabled\` helper is per-call (not
cached), so flipping the env unset recovers fail-closed behavior without
restart. Both the resolver and the FFI primitive check the flag; both
refuse with a clear error that names the env var and the placeholder
source. New regression test pins the refuse-by-default path; existing
scaffold-using tests opt in via \`t.Setenv\`.

### M2+M3 — Bitcoin witness restoration refuses unsupported shapes

\`ReplaceUnsignedTransaction\`'s restoration path handled only
single-element previous witnesses (P2WSH redeem script). Multi-element
witnesses (P2TR script-path) were silently dropped. Replace with an
explicit switch: 0 elements → leave empty, 1 → restore as before, ≥2 →
fail loudly. Removes the tautological inner \`len(replacedInput.X) ==
0\` checks that the outer refusals already guarantee. New regression
test
\`TestTransactionBuilder_ReplaceUnsignedTransaction_RejectsMultiElementPreviousWitness\`.

### M5 — first-write-wins on peer messages

Three round-message receive loops (tbtc-signer contribution, FROST round
one, FROST round two) did last-write-wins, letting a peer mutate its own
contribution after first send. Switch to first-write-wins with
byte-equal retransmissions idempotent and conflicting retransmissions
logged via a new \`protocolLogger\` channel. Three message-equality
helpers cover the three message types.

## Out of scope (deferred to separate PRs)

- **H2** — DKG placeholder participant pubkeys
(\`buildTaggedTBTCSignerDKGPlaceholderPublicKeyHex\`) needs either
wiring real \`MembershipValidator\` pubkeys through or fencing under the
same env flag.
- **M4** — ROAST-compliant bounded transition evidence for the
non-blocking message channel. Multi-PR effort.
- **M7** — Real ROAST-aware retry replacing the byte-identical tECDSA
shuffle in \`pkg/frost/retry/retry.go\`. Multi-PR effort.
- **L5** — FFI status-code semantics for replay detection. Paired with a
tbtc-signer follow-up.

## Verification

Local (GOCACHE under \`/private/tmp\`):

- \`go test ./pkg/frost/... ./pkg/bitcoin\` — PASS
- \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...
./pkg/bitcoin\` — PASS
- \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/tbtc -run
'TestConfigureFrostSigningBackend|TestNewNode_ConfiguresFrostSigningBackend|TestSigningExecutor_Sign|TestRegisterSignerMaterialResolverForBuild|TestBuildTaggedTBTCSignerRoundKeyGroup|TestBuildTaggedLegacyCompatibleNativeExecutionFFISigningPrimitive|TestTransactionBuilder_ReplaceUnsignedTransaction'\`
— PASS
mswilkison added a commit that referenced this pull request May 22, 2026
…3962)

## Summary

Adds **RFC-21** as the design doc that scopes the M4 (transition
evidence) and M7 (ROAST-aware retry) findings from the independent
review of #3866 into a single layered design and a phased,
PR-sized implementation plan.

This PR is **doc-only**. It introduces no behaviour change. Subsequent
implementation PRs reference RFC-21 in their descriptions.

Stacked on #3961.

## Why one design, not two

M4 and M7 share the same notion of *attempt context* and *transition
evidence*:

- Fixing M4 alone produces evidence that no consumer reads.
- Fixing M7 alone gives the consumer nothing to drive retry decisions
on.

The RFC treats them as one design split into linear phases.

## Phasing

- **Phase 0** -- this RFC.
- **Phase 1** -- `AttemptContext` type + canonical hash; protocol
  messages carry attempt-context binding (optional during migration).
- **Phase 2** -- receiver overflow tracking (M4 layer A) plumbed
  through the three `select { default }` drop sites, default no-op.
- **Phase 3** -- coordinator state machine: `BeginAttempt`,
  `RecordEvidence`, `NextAttempt`. Deterministic
  `(AttemptContext, TransitionEvidence) -> AttemptContext` map.
- **Phase 4** -- wire receiver to coordinator behind
  `frost_roast_retry` build tag.
- **Phase 5** -- retry adapter +
  `EvaluateRoastRetryForSigning`; migrate first call site behind
  the build tag with readiness-gate guard.
- **Phase 6** -- migrate remaining call sites; delete the
  byte-identical-to-tECDSA shuffle once unused.
- **Phase 7** -- flip the readiness manifest to `present` once Phase
  6 ships and integration tests run against a real testnet (only
  then; no early flip).

## Open questions called out explicitly

The RFC lists four open design questions that need cross-team
review before Phase 3 lands:

1. Cross-process coordinator agreement -- gossip topic choice.
2. Persistence across signer restart.
3. FFI surface (Rust signer error-code style; follows the L5
   pattern from #425 / #3961).
4. Backward-compat horizon for the `AttemptContextHash` field.

## Out of scope

- DKG retry (separate RFC).
- Bitcoin transaction-builder changes.
- Operator UX changes (CLI, dashboards) -- land alongside Phase 5/6.
- Cross-domain ROAST between keep-core and tbtc-signer.

## Test plan

- [ ] Reviewer reads RFC end-to-end.
- [ ] Reviewer flags any phase that should be split further or
  reordered before Phase 1 begins.
- [ ] Reviewer answers the four open questions or marks them
  defer-to-Phase-3.

No code change in this PR, so no CI test run is meaningful beyond
asciidoc rendering.
mswilkison added a commit that referenced this pull request May 22, 2026
…ild tag (#3965)

## Summary

Forward-fix for #3866 CI: the Phase 1B binding file and test
referenced message types defined in \`//go:build frost_native\`
files but were themselves untagged. Untagged staticcheck on
the integration branch (#3866) then reported
\`undefined: nativeFROSTRoundOneCommitmentMessage\` and the
client-lint job failed.

Adds \`//go:build frost_native\` to:

- \`pkg/frost/signing/attempt_context_binding.go\`
- \`pkg/frost/signing/attempt_context_binding_test.go\`

The helpers and tests are only exercised by gated code paths
(the three message-type methods all live behind \`frost_native\`),
so the build tag is the right locus.

## Why now

PRs #3963 (Phase 1A) and #3964 (Phase 1B) were merged into the
\`feat/frost-schnorr-migration-scaffold\` branch before #3866's
integration CI ran. Once the merges landed, #3866's
\`client-lint\` job rebuilt under the untagged staticcheck pass
and exposed the missing tag. This PR is the smallest possible
fix.

## Verification

Locally with module-pinned staticcheck 2025.1.1:

\`\`\`
go build ./...
go build -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...
go test  -tags 'frost_native frost_tbtc_signer' ./pkg/frost/signing/
staticcheck -checks \"-SA1019\" ./... # whole repo, silent
staticcheck -checks \"-SA1019\" ./pkg/frost/signing  # silent
\`\`\`

## Test plan

- [ ] CI green: client-lint, client-vet, client-scan,
  client-build-test-publish all pass.
- [ ] #3866 lint job recovers once this merges into
  \`feat/frost-schnorr-migration-scaffold\`.
mswilkison added a commit that referenced this pull request May 23, 2026
…3988)

## Summary

Closes the **M4 gap** from the original PR #3866 review by adding
the two evidence categories the RFC-21 Phase-2 work left as future
work: **validation-rejection evidence** and **first-write-wins-conflict
evidence**.

With this PR, the \`NextAttempt\` policy can permanently exclude
misbehaving peers on all four ROAST blame channels --
transport-overflow, validation-reject, equivocation-conflict, and
silence -- instead of just overflow + silence.

## Why this matters

A peer that only sends **malformed messages** (validation rejects,
never overflows the channel) was previously indistinguishable from
a silent peer. The transient silence-parking policy would
bench-and-reinstate them indefinitely, never permanently excluding
the malicious behaviour. Same for a peer **equivocating mid-attempt**:
the existing first-write-wins assembly correctly dropped the
conflicting retransmission but only logged the event -- the bundle
carried no structured evidence the coordinator's policy could act
on.

## What lands

### Recorder API

| Surface | Notes |
|---|---|
| \`RecordReject(sender, reason)\` | reason captured verbatim;
per-reason quota counter |
| \`RecordConflict(sender)\` | saturates at conflict quota |
| \`RejectQuotaDefault = 8\`, \`ConflictQuotaDefault = 4\` | matches
RFC-21 Layer A categoryQuota |
| Per-reason quotas independent | peer cannot saturate one reason to
mask another |

### Wire types

| Type | Sort order | Cap |
|---|---|---|
| \`RejectEntry{Sender, Reason, Count}\` | asc by Sender, then asc by
Reason | per-attempt evidence size bounded by Σ quotas |
| \`ConflictEntry{Sender, Count}\` | asc by Sender | per-attempt
evidence size bounded by Σ quotas |

Both fields use \`omitempty\` so pre-PR snapshots round-trip without
the new fields. \`Validate()\` enforces sorted-ascending invariants.

### NextAttempt policy

| Threshold | Value | Source |
|---|---|---|
| \`RejectExclusionThreshold\` | 1 | RFC-21 Layer B ("any non-transport
reject is sufficient cause") |
| \`ConflictExclusionThreshold\` | 1 | A single conflict is byzantine
evidence |

\`computeNextAttempt\` merges \`overflowBlamed\`, \`rejectBlamed\`,
\`conflictBlamed\` into the permanent ExcludedSet. The
\`blamedSenders\` helper is factored out so all three categories
share the deterministic sort + threshold-comparison logic.

### Receive-loop wiring

Three reject sites and three conflict sites updated across the two
files that house the three FROST/tbtc-signer receive loops:

| Site | Was | Now |
|---|---|---|
| \`shouldAcceptNativeFROSTMessage\` returns false | silent drop |
\`evidence.RecordReject(senderID, "validation_gate_rejected")\` + drop |
| First-write-wins conflict in assembly loop | warn log only |
\`evidence.RecordConflict(senderID)\` + warn log |

## Test coverage (15 new cases)

- 7 recorder tests: accumulation, per-reason quota saturation,
per-reason independence, conflict saturation, all-categories-present,
NoOp-inert, RFC-constant assertions
- 5 policy tests: single reject excludes, single conflict excludes,
reject+conflict on different senders, empty evidence (sanity),
threshold-constant assertions
- Receive-loop wiring is covered indirectly by the recorder unit tests;
the NoOp default keeps pre-RFC-21 receive semantics observably unchanged
so no integration-level test is required.

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` + \`go build -tags 'frost_native frost_tbtc_signer
frost_roast_retry' ./...\` | both clean |
| \`go test ./pkg/frost/...\` + race | pass |
| \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry'
./pkg/frost/...\` | pass (5 packages) |
| \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent |
| \`go vet ./pkg/frost/...\` + \`gofmt -l ./pkg/frost/\` | clean |

## RFC-21 status

With this PR, all four ROAST evidence categories are operational.
M4 from the original PR #3866 review is **fully closed**. The
keep-core code arc for RFC-21 is now feature-complete; remaining
work is operations-side (integration testnet, manifest flip).

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the per-reason quota independence is the right
semantics (alternative: single per-sender reject counter).
- [ ] Reviewer confirms threshold = 1 for both reject and conflict
(alternative: higher to absorb noise; trade-off is faster vs slower
exclusion of misbehaving peers).
mswilkison added a commit that referenced this pull request May 24, 2026
#3993)

## Why

The RFC-21 Phase 6 review decided which orchestration errors are
fallback-eligible (static config errors → safe to fall back to legacy
retry path) and which must hard-fail (runtime per-attempt errors → no
fallback, since per-participant divergence creates split-brain group
fracture). The rationale lived in commit messages, the RFC text, and
inline comments on individual sentinels — distributed enough that a
future maintainer reading just \`roast_retry_orchestration.go\` could
miss the load-bearing constraint.

This PR adds a top-of-file design-rationale block that centralises the
decision in the place that enforces it.

## What changed

- One file changed: \`pkg/frost/signing/roast_retry_orchestration.go\`
- Pure documentation: no behavior change, no test changes, no API change
- 49 lines added (one comment block)

## What it captures

1. **STATIC vs RUNTIME classification** — explicit definitions, with the
sentinel (\`ErrNoRoastRetryCoordinatorRegistered\`) and detection
mechanism (\`errors.Is\` in \`signing_loop_roast_dispatcher.go\`) named.
2. **Why static-error fallback is safe** — every honest signer observes
the same node-local config at startup, so the fallback decision is
deterministic across the group.
3. **Why runtime-error fallback is unsafe** — per-attempt protocol state
errors can be observed by some participants and not others within the
same attempt; fallback would put some operators on new code and others
on legacy for the same attempt.
4. **Enforcement rule** — any error surfaced from this package that is
intended to permit fallback MUST be the sentinel; wrapping ANY runtime
error in the sentinel is a safety regression that PR reviewers should
reject.
5. **Historical redirect** — the earlier design had \`BeginAttempt\`
failures fall back, on the assumption that BeginAttempt was cheap
idempotent setup. Review identified that BeginAttempt mutates
per-attempt state and can fail from races with concurrent receives; the
taxonomy was tightened so only true configuration errors are
fallback-eligible.

## Lineage

Surfaced in the cross-PR review re-evaluation following PR #3866
follow-up landings. Originally tracked as "Document static-vs-runtime
classification canonically" — initially flagged as "available if you
want," now elevated because the rationale was the most important
architectural decision in the RFC-21 stack and is currently the easiest
piece of design context to lose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mswilkison added a commit that referenced this pull request May 25, 2026
## Summary
- add FROST WalletRegistry and FrostDkgValidator bindings plus config
and chain attachment
- implement v4 FROST DKG result digest assembly with full vs active
member types and fixture-backed parity tests
- add the native FROST DKG engine boundary, P2P round protocol, result
signing, coordinator lifecycle, challenge monitoring, and wallet ID
handling for x-only output keys

## Notes
- Stacked on #3866 / `feat/frost-schnorr-migration-scaffold`.
- Runtime DKG still requires the concrete native DKG engine registration
from the frost-uniffi-sdk UDL/Rust export work.
- The digest fixture now records the tBTC TypeScript generator source
and regeneration command. A paired tBTC PR should still commit the
mirror fixture at `docs/test-vectors/frost-dkg-result-digest-v1.json`
and add the TS-side emitter/test; until then, the keep-core test
verifies the pinned bytes and metadata but does not compare against a
checked-in tBTC mirror file.

## Validation
- `go test ./pkg/frost/registry ./pkg/chain/ethereum
./pkg/chain/ethereum/frost/gen/...`
- `go test ./pkg/tbtc -run
"TestFrostDKGSignatureThreshold|TestBoundedFrostDKGRecoveryStartBlock|TestFrostDKGRecoveryLookBackBlocks"
-count=1`
- `go test -tags "frost_native frost_tbtc_signer" ./pkg/tbtc -run
"TestLowestLocalActiveMemberIndex|TestFrostMisbehavedMemberIndices|TestFrostDKGSignatureThreshold|TestBoundedFrostDKGRecoveryStartBlock|TestFrostDKGRecoveryLookBackBlocks"
-count=1`
- CI `client-build-test-publish` passes on the prior pushed commit;
rerunning for the latest follow-up commit after push.

## Local Note
- Full local `go test ./pkg/tbtc` currently fails in standalone
`TestWatchCoordinationWindows`; this reproduces when run by itself and
appears unrelated to the FROST DKG coordinator changes.
@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fdcec10e-dc7e-4046-9c05-1e3e3081f380

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/frost-schnorr-migration-scaffold

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

mswilkison added a commit that referenced this pull request Jun 2, 2026
## Summary

Stacked on #3866.

This PR implements Taproot-native key-path wallet signing for the FROST
migration path. It adds P2TR script handling, BIP341 SIGHASH_DEFAULT
computation, BIP340 Schnorr signature verification, and single-element
Taproot witness application in the Bitcoin transaction builder.

The wallet transaction executor now routes all-P2TR transactions through
the Schnorr/Taproot witness path. Mixed Taproot plus legacy inputs are
rejected before signing, so this does not introduce a dual-signing
model.

## Details

- Add P2TR script helpers and x-only output key extraction.
- Add Taproot key-path sighash generation without a repo-wide btcd
upgrade.
- Add `AddTaprootKeyPathSignatures` for 64-byte BIP340 signatures.
- Preserve canonical 32-byte FROST signing messages when `big.Int`
strips leading zero bytes.
- Add builder and wallet tests covering all-P2TR signing and mixed-input
rejection.

## Validation

- `go test ./pkg/bitcoin ./pkg/tbtc`
- `go test -tags=frost_native ./pkg/frost/signing`
## Summary

- route tBTC operator recognition, authorization, and sortition-pool
checks through the FROST wallet registry when FROST authorization is
configured
- use the FROST sortition pool for pool/reward/chaosnet status in FROST
mode
- update generated FROST bindings and DKG result conversion types for
the current contract ABI

## Why

The local FROST testnet run showed that allowlisted operators could be
provisioned on-chain, but keep-core still needed to use the FROST
authorization source and sortition pool instead of the legacy
ECDSA/staking path for operator lifecycle checks.

## Validation

- `go test ./pkg/chain/ethereum ./pkg/tbtc`
## Summary

- register the build-tagged tbtc-signer bridge as the native FROST DKG
engine and the full native FROST signing engine
- add Go request/response adapters for the new interactive Rust ABI from
#4011
- keep old/missing signer builds fail-closed through
`ErrNativeCryptographyUnavailable`
- disable legacy ECDSA DKG execution when the configured pre-parameters
pool size is zero, so FROST-only local/testnet runs do not require an
unused ECDSA preparams pipeline
- reject Schnorr/FROST signing executors for non-Taproot transaction
inputs before threshold signing begins
- add registration coverage plus an optional linked-dylib smoke test
that runs DKG, signing, aggregation, and BIP340 verification when the
Rust symbols are present

## Stack

- base: #4010
- requires Rust signer ABI: #4011

## Testnet rehearsal note

The local FROST testnet reached threshold signing for a real revealed
Bitcoin testnet deposit and produced a native aggregate signature. The
action then failed while applying the signature because the deposit
sweep input was a legacy P2WSH deposit output requiring ECDSA semantics,
not a Taproot key-path wallet input.

This PR now fails closed earlier for that class of mismatch: a
Schnorr/FROST signing executor will not sign non-P2TR transaction
inputs. This does not make legacy deposits sweepable by FROST; it
prevents wasting a threshold signing round and producing a misleading
invalid-signature failure.

## Validation

- `go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/signing`
-
`CGO_LDFLAGS='-L/private/tmp/keep-core-pr4005-worktree/pkg/tbtc/signer/target/debug
-lfrost_tbtc'
DYLD_LIBRARY_PATH=/private/tmp/keep-core-pr4005-worktree/pkg/tbtc/signer/target/debug
go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/signing -run
TestBuildTaggedTBTCSignerInteractiveFROSTBridge_WithLinkedSigner
-count=1 -v`
- `KEEP_CORE_FROST_TBTC_SIGNER_ACCEPT_SCAFFOLD_KEY_GROUP=true go test
-tags 'frost_native frost_tbtc_signer' ./pkg/tbtc`
- `go test ./pkg/tbtc -run
'TestWalletTransactionExecutor_SignTransaction_RejectsSchnorrForLegacyInputsBeforeSigning|TestWalletTransactionExecutor_SignTransaction_RejectsMixedTaprootAndLegacyInputsBeforeSigning|TestWalletTransactionExecutor_SignTransaction_AddsTaprootKeyPathWitness|TestDkgExecutor_DisablesECDSAPreParamsWhenPoolSizeZero'`
- `go test -tags frost_native ./pkg/tbtc -run
'TestWalletTransactionExecutor_SignTransaction_RejectsSchnorrForLegacyInputsBeforeSigning|TestWalletTransactionExecutor_SignTransaction_RejectsMixedTaprootAndLegacyInputsBeforeSigning|TestWalletTransactionExecutor_SignTransaction_AddsTaprootKeyPathWitness|TestDkgExecutor_DisablesECDSAPreParamsWhenPoolSizeZero|TestCalculateWalletIDForSigner_FrostUniFFIV2UsesXOnlyOutputKey'`

Note: without
`KEEP_CORE_FROST_TBTC_SIGNER_ACCEPT_SCAFFOLD_KEY_GROUP=true`, the broad
`pkg/tbtc` run fails on existing scaffold-era legacy private-key-share
fixtures, not on this bridge registration change.
mswilkison and others added 3 commits June 26, 2026 12:19
Three defensive fixes at the Go<->Rust signer FFI boundary in
pkg/frost/signing, so a malformed/oversized response or a panic from the native
signer fails one attempt instead of crashing the node or leaking secrets:

- parseBuildTaggedTBTCSignerResult: bounds-check the response buffer length
  before the size_t -> C.int narrowing in C.GoBytes. A length >= 2^31 would
  overflow to a negative value and panic ("length out of range") at the cgo
  boundary, or silently truncate to a wrong length; reject it (the buffer is
  still released by the deferred free).

- the request-call helper: scrub the secret request bytes from the C heap (the
  C.CBytes copy) before C.free. The request can carry signing-share / nonce
  material, and plain C.free does not overwrite; this mirrors the existing
  Go-side zeroBytes hygiene applied to the caller's own copy.

- nativeExecutionFFIExecutorAdapter.Execute: recover panics raised along the
  cgo signing path and surface them as a failed attempt, so a single malformed
  native-signer response cannot take down the signing process. Adds a unit test
  with a panicking primitive (verified to crash the process without the
  recover).

The cgo paths are compile-verified under -tags 'frost_native frost_tbtc_signer
cgo'; the recover is exercised by an untagged unit test. The two cgo-tagged
changes still need a runtime pass in the cgo + linked-libfrost_tbtc
environment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…usions only

NextAttempt's infeasibility check used the post-parking IncludedSet (which
subtracts transiently-parked and silenced members) to decide PERMANENT session
failure (ErrAttemptInfeasible). Silence-parking has no accuser-quorum gate, so a
single transient mass-silence event -- or one byzantine member that is the
elected coordinator for one attempt and omits snapshots -- could drop the
IncludedSet below threshold and permanently kill a signing session, even though
the original signer set could still complete it. This contradicts the file's own
step-4 contract ("silence parking is strictly transient; a falsely-silenced
honest peer recovers without intervention") and ErrAttemptInfeasible's own doc
("the session can no longer make progress with the original signer set"), and it
routes a permanent failure around the very accuser-quorum defense that exists to
stop a byzantine minority from grinding the group to ErrAttemptInfeasible.

Evaluate feasibility against the permanently-available set (original \\
ExcludedSet): only permanent exclusions (established reject/conflict/
equivocation) can render a session infeasible. The next attempt's IncludedSet is
still feasible \\ parkSet (it may fall below threshold for one attempt; the
parked members are reinstated next attempt). When parking would empty the
IncludedSet, reinstate the parked members rather than producing an
unconstructable empty attempt. Pathological grinding under sustained malicious
silence stays bounded by the outer tBTC signingAttemptsLimit.

The two existing tests asserted the buggy behavior (silence below threshold ->
permanent fail); retarget them to genuine permanent-exclusion infeasibility and
add transient-silence-recovers coverage (single-step, full harness, and a
two-attempt park-then-reinstate cycle).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…proot

signTransaction unconditionally ran the native (Rust cgo) BuildTaprootTx
parity/substitution path at the top of the function -- for EVERY transaction,
including legacy ECDSA redemptions/sweeps/moving-funds. The native builder is a
Taproot builder, so running it for a legacy (non-Taproot) transaction is
meaningless, and a hard error from it (other than ErrNativeCryptographyUnavailable)
would fail the signing of that legacy transaction. In the default build the
native builder is a no-op, so this only bit the frost_native+cgo build, but it is
a latent regression on the legacy signing path.

Gate the native-build/substitution path on unsignedTx.HasOnlyTaprootKeyPathInputs()
-- the same predicate that already governs FROST signing later in the function --
so the native Taproot builder runs only for all-Taproot transactions. Extract the
path into maybeSubstituteNativeBuildTaprootTx.

The substitution LOGIC (observational logging, divergence rejection, matching-IO
acceptance) remains covered directly by the
TestEvaluateNativeUnsignedTransactionForSigning_* tests. Replace the four
legacy-P2PKH substitution-through-signTransaction integration tests (which
asserted the now-removed behaviour) with two gate tests --
SkipsNativeBuildForLegacyTransaction and InvokesNativeBuildForTaprootTransaction --
and retarget the two native-build error-propagation tests to a Taproot
transaction (the only path on which the native build now runs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…package

pkg/frost/retry/retry.go was a byte-for-byte copy of pkg/tecdsa/retry/retry.go --
two hand-synchronized copies of the security-critical retry participant-selection
algorithm (EvaluateRetryParticipantsForSigning / ...ForKeyGeneration). A fix
applied to one and not the other would silently make ECDSA and FROST select
different qualified-operator sets for the same seed.

This selection is structurally shared across schemes: it is the scheme-agnostic
"pick the base signing group from the ready members" used for the initial attempt
of every signing (and for DKG). FROST's per-attempt robustness diverges only on
retries, via ROAST (NextAttempt); the initial selection does not -- ROAST is a
transition-from-previous function and cannot produce attempt 0, so the initial
selection is permanently shared.

Move the algorithm (byte-for-byte, verified identical) into a neutral package,
pkg/protocol/retry, and point both callers (pkg/tbtc/dkg_loop.go and
signing_loop_legacy_selector.go -- the only two) at it. Delete pkg/tecdsa/retry
and pkg/frost/retry. Keep the superset test (the tecdsa copy had an extra
triplet-seat-count case the frost copy lacked). No behavior change; a single
source eliminates the drift hazard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d-set acceptance

When a seat enters lost-sync from a transition bundle for an attempt it never
observed, the triggering bundle is operator-authenticated -- log the sending seat
and the claimed attempt-context hash once per lost-sync episode, so the
operational runbook can attribute and remove/slash a member spamming
bogus-attempt bundles. (This is an authenticated-insider liveness halt the blame
bridge does NOT close, because a never-observed attempt yields no evidence to
attribute it.)

markLostSync now uses CompareAndSwap and returns whether it transitioned, so the
attribution is logged exactly once even while the listener keeps receiving such
bundles. No change to the fail-closed semantics.

Also document that this residual is accepted under the PERMISSIONED operator set
(attributable + liveness-only + governance-removable) and MUST be revisited
before any move to a permissionless set, where the f+1 snapshot-corroboration /
resync fix would be warranted instead of accepting it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
During the ECDSA->FROST migration an operator is a member of BOTH sortition
pools at once: existing ECDSA wallets keep draining via redemptions while FROST
is live. The seven sortition.Chain methods on TbtcChain switch on
hasFrostAuthorization() (true whenever FROST is configured), and a single
MonitorPool loop consumed them -- so the loop labeled "legacy ECDSA sortition
pool monitoring" actually maintained the FROST pool during overlap, and
post-cutover (DisableLegacyECDSA=true) the only loop stopped, leaving the FROST
pool -- the one new FROST wallet DKG selects from -- unmonitored.

Bind monitoring explicitly per pool. Two sortition.Chain views
(ecdsaSortitionChain, frostSortitionChain) route directly to their own
registry/pool with no hasFrostAuthorization() switch, and the node runs one
MonitorPool loop per pool:
- ECDSA loop: existing flags + policy, now correctly ECDSA-bound.
- FROST loop: new DisableFrostSortitionPoolMonitoring flag (default-on), beta
  policy only (the ECDSA pre-params gate does not apply to FROST DKG), gated on
  FROST being configured and INDEPENDENT of DisableLegacyECDSA so the FROST pool
  stays monitored during the drain and after the legacy pool is retired.

The operator is not necessarily registered in both pools, so the FROST loop
treats sortition.ErrOperatorUnknown (now exported) as non-fatal: it warns and
leaves FROST monitoring inactive rather than aborting node startup. The legacy
loop keeps its existing fail-fast (the operator is ECDSA-registered during the
drain). TbtcChain's own sortition.Chain methods are left unchanged, so heartbeat
and other callers are unaffected; GetOperatorID stays ECDSA-bound. Both loops'
join/update txs already share tc.transactionMutex + tc.nonceManager, so they
cannot race on the operator account nonce.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mswilkison and others added 16 commits June 27, 2026 10:44
…uted DKG (#4110)

Stacks on the FROST/ROAST readiness branch (#3866). Adds two
separate-process ("shape B") real-crypto e2es over **real libp2p**,
complementing the in-process shape-A multinode test. Each re-execs the
test binary as N worker processes that each link `libfrost_tbtc`.

## What's added

- **shape-B** (`roast_shapeb_libp2p_multiproc_e2e_…`): dealer DKG run
once in a bootstrap subprocess, the encrypted key group copied into each
worker's own state dir; every worker drives the **ROAST
interactive-signing runner** over real libp2p and independently
aggregates the same BIP-340 signature (n winners, vs the shared-engine
shape-A's one). Covers per-node engine/state isolation + the libp2p
outer framing for the runner + ROAST + transport seam.
- **distributed-DKG** (`roast_distributed_dkg_libp2p_multiproc_e2e_…`):
every worker runs the **real distributed FROST DKG** (`part1/2/3`) over
libp2p, with round-2 per-recipient secret shares **sealed via secp256k1
ECDH + AES-256-GCM** (cleartext round-2 over a broadcast bus would let
any node sum `f_i(j)` and reconstruct a peer's share), so each node
holds **only its own key package**; then threshold-signs via the
stateless low-level path. Closes the dealer-DKG key-custody gap end to
end.

## Verification

Both green standalone (stable across repeated runs) and under the full
cgo gate, linking `libfrost_tbtc` built from the pinned signer ref
(`ci/frost-signer-pin.env` = `6e3718ba0`, unchanged):

```
CGO_ENABLED=1 KEEP_CORE_FROST_REQUIRE_CGO=true \
  go test -tags "frost_native frost_tbtc_signer" -run TestRealCgoInteractiveSigning ./pkg/frost/signing/
```

All 6 real-crypto tests pass together on this branch (4 shape-A +
shape-B + distributed-DKG).

## CI

The `frost-cgo-integration` gate already exercises both new tests — its
`-run 'TestRealCgoInteractiveSigning'` matches them by prefix, and the
pinned crate already exports the `dkg_part1/2/3` + low-level sign FFI
they use. No workflow change needed. (Heads-up: these are multi-process
+ gossipsub tests; robust locally via retransmission + warmup, but
mesh-convergence time varies — if shape-B ever flakes on a slow runner,
move it to a non-blocking job.)

## Note

Building the distributed-DKG test confirmed `dkg_part1/2/3` interoperate
over the transport — but the node's DKG path
(`executeTBTCSignerFROSTDKG`) still uses the dealer `RunDKGWithSeed`.
Wiring distributed DKG into the node remains the readiness-gate-blocking
work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…sient parking

After the transient-silence feasibility fix, NextAttempt's infeasibility test is
on the non-excluded (feasible) set, not the post-parking IncludedSet: a next
IncludedSet can fall below threshold due to transient parking without returning
ErrAttemptInfeasible (parked members reinstate next attempt). Update the
ErrAttemptInfeasible doc + the NextAttempt step-6 contract (and the now-misleading
'included set below threshold' error string) to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lags

#4120 added Config.Tbtc.DisableFrostSortitionPoolMonitoring (default-on FROST
monitoring) but initTbtcFlags did not bind a CLI flag for it, unlike the legacy
opt-out -- so operators starting the node via CLI flags could not exercise the
advertised opt-out. Bind --tbtc.disableFrostSortitionPoolMonitoring and cover it
in the flags test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ror type

GetWallet derived a legacy wallet ID on ANY error from the canonical walletID
accessor. For a FROST wallet on a canonical Bridge, a transient call failure
would silently yield the left-padded legacy ID, and callers use
WalletChainData.WalletID to choose P2TR (FROST) vs P2WPKH (legacy) scripts, so it
would build or search the wrong wallet script.

The error type cannot reliably tell a legacy on-chain Bridge -- where the walletID
eth_call returns a normal RPC/ABI error even with the current binding -- from a
transient failure, so distinguishing by error is fragile and breaks legacy
deployments (Codex P1 on the first revision of this PR). Route the fallback by
SCHEME instead, using the wallet's ECDSA wallet ID (zero => FROST, non-zero =>
legacy ECDSA), which GetWallet already has:

  - Legacy ECDSA wallet: its canonical wallet ID equals the legacy derivation, so
    fall back on any accessor error (and it is the only option on a legacy Bridge
    lacking the accessor).
  - FROST wallet: requires the canonical ID; surface the error rather than return
    a wrong legacy ID. A FROST wallet only exists on a canonical-ID Bridge, so the
    error is genuinely transient.

Extracted into resolveWalletID with TestResolveWalletID covering all four cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary

Follow-up to #3866 (Go FROST/ROAST coordinator). Three defensive fixes
at the cgo Go↔Rust signer FFI boundary in `pkg/frost/signing`, so a
malformed/oversized response — or a panic — from the native signer fails
a single attempt instead of crashing the node or leaking secrets.

## Fixes

1. **Bounds-check the response length before `C.GoBytes`**
(`parseBuildTaggedTBTCSignerResult`). The Rust-supplied `buffer.len`
(`C.size_t`) was narrowed to `C.int` with no check: a length `≥ 2³¹`
overflows to a negative value → `C.GoBytes` panics (`length out of
range`) at the cgo boundary, or silently truncates to a wrong length.
Now rejected with a clear error (the buffer is still freed by the
deferred free).

2. **Zeroize the secret request bytes on the C heap before `C.free`**
(the request-call helper). `C.CBytes(requestPayload)` copies the request
(which can carry signing-share / nonce material) to the C heap; plain
`C.free` does not overwrite, so the secret lingered in freed memory. Now
scrubbed via the existing `zeroBytes`, mirroring the Go-side hygiene
already applied to the caller's own copy.

3. **`recover()` at the FFI boundary**
(`nativeExecutionFFIExecutorAdapter.Execute`). A panic anywhere along
the cgo signing path (e.g. fix #1's overflow panic, or a nil-deref
decoding a malformed engine response) previously took down the whole
signing process. It's now converted to a failed attempt the outer tBTC
`signingRetryLoop` handles cleanly.

## Tests / verification

-
`TestNativeExecutionFFIExecutorAdapter_Execute_RecoversCgoBoundaryPanic`
— a panicking primitive; verified to **crash the process without the
recover** and pass with it. Full untagged `pkg/frost/signing` suite
stays green.
- The cgo-tagged changes (#1, #2) are **compile-verified** under `-tags
'frost_native frost_tbtc_signer cgo'` (build + `go vet`). They still
need a **runtime pass in the cgo + linked-`libfrost_tbtc` environment**
— I can't exercise the real FFI here.

Addresses the cgo-boundary cluster from the review of #3866
(length-narrowing crash, secret-in-C-heap, missing recover). _Found
during review of #3866._

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…usions only (#4116)

## Summary

Follow-up to #3866 (Go FROST/ROAST coordinator). Fixes a robustness bug
in the ROAST `NextAttempt` policy where a **transient** event could
**permanently** kill a signing session.

`computeNextAttempt`'s infeasibility check used the post-parking
`IncludedSet` — which subtracts transiently-parked *and silenced*
members — to decide permanent session failure (`ErrAttemptInfeasible`).
Silence-parking deliberately has **no** accuser-quorum gate (it's meant
to be strictly transient), so a single transient mass-silence event — or
one byzantine member that is the elected coordinator for one attempt and
omits snapshots — could drop the `IncludedSet` below threshold and
permanently fail the session, even though the original signer set could
still complete it.

This contradicts the file's own design contract and hands a *single*
byzantine member (elected coordinator for one attempt) the power to
permanently kill a session — exactly the grind-to-`ErrAttemptInfeasible`
outcome the accuser-quorum machinery exists to prevent, routed around
via the ungated silence path:

- **step 4:** "Silence parking (strictly transient)… the attempt after
that automatically reinstates them, so a falsely-silenced honest peer
recovers without intervention."
- **`ErrAttemptInfeasible` doc:** returned when "the session can no
longer make progress **with the original signer set**."

## Fix

Evaluate feasibility against the **permanently-available set**
(`original \ ExcludedSet`): only permanent exclusions (established
reject / conflict / coordinator-equivocation) can render a session
infeasible. The next attempt's `IncludedSet` is still `feasible \
parkSet` — it may fall below threshold for one attempt, but the parked
members are reinstated next attempt (burning one attempt instead of
failing the session). When parking would *empty* the `IncludedSet`, the
parked members are reinstated now rather than producing an
unconstructable empty attempt.

Pathological grinding under *sustained* malicious silence stays bounded
by the outer tBTC `signingAttemptsLimit` — the inner chain no longer
needs a permanent-fail to stop grinding.

## Tests

The two existing tests asserted the buggy behavior (silence below
threshold → permanent fail, using a degenerate n-of-n config).
Retargeted to genuine infeasibility and added recovery coverage:

- `TestNextAttempt_InfeasibilityWhenPermanentExclusionsBelowThreshold` —
permanent exclusion below threshold still fails (correctly).
- `TestSoak_TransientSilenceBelowThresholdRecovers` — full
multi-coordinator harness: silence does **not** fail; silenced members
are parked, not excluded.
- `TestNextAttempt_TransientSilenceBelowThresholdDoesNotPermanentlyFail`
— unit: silence parks (transient), never excludes (permanent).
- `TestNextAttempt_TransientSilenceRecoversAcrossTwoAttempts` —
end-to-end: silence → park → reinstate, included set returns to full.

The "original signer set preserved" invariant (`|Inc|+|Exc|+|Park|` =
original size) holds in both branches. `gofmt`, `go vet`, and the full
`pkg/frost/roast` suite pass; builds clean untagged and under `-tags
'frost_native frost_tbtc_signer cgo frost_roast_retry'`.

_Found during review of #3866._

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…proot (#4117)

## Summary

Follow-up to #3866 (Go FROST/ROAST coordinator). Gates the native (Rust
cgo) `BuildTaprootTx` parity/substitution path so it runs only for
Taproot transactions.

`walletTransactionExecutor.signTransaction` ran the native
`BuildTaprootTx` path **unconditionally**, at the very top, for
**every** transaction a wallet signs — including legacy ECDSA
redemptions / sweeps / moving-funds — before any scheme/Taproot check.
The native builder is a Taproot builder, so running it for a legacy
(non-Taproot) transaction is meaningless, and a hard error from it
(other than `ErrNativeCryptographyUnavailable`) would fail the signing
of that legacy transaction. In the default build the native builder is a
no-op (`("", nil)`), so this only bit the `frost_native+cgo` build — but
it's a latent regression on the legacy signing path.

## Fix

Gate the native-build/substitution path on
`unsignedTx.HasOnlyTaprootKeyPathInputs()` — the **same predicate that
already governs FROST signing** later in the same function (`"cannot
apply FROST signatures to non-taproot transaction inputs"`). The native
Taproot builder now runs only for all-Taproot transactions; legacy
transactions skip it. The path is extracted into
`maybeSubstituteNativeBuildTaprootTx`.

> Gate signal chosen per a Codex second opinion: the tx shape
(`HasOnlyTaprootKeyPathInputs`), not the wallet scheme — "the native
builder's applicability is about the unsigned tx it is asked to build."

## Tests

The four legacy-P2PKH substitution-through-`signTransaction` integration
tests asserted the now-removed behaviour (substitution for a legacy tx).
They are replaced with:

- `SkipsNativeBuildForLegacyTransaction` — a legacy (P2PKH) tx: the
native build is **not** invoked even with substitution enabled, and the
tx signs via the Go path. (Verified to fail if the gate is removed.)
- `SubstitutesNativeBuildForTaprootTransaction` — an all-Taproot tx: the
native build **is** invoked, and a matching native tx is substituted
(`ReplaceUnsignedTransaction` + the substitution info log) and signed
with a Taproot witness.

The two native-build error-propagation tests are retargeted to a Taproot
tx (the only path on which the native build now runs), via a shared
`buildTaprootKeyPathUnsignedTxForTest` helper. The substitution
**logic** remains covered directly by the
`TestEvaluateNativeUnsignedTransactionForSigning_*` tests.

gofmt + `go vet` clean; full untagged `pkg/tbtc` suite passes; builds
clean under `-tags 'frost_native frost_tbtc_signer cgo'`.

_Found during review of #3866._

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…package (#4118)

## Summary

Follow-up to #3866. Eliminates a byte-for-byte duplicate of the retry
participant-selection algorithm by moving it into a shared,
scheme-neutral package.

`pkg/frost/retry/retry.go` was a **byte-for-byte copy** of
`pkg/tecdsa/retry/retry.go` — two hand-synchronized copies of the
security-critical retry participant-selection algorithm
(`EvaluateRetryParticipantsForSigning` / `…ForKeyGeneration`). A fix
applied to one and not the other would silently make ECDSA and FROST
select **different** qualified-operator sets for the same seed.

## Why a shared package (vs. one importing the other)

This selection is **structurally shared** across schemes: it's the
scheme-agnostic "pick the base signing group from the ready members"
used for the **initial attempt of every signing** (and for DKG). FROST's
per-attempt robustness diverges only on **retries**, via ROAST
(`NextAttempt`) — the initial selection does not. ROAST is a
transition-from-previous-attempt function and *cannot* produce attempt
0, so `roastSigningParticipantSelector` falls back to this selection for
the initial attempt (`ConsumeRoastTransitionForSelection` returns
`ErrRoastSelectionFallBackToLegacy` at `roastAttemptNumber == 0`). So
the initial selection is **permanently shared** — neither `tecdsa` nor
`frost` should own it.

## Change

- Move the algorithm **byte-for-byte** (verified `diff`-identical, no
logic change) into a new neutral package `pkg/protocol/retry` (package
`retry`, alongside `pkg/protocol/group`).
- Repoint the **only two** callers — `pkg/tbtc/dkg_loop.go` (DKG) and
`pkg/tbtc/signing_loop_legacy_selector.go` (FROST initial selection) —
at it (import-path change only; usage `retry.X` unchanged).
- Delete `pkg/tecdsa/retry` and `pkg/frost/retry`.
- Keep the **superset** test (the `tecdsa` copy had a
`TestExcludeOperatorTripletsCountsRightOperatorSeats` case the `frost`
copy lacked).
- Update three doc comments that referenced the old path.

## Verification

- `diff` confirms the moved algorithm is byte-identical to the original
(pure move).
- `go build ./...` clean (no dangling references to the deleted
packages); builds clean under `-tags 'frost_native frost_tbtc_signer cgo
frost_roast_retry'`.
- gofmt + `go vet` clean; the moved `pkg/protocol/retry` test passes
(incl. the superset case); the full `pkg/tbtc` suite passes.

No behavior change — a single source eliminates the silent-drift hazard.

_Found during review of #3866._

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…d-set acceptance (#4119)

## Summary

Follow-up to #3866. A small, behavior-preserving hardening + decision
record for the ROAST transition-exchange lost-sync path (review finding
#2).

**Background.** In `onBundle`, a seat that receives a transition bundle
for an attempt it **never observed** trips `markLostSync()` — failing
the wallet's signing retry loop closed — *before* any verification (a
behind-seat lacks the observe handle full `VerifyBundle` needs). So an
authenticated group member can broadcast a structurally-valid bundle
with a bogus, never-committed attempt hash and halt every honest seat's
signing. The blame bridge (PR2b-2) does **not** close this: a
never-committed attempt produces no evidence to attribute the sender.

**Decision (after Codex review + threat analysis).** Under the
**permissioned** operator set this residual is **accepted**: it's
liveness-only (fail-closed — never an unsafe/divergent signature), the
triggering seat is operator-authenticated (so attribution is immediate),
and a misbehaving operator is governance-removable + economically
deterred. The proper fix (f+1 snapshot-corroboration, or a resync state)
is a real ROAST protocol change whose simple form can fracture the group
on legitimate *sparse-failure* bundles — disproportionate while the set
stays permissioned.

## Change

- **Attribution logging.** When a seat enters lost-sync from an
unobserved-attempt bundle, log the **sending seat** and the **claimed
attempt-context hash** — once per lost-sync episode — so the operational
runbook can identify and remove/slash a member spamming bogus-attempt
bundles.
- `markLostSync` now uses `CompareAndSwap` and returns whether it
transitioned, so the attribution is logged exactly once even while the
listener keeps receiving such bundles. **No change to the fail-closed
semantics.**
- **Decision record in-code:** documents the permissioned-set acceptance
and marks it as a **hard item to revisit before any move to a
permissionless operator set**, where an
anonymous/costless/non-attributable DoS would warrant the
corroboration/resync fix.

## Verification

`gofmt` + `go vet` clean; builds under `-tags 'frost_native
frost_roast_retry cgo frost_tbtc_signer'`; the transition-exchange /
lost-sync / bundle tests pass. The only caller of `markLostSync` is the
updated site.

_Found during review of #3866._

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…4120)

## Summary

Follow-up to #3866 (review finding #3). Fixes sortition-pool monitoring
for the dual-pool window of the ECDSA→FROST migration.

During the migration an operator is a member of **both** sortition pools
at once: existing ECDSA wallets keep draining via redemptions while
FROST is live. The seven `sortition.Chain` methods on `TbtcChain` switch
on `hasFrostAuthorization()` (true whenever FROST is configured), and a
**single** `MonitorPool` loop consumed them — so:

- **During overlap** (`DisableLegacyECDSA=false`): the loop labeled
*"legacy ECDSA sortition pool monitoring"* actually maintained the
**FROST** pool; the ECDSA pool got no maintenance.
- **Post-cutover** (`DisableLegacyECDSA=true`): the only loop stopped
entirely, leaving the **FROST** pool — the one new FROST wallet DKG
selects from — **unmonitored**.

> The ECDSA drain itself is unaffected either way: signing an existing
wallet uses the locally-held key share + the wallet's fixed roster,
never sortition-pool state. This is a
pool-membership/selection-eligibility fix, not a fund-availability one.

## Change

Bind monitoring **explicitly per pool**. Two `sortition.Chain` views
(`ecdsaSortitionChain`, `frostSortitionChain`) route directly to their
own registry/pool with **no** `hasFrostAuthorization()` switch, and the
node runs **one `MonitorPool` loop per pool**:

- **ECDSA loop** — existing flags + policy, now correctly ECDSA-bound
(data path *and* beta policy read the ECDSA pool).
- **FROST loop** — new `DisableFrostSortitionPoolMonitoring` flag
(**default-on**), beta policy only (the ECDSA pre-params gate doesn't
apply to FROST DKG), gated on FROST being configured and **independent
of `DisableLegacyECDSA`** so the FROST pool stays monitored during the
drain *and* after the legacy pool is retired.

The operator isn't necessarily registered in both pools, so the FROST
loop treats `sortition.ErrOperatorUnknown` (now exported) as
**non-fatal** — it warns and leaves FROST monitoring inactive rather
than aborting startup. The legacy loop keeps its existing fail-fast (the
operator is ECDSA-registered during the drain).

`TbtcChain`'s own `sortition.Chain` methods are **left unchanged**, so
heartbeat and other callers are unaffected; `GetOperatorID` stays
ECDSA-bound. Both loops' join/update txs already share
`tc.transactionMutex` + `tc.nonceManager`, so they cannot race on the
operator account nonce.

### Design input baked in (per maintainer + Codex)
- FROST loop join policy = beta-operator only (no FROST equivalent of
the ECDSA pre-params gate; FROST DKG readiness is announced separately).
- Operator compensation is out-of-band, so ECDSA-pool staleness during
the drain is benign; the FROST loop is the load-bearing one.
- `GetOperatorID` asymmetry preserved (separate `GetFrostOperatorID`
exists).

## Known limitation
`MonitorPool` (shared with the beacon) hard-returns at startup without
starting its ticker, so an operator that registers for FROST *after*
node start needs a restart to begin FROST monitoring. Logged clearly;
deliberately not changing the shared `MonitorPool` here.

## Tests / verification
- New `tbtc_sortition_chain_views_test.go` pins each view to its
intended pool (legacy→ECDSA, FROST→FROST, unconfigured→nil) —
**negative-checked** (mis-binding the legacy view to the FROST pool
makes it fail).
- gofmt + `go vet` clean; untagged `go build ./...` clean; builds under
`-tags 'frost_native frost_tbtc_signer cgo frost_roast_retry'`;
`pkg/sortition` + `pkg/chain/ethereum` tests pass; full `pkg/tbtc` suite
passes.
- Adversarial review (3 angles → verify): 0 confirmed findings; adapter
fidelity confirmed line-for-line across all 13 methods; cross-loop nonce
safety confirmed.

_Found during review of #3866._

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…ror type (#4122)

## Summary

Addresses a Codex finding (relayed via the #4119 review):
`TbtcChain.GetWallet` derived a **legacy** wallet ID on **any** error
from the canonical `walletID` accessor. For a FROST wallet on a
canonical Bridge, a transient call failure would silently yield the
left-padded legacy ID — and callers use `WalletChainData.WalletID` to
choose **P2TR (FROST)** vs **P2WPKH (legacy)** scripts, so the node
would build or search the **wrong wallet script**.

## Why route by scheme (revised after Codex P1)

The first revision distinguished by error type (a sentinel for the
missing accessor, surface everything else). Codex correctly flagged a
**P1 regression**: a *legacy on-chain Bridge* built with the *current*
generated binding still satisfies the accessor interface, so its missing
`walletID` function returns a normal RPC/ABI error — not the sentinel —
and that revision would surface it and **break `GetWallet` on exactly
the legacy deployments the fallback exists for**. Error type cannot
reliably separate "function absent on-chain" from "transient."

So this routes by **scheme**, using the wallet's `EcdsaWalletID` (which
`GetWallet` already reads, and which the codebase already uses to infer
scheme — zero ⇒ FROST):

- **Legacy ECDSA wallet** (`EcdsaWalletID != 0`): its canonical wallet
ID *equals* its legacy derivation, so fall back on **any** accessor
error — and it's the only option on a legacy Bridge lacking the
accessor.
- **FROST wallet** (`EcdsaWalletID == 0`): requires the canonical ID;
**surface** the error rather than return a wrong legacy ID. A FROST
wallet only exists on a canonical-ID Bridge, so such an error is
genuinely transient.

Logic is extracted into `resolveWalletID(bridge, walletPublicKeyHash,
ecdsaWalletID)`.

## Tests

`TestResolveWalletID` covers all four cases: accessor success →
canonical; FROST + accessor error → surfaced; **legacy + accessor error
→ legacy fallback** (the P1 regression guard — verified to fail if the
routing surfaces errors for legacy wallets); legacy + missing-accessor
binding → legacy fallback. gofmt + `go vet` clean; full
`pkg/chain/ethereum` suite passes.

_Found during the Codex review batch on #4115#4120; revised per the
Codex P1 re-review._

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant