Speed up DIP3 and Spark reindex validation safely#1866
Conversation
Keep non-wallet VerifyDB paths from honoring fVerifySparkSpendProof=false for Spark spend proofs, while preserving wallet DB load behavior by marking that call as wallet validation. Extend the Spark tampered-proof regression test to cover both paths.
Do not mark non-stateful CheckBlock results as fully cached for blocks containing Spark spends. This keeps default proof verification and VerifyDB behavior intact while preventing soft-passed Spark proof-input cases from poisoning block.fChecked.\n\nAdd Spark regressions for VerifyDB cache bypass, wallet-load cheap validation with proof skipping, and BatchProofContainer collection around tampered spends.
|
@coderabbitai review |
WalkthroughThis PR refactors Spark spend proof verification to support deferred, batched, and context-scoped validation via a new ChangesSpark Proof Pipeline and DIP3 Cache Fixes
Sequence Diagram(s)sequenceDiagram
participant AcceptBlock
participant CheckBlock
participant CheckTransaction
participant ConnectBlock
participant SparkBatchCleanup
participant BatchProofContainer
participant ConnectBlockSpark
participant VerifyDB
AcceptBlock->>AcceptBlock: CanDeferSparkSpendProofVerificationOnImport(...)
AcceptBlock->>CheckBlock: CheckBlock(..., fDeferSparkSpendProofVerification)
CheckBlock->>CheckTransaction: per-tx, fVerifySparkSpendProof, sparkSpendProofResult
CheckTransaction-->>CheckBlock: SparkSpendProofVerificationResult (Verified/Deferred/Batched)
CheckBlock-->>AcceptBlock: fChecked only if all proofs complete
AcceptBlock->>ConnectBlock: ConnectBlock(..., fVerifyDB, sparkValidationContext)
ConnectBlock->>SparkBatchCleanup: construct (RAII cleanup guard)
ConnectBlock->>CheckTransaction: fVerifySparkSpendProof=true, sparkValidationContext
CheckTransaction->>BatchProofContainer: collect batched proof
ConnectBlock->>BatchProofContainer: finalize() + verify()
alt batch verify fails
BatchProofContainer-->>ConnectBlock: exception
ConnectBlock->>ConnectBlock: DoS bad-txns-sparkproof
else batch verify ok
ConnectBlock->>SparkBatchCleanup: dismiss()
ConnectBlock->>ConnectBlockSpark: sparkValidationContext
ConnectBlockSpark->>ConnectBlockSpark: route state through CSparkValidationContext
end
VerifyDB->>VerifyDB: ScopedCacheRestorer (scopes MN cache mutations)
VerifyDB->>VerifyDB: build Spark validation context
VerifyDB->>ConnectBlock: level-4 reconnect with sparkValidationContext
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/libspark/test/grootle_test.cpp (1)
65-71:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix undefined writes in legacy root construction paths.
Line 69 and Line 143 call
reserve()and then write viaroot.data()without sizing the vector first. That is undefined behavior and can make tests flaky.🛠️ Proposed fix
- Scalar temp; - temp.randomize(); - std::vector<unsigned char> root; - root.reserve(SCALAR_ENCODING); - temp.serialize(root.data()); - roots.emplace_back(root); + roots.emplace_back(random_root());Apply the same replacement in both loops.
Also applies to: 138-144
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/libspark/test/grootle_test.cpp` around lines 65 - 71, The code in the legacy root construction paths calls reserve() on the root vector and then writes to root.data() without properly sizing the vector, which causes undefined behavior. Replace the reserve() call with resize() passing SCALAR_ENCODING as the argument to properly set the vector size before writing data via root.data(). This fix needs to be applied in both locations mentioned: the first occurrence around lines 65-71 where temp.serialize(root.data()) is called, and the second occurrence around lines 138-144 where the same pattern occurs.
🧹 Nitpick comments (2)
src/spark/state.cpp (1)
1000-1009: 💤 Low valueCache eviction clears entire cache on limit.
The current strategy clears the entire proof cache when it reaches 10,000 entries. During sustained reindex of Spark-heavy blocks, this could cause temporary performance degradation from repeated cache thrashing.
Consider an LRU eviction policy or removing the oldest entries instead of clearing the entire cache, though this is acceptable for the current optimization goals.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/spark/state.cpp` around lines 1000 - 1009, In the cache management block where gCheckedSparkSpendTransactions reaches MAX_CACHED_SPARK_SPEND_PROOFS, replace the current clear() operation with an LRU (Least Recently Used) eviction strategy that removes only the oldest entry instead of clearing the entire cache. This prevents the cache thrashing issue during sustained reindex operations by maintaining a partial cache rather than losing all entries at once. Implement this by removing just one or a few of the least recently used entries from gCheckedSparkSpendTransactions instead of calling clear() on the entire collection.src/validation.cpp (1)
2711-2725: ⚡ Quick winApply Linux brace style to changed definitions.
struct SparkBatchCleanupandCheckBlockplace the opening brace on the declaration line. As per coding guidelines, C++ code should use Linux brace style: new line for classes and function definitions.♻️ Proposed style fix
- struct SparkBatchCleanup { + struct SparkBatchCleanup + {-bool CheckBlock(const CBlock& block, CValidationState& state, const Consensus::Params& consensusParams, bool fCheckPOW, bool fCheckMerkleRoot, int nHeight, bool isVerifyDB, bool fDeferSparkSpendProofVerification) { +bool CheckBlock(const CBlock& block, CValidationState& state, const Consensus::Params& consensusParams, bool fCheckPOW, bool fCheckMerkleRoot, int nHeight, bool isVerifyDB, bool fDeferSparkSpendProofVerification) +{Also applies to: 4201-4201
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/validation.cpp` around lines 2711 - 2725, The struct SparkBatchCleanup definition does not follow Linux brace style where the opening brace should be placed on a new line instead of at the end of the struct declaration line. Move the opening brace of struct SparkBatchCleanup to a new line so the struct name and opening brace are on separate lines. Apply the same formatting fix to the CheckBlock struct definition mentioned at line 4201 to maintain consistent code style throughout the file.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/init.cpp`:
- Around line 767-788: The early return statements in the reindex cleanup block
do not signal the genesis wait condition variable before exiting, which can
cause deadlock if the genesis wait path is waiting on condvar_GenesisWait.
Before each of the four early return statements (after ActivateBestChain fails,
after ShutdownRequested check, after VerifyPendingSparkBatch fails, and after
WriteReindexing fails), ensure the genesis wait condition variable is signaled
to wake up any threads waiting on it. This prevents the initialization from
blocking indefinitely when the reindex cleanup path exits early.
In `@src/test/spark_tests.cpp`:
- Around line 834-837: The MobileArgGuard struct's destructor is hard-coding the
`-mobile` argument to "0" instead of restoring its original state, which causes
test pollution. Modify the MobileArgGuard struct to capture the original value
of `-mobile` in the constructor (before setting it to "1"), then restore that
original value in the destructor instead of forcing it to "0". This ensures the
global argument state is properly restored to its pre-test condition rather than
leaving it in a fixed state that affects subsequent tests.
In `@src/validation.cpp`:
- Around line 2846-2851: After the successful verify() call in the
batchProofContainer verification block, call batchProofContainer->clear() before
dismissing the sparkBatchCleanup guard. This ensures that the temporary proof
state variables (fCollectProofs, tempSparkTransactions, and sparkTransactions)
are properly reset after verification completes, preventing state contamination
for subsequent validations or flush batches.
In `@src/validation.h`:
- Around line 55-58: The namespace spark block in validation.h (lines 55-58) is
missing a forward declaration for CSparkValidationContext, which is used as a
pointer type at lines 427 and 559. Add a forward declaration for class
CSparkValidationContext in the existing namespace spark block alongside the
other forward declarations (CSparkTxInfo and SparkSpendProofVerificationResult)
to resolve the compile failures in translation units that include validation.h
without transitively including spark/state.h.
---
Outside diff comments:
In `@src/libspark/test/grootle_test.cpp`:
- Around line 65-71: The code in the legacy root construction paths calls
reserve() on the root vector and then writes to root.data() without properly
sizing the vector, which causes undefined behavior. Replace the reserve() call
with resize() passing SCALAR_ENCODING as the argument to properly set the vector
size before writing data via root.data(). This fix needs to be applied in both
locations mentioned: the first occurrence around lines 65-71 where
temp.serialize(root.data()) is called, and the second occurrence around lines
138-144 where the same pattern occurs.
---
Nitpick comments:
In `@src/spark/state.cpp`:
- Around line 1000-1009: In the cache management block where
gCheckedSparkSpendTransactions reaches MAX_CACHED_SPARK_SPEND_PROOFS, replace
the current clear() operation with an LRU (Least Recently Used) eviction
strategy that removes only the oldest entry instead of clearing the entire
cache. This prevents the cache thrashing issue during sustained reindex
operations by maintaining a partial cache rather than losing all entries at
once. Implement this by removing just one or a few of the least recently used
entries from gCheckedSparkSpendTransactions instead of calling clear() on the
entire collection.
In `@src/validation.cpp`:
- Around line 2711-2725: The struct SparkBatchCleanup definition does not follow
Linux brace style where the opening brace should be placed on a new line instead
of at the end of the struct declaration line. Move the opening brace of struct
SparkBatchCleanup to a new line so the struct name and opening brace are on
separate lines. Apply the same formatting fix to the CheckBlock struct
definition mentioned at line 4201 to maintain consistent code style throughout
the file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a5d1673d-a5aa-4e67-b568-323429fcd23f
📒 Files selected for processing (25)
qa/rpc-tests/dip3-deterministicmns.pysrc/batchproof_container.cppsrc/batchproof_container.hsrc/dsnotificationinterface.cppsrc/evo/cbtx.cppsrc/evo/cbtx.hsrc/evo/deterministicmns.cppsrc/evo/deterministicmns.hsrc/evo/specialtx.cppsrc/init.cppsrc/libspark/grootle.cppsrc/libspark/spend_transaction.cppsrc/libspark/test/grootle_test.cppsrc/secp256k1/src/cpp/MultiExponent.cppsrc/spark/state.cppsrc/spark/state.hsrc/sparkname.cppsrc/sparkname.hsrc/test/evo_deterministicmns_tests.cppsrc/test/evo_simplifiedmns_tests.cppsrc/test/spark_tests.cppsrc/test/sparkname_tests.cppsrc/validation.cppsrc/validation.hsrc/wallet/walletdb.cpp
| struct MobileArgGuard { | ||
| MobileArgGuard() { ForceSetArg("-mobile", "1"); } | ||
| ~MobileArgGuard() { ForceSetArg("-mobile", "0"); } | ||
| } mobileArgGuard; |
There was a problem hiding this comment.
Restore -mobile to its original state instead of forcing "0" on teardown.
Line 836 hard-codes -mobile to "0" in the destructor. That leaks global arg state into later tests (especially when prior state was unset or nonzero), which can cause order-dependent behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/test/spark_tests.cpp` around lines 834 - 837, The MobileArgGuard struct's
destructor is hard-coding the `-mobile` argument to "0" instead of restoring its
original state, which causes test pollution. Modify the MobileArgGuard struct to
capture the original value of `-mobile` in the constructor (before setting it to
"1"), then restore that original value in the destructor instead of forcing it
to "0". This ensures the global argument state is properly restored to its
pre-test condition rather than leaving it in a fixed state that affects
subsequent tests.
| if (batchProofContainer->fCollectProofs) { | ||
| try { | ||
| batchProofContainer->finalize(); | ||
| batchProofContainer->verify(); | ||
| sparkBatchCleanup.dismiss(); | ||
| } |
There was a problem hiding this comment.
Reset the Spark batch container on the successful path.
The success path verifies the singleton batch and then dismisses the RAII cleanup, so this scope never calls clear(). That leaves fCollectProofs/temporary proof state dependent on verify() internals and can contaminate the next validation or flush batch. Clear after successful verification, or leave the cleanup guard armed. Based on provided PR context, BatchProofContainer::clear() resets tempSparkTransactions, sparkTransactions, and fCollectProofs.
🐛 Proposed fix
try {
batchProofContainer->finalize();
batchProofContainer->verify();
+ batchProofContainer->clear();
sparkBatchCleanup.dismiss();
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/validation.cpp` around lines 2846 - 2851, After the successful verify()
call in the batchProofContainer verification block, call
batchProofContainer->clear() before dismissing the sparkBatchCleanup guard. This
ensures that the temporary proof state variables (fCollectProofs,
tempSparkTransactions, and sparkTransactions) are properly reset after
verification completes, preventing state contamination for subsequent
validations or flush batches.
| namespace spark { | ||
| class CSparkTxInfo; | ||
| enum class SparkSpendProofVerificationResult; | ||
| } // namespace spark |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Inspect spark forward declarations in validation.h"
rg -n -C2 'namespace spark|CSparkTxInfo|CSparkValidationContext|SparkSpendProofVerificationResult' src/validation.h
echo
echo "Locate concrete CSparkValidationContext declaration"
rg -n -C2 'class CSparkValidationContext|struct CSparkValidationContext' src/spark/state.hRepository: firoorg/firo
Length of output: 1481
Forward-declare spark::CSparkValidationContext in validation.h.
Lines 427 and 559 use spark::CSparkValidationContext*, but this type is not forward-declared in the namespace spark block (lines 55-58). This causes compile failures in translation units that include validation.h without a transitive include of spark/state.h.
💡 Proposed fix
namespace spark {
class CSparkTxInfo;
+struct CSparkValidationContext;
enum class SparkSpendProofVerificationResult;
} // namespace spark🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/validation.h` around lines 55 - 58, The namespace spark block in
validation.h (lines 55-58) is missing a forward declaration for
CSparkValidationContext, which is used as a pointer type at lines 427 and 559.
Add a forward declaration for class CSparkValidationContext in the existing
namespace spark block alongside the other forward declarations (CSparkTxInfo and
SparkSpendProofVerificationResult) to resolve the compile failures in
translation units that include validation.h without transitively including
spark/state.h.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/init.cpp (1)
833-839: 💤 Low valuePost-failure execution continues without early return.
After
ActivateBestChainfails, the code callsNotifyGenesisWaitFailure()andStartShutdown()but does not return, so subsequent initialization code (masternode init, mempool loading, etc. on lines 850-866) still executes. WhileShutdownRequested()is set, this differs from the explicit early returns in the reindex block above.Consider adding an early return after
StartShutdown()for consistency, or verify that the subsequent operations are safe to execute in this failure state.♻️ Optional: Add early return for consistency
if (!ActivateBestChain(state, chainparams)) { LogPrintf("Failed to connect best block"); NotifyGenesisWaitFailure(); StartShutdown(); + return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/init.cpp` around lines 833 - 839, The ActivateBestChain failure block in the initialization sequence calls NotifyGenesisWaitFailure() and StartShutdown() but does not return early, allowing subsequent initialization code (masternode init, mempool loading, etc.) to continue executing. This is inconsistent with error handling patterns in the reindex block above. Add an early return statement immediately after the StartShutdown() call within the !ActivateBestChain conditional block to prevent further initialization code from executing when this critical failure occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/init.cpp`:
- Around line 833-839: The ActivateBestChain failure block in the initialization
sequence calls NotifyGenesisWaitFailure() and StartShutdown() but does not
return early, allowing subsequent initialization code (masternode init, mempool
loading, etc.) to continue executing. This is inconsistent with error handling
patterns in the reindex block above. Add an early return statement immediately
after the StartShutdown() call within the !ActivateBestChain conditional block
to prevent further initialization code from executing when this critical failure
occurs.
|
User rahimi.nv@gmail.com does not have a PR Review subscription. Go to Team management and add this email to the PR Review subscription. |
| bind_weight.randomize(); | ||
| } | ||
|
|
||
| // Bind the commitment lists |
There was a problem hiding this comment.
Please make a benchmark comparison with the changes in this .cpp versus without those changes. The changes actually look reasonable and simplify the code, but our Multiexponent algorithm complexity is not linear, so we need to understand the impact of this. In case it's not much or even makes the performance worse, please revert the changes, as this is a very critical part.
There was a problem hiding this comment.
Sure. Thanks for your comment. I am running the benchmarks. Will put the results here once finished. It will take few hours though.
Overall this PR should result in
-reindexprocess 2x-3x faster.This PR speeds up early -reindex around the DIP3 deterministic masternode activation window by avoiding redundant cbTx masternode-list merkle-root work. During block connection, Firo already builds the deterministic masternode list and diff for the block; the patch reuses that freshly computed list for cbTx MN merkle validation, and skips rebuilding the simplified MN-list merkle root when a diff only changes fields that are not serialized into CSimplifiedMNListEntry. Consensus-relevant MN changes still force recomputation.
For Spark-heavy reindex, this PR defers Spark spend proof verification during raw block import and batch-verifies proofs before validation state is persisted, while preserving fail-closed behavior. It also avoids rebuilding unnecessary full cover sets for Grootle batch checks, keeps proof-cache entries tied to exact proof inputs, verifies deferred batches before flush/shutdown/reindex completion, and gates high-volume Spark validation trace logs behind -debug=validation to reduce log I/O during reindex.
First part of this PR already evaluated/approved by @levonpetrosyan93 at #1864