Skip to content

fix(compaction): keep compact_now() Send + fix large-payload corruption (closes #3)#4

Merged
farhan-syah merged 4 commits into
NodeDB-Lab:mainfrom
emanzx:fix/compact-now-send
Jun 15, 2026
Merged

fix(compaction): keep compact_now() Send + fix large-payload corruption (closes #3)#4
farhan-syah merged 4 commits into
NodeDB-Lab:mainfrom
emanzx:fix/compact-now-send

Conversation

@emanzx

@emanzx emanzx commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Summary

Two related compaction fixes:

  1. compact_now() future stays Send (original PR) — swap the EnteredSpan guard for tracing::Instrument, so compaction can be called from Send async contexts (e.g. an #[async_trait] StorageEngine impl).
  2. Compaction no longer corrupts the store on large payloadscloses compact_now() corrupts store on large payloads: 'payload too large' → 'AEAD tag verification failed' #3.

Closes #3compact_now() corrupts store on large payloads

Root cause: the dense repack rebuilt trees with bulk_load, which could only inline values. collect_range resolves overflow chains back to inline bytes, so any value above the inline threshold (page_size/4) overflowed leaf capacity → payload too large, and the partial in-place rewrite then left pages whose AEAD tag wouldn't verify → store bricked.

Fixes:

  • Overflow-aware bulk_load — values past the inline threshold are spilled to overflow chains, exactly like the put path (shared overflow::inline_value_threshold). A repack now reproduces the original storage shape instead of failing.
  • Atomic repack via scratch file + rename — compaction builds the compacted trees into main.db.compact and atomically renames it over main.db. The live file is never modified until the rename, so a failure or crash before it leaves the original store fully intact (an orphaned scratch is cleaned up on the next open). This is the LMDB/SQLite-VACUUM model.
  • Honest compact_step — a full rewrite can't be safely chunked across writer-lock releases without dropping concurrent commits, so compact_step now performs the whole atomic compaction in one call and reports completion. The old fake-incremental relocation machinery (and its dead watermark codec) is removed. Sustained-write growth is already bounded continuously by the durable free-list, so reclaiming to the OS is a maintenance operation.

Tests

Large/overflow-value compaction round-trips, a partial-compaction-then-commit reopen guard, and a crash-before-rename safety test. Full suite green; clippy and rustfmt clean.

Notes

  • Rebased onto current main (which already carries the durable free-list work this depends on).
  • Peak disk during compaction ≈ old + compacted file — inherent to any atomic shrink.

emanzx and others added 4 commits June 15, 2026 06:37
compact_now() held a tracing::EnteredSpan guard across its .await points.
EnteredSpan is !Send, so holding it across an await makes the returned
future !Send — uncallable from a Send context (e.g. an #[async_trait]
impl whose methods require Send). Replace the entered-span guard with
tracing::Instrument so the span still covers the async work while the
future stays Send. Behavior-preserving: same span name, fields, logging.
@farhan-syah farhan-syah force-pushed the fix/compact-now-send branch from 5453608 to 1312214 Compare June 15, 2026 00:08
@farhan-syah farhan-syah changed the title fix(compaction): keep compact_now() future Send fix(compaction): keep compact_now() Send + fix large-payload corruption (closes #3) Jun 15, 2026
@farhan-syah farhan-syah merged commit dad7f26 into NodeDB-Lab:main Jun 15, 2026
18 checks passed
emanzx added a commit to emanzx/nodedb-lite that referenced this pull request Jun 15, 2026
Add compact() to the StorageEngine trait (default no-op), overridden by the
pagedb-backed engine to call pagedb::Db::compact_now() and map CompactStats
to a lite-owned CompactionOutcome. Add a NodeDbLite::compact() forwarder.
In-memory engine keeps the no-op default. Lets consumers reclaim deferred-free
space / bound on-disk growth between writes.

Requires pagedb compact_now() to be Send-callable (NodeDB-Lab/pagedb#4).
Addresses NodeDB-Lab/pagedb#2 (compaction not exposed to consumers).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

compact_now() corrupts store on large payloads: 'payload too large' → 'AEAD tag verification failed'

2 participants