bf_tree migration away from diskann-providers#1020
bf_tree migration away from diskann-providers#1020JordanMaples wants to merge 6 commits intomainfrom
Conversation
Add bf_tree insert benchmarks (Criterion + IAI)
Add pre-migration bf_tree provider benchmarks mirroring the existing
inmem insert benchmark. Both Criterion (wall-clock) and IAI-Callgrind
(instruction-count) benchmarks use the same sift-256 dataset and PQ
training as the inmem benchmarks for apples-to-apples comparison.
Gated behind #[cfg(feature = "bf_tree")] so existing benchmarks
continue to work without the feature flag.
Run with: cargo bench --bench bench_main --features virtual_storage,bf_tree
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add bf_tree search benchmarks for migration parity
Add search benchmarks (Criterion + IAI-Callgrind) alongside existing
insert benchmarks. Search exercises the read path through
BfTreeProvider (vector lookups, neighbor access, greedy search
traversal) which is critical for detecting regressions post-migration.
The search benchmark builds the index once during setup, then measures
10 KNN queries (k=10, l=20) against the populated index.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract bf_tree provider into diskann-bf_tree-provider crate
Move the bf_tree provider and caching modules from diskann-providers
into the dedicated diskann-bf_tree-provider crate. This makes the
bf-tree dependency explicit at the crate level rather than hidden
behind a feature flag.
Changes:
- Move bf_tree/ and caching/ modules to diskann-bf_tree-provider/src/
- Set up Cargo.toml with proper dependencies (bf-tree, diskann,
diskann-providers, diskann-utils, diskann-vector, etc.)
- Convert all crate:: imports to diskann_providers:: imports
- Bump visibility of postprocess module, AsDeletionCheck/DeletionCheck
traits, and TableDeleteProviderAsync methods to pub
- Remove bf_tree feature flag and optional deps from diskann-providers
- Remove cfg(feature = "bf_tree") gates from table_delete_provider.rs
- Rename provider/provider.rs to provider/bf_tree_provider.rs (clippy)
- Disable caching example.rs tests (cross-crate cfg(test) limitation)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move bf_tree benchmarks to diskann-bf_tree-provider
Migrate Criterion and IAI-Callgrind benchmark targets from
diskann-providers to the new diskann-bf_tree-provider crate.
Update imports to use the new crate's module paths. Remove
bf_tree benchmark remnants (cfg gates, modules) from
diskann-providers benches.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cargo.lock got missed by last commit
Move delete bitmap serialization to diskann-bf_tree-provider
Move to_bytes/from_bytes logic out of TableDeleteProviderAsync (where
it was bf_tree-scoped) into a dedicated delete_bitmap_serde module in
the new crate. The reimplementation works through the public API and
produces byte-identical output (LE u32 words). Revert the methods in
diskann-providers to pub(crate) + #[cfg(test)].
Adds format-level compatibility tests including known fixtures and
padding bit rejection.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move DeletionCheck traits and RemoveDeletedIdsAndCopy to diskann::graph::glue
These are core to the inplace delete algorithm's search post-processing
and belong in the diskann crate alongside SearchPostProcess, CopyIds,
FilterStartPoints, and Pipeline. This removes the need to widen
diskann-providers' postprocess module to pub for external crates.
Also removes the now-dead to_bytes/from_bytes methods and their tests
from TableDeleteProviderAsync in diskann-providers, since that
serialization logic has been moved to diskann-bf_tree-provider.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use workspace_root() for benchmark test data paths
Replace env!("CARGO_MANIFEST_DIR")-relative paths with
diskann_utils::workspace_root() so benchmarks find test_data/
without requiring symlinks in each crate directory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The bf_tree feature was removed from diskann-providers when the provider was extracted into its own crate (diskann-bf_tree-provider). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b94f4df to
44be934
Compare
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (88.48%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #1020 +/- ##
==========================================
+ Coverage 89.51% 89.92% +0.40%
==========================================
Files 460 480 +20
Lines 85424 90781 +5357
==========================================
+ Hits 76469 81631 +5162
- Misses 8955 9150 +195
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
- Rewrite QuantVectorProvider to use Arc<Poly<dyn Quantizer>> - Add local HybridComputer and QuantQueryComputer wrapper - Update CreateQuantProvider impl for Poly<dyn Quantizer> - Update serialization to use Quantizer::serialize/try_deserialize - Rewrite all tests and doc examples for spherical quantizer - Update benchmarks to use SphericalQuantizer::train with ReciprocalMeanNorm - Move rand to regular dependencies (needed by doc tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| //! bits. Bit `b` of word `w` corresponds to vector ID `w * 32 + b`. The number of words | ||
| //! is `ceil(max_size / 32)`. | ||
|
|
||
| use diskann_providers::model::graph::provider::async_::TableDeleteProviderAsync; |
There was a problem hiding this comment.
Weren't we getting rid of this?
There was a problem hiding this comment.
this was from the earlier "migration only" phase of the changes. This'll be removed when I attack the D param removal.
| .quantizer | ||
| .fused_query_computer( | ||
| &query_f32, | ||
| QueryLayout::FullPrecision, |
There was a problem hiding this comment.
One minor detail - for index construction, you might want to leave this SameAsData so the distances are comparable to the distances computed during pruning.
So far, this is a mostly mechanical movement of the bf_tree prorovider from inside of diskann-providers crate into its own crate. This is largely a copilot authored PR, guided by me.
The general flow of the work was:
There are a couple remaining things to do here before we take this PR:
Some concerns that need to be addressed before this is ready for review.
As part of the move, there are some concepts that were moved into glue.rs and had their visibility expanded. This will need to be cleaned up and moved back to their previous location (and visibility level) once the tableprovider for the delete dealt with.