Remove synthetic/structured data generation from diskann-providers #963
Merged
JordanMaples merged 5 commits into main on Apr 22, 2026
Replace generate_structured_data.rs and generate_synthetic_labels_utils.rs with diskann::graph::test::synthetic::Grid and diskann-tools' own label generator respectively.

- Rewrite GenerateGrid trait impls to delegate to Grid::data/data_as
- Replace all adj list generation calls with Grid::neighbors
- Rewire generate_synthetic_labels binary to diskann-tools version
- Delete both redundant utility files and clean up exports

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR removes redundant synthetic/structured data generation utilities from diskann-providers by migrating grid generation to diskann::graph::test::synthetic::Grid and rewiring the synthetic label generation CLI to use diskann-tools’ label generator.
Changes:
- Add `Grid::from_dim()` helper (plus tests) to construct a supported `Grid` from a dimension count.
- Update `diskann-providers` async/caching tests to generate grid vectors + adjacency lists via `Grid::{data,data_as,neighbors}`.
- Rewire `diskann-tools` `generate_synthetic_labels` binary to call `diskann_tools::utils::generate_labels` and delete the old `diskann-providers` label/grid generator utilities + exports.
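To illustrate the shape of the helpers named above, here is a minimal, self-contained sketch of a grid type with `from_dim`, `data`, and `neighbors` methods. It is not the code from `diskann::graph::test::synthetic`: the enum variants, the set of supported dimensions (1, 3, 4), the `size` parameter, the `num_points` helper, and every signature are assumptions made for illustration only.

```rust
/// Hypothetical stand-in for a synthetic grid; NOT the real `diskann` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Grid {
    One,
    Three,
    Four,
}

impl Grid {
    /// Map a dimension count to a supported grid, or `None` if unsupported.
    /// The supported set here (1, 3, 4) is an assumption for this sketch.
    pub fn from_dim(dim: usize) -> Option<Self> {
        match dim {
            1 => Some(Grid::One),
            3 => Some(Grid::Three),
            4 => Some(Grid::Four),
            _ => None,
        }
    }

    /// Number of axes of this grid.
    pub fn dim(self) -> usize {
        match self {
            Grid::One => 1,
            Grid::Three => 3,
            Grid::Four => 4,
        }
    }

    /// Total number of points when each axis has `size` points.
    pub fn num_points(self, size: usize) -> usize {
        size.pow(self.dim() as u32)
    }

    /// Coordinates of every point in row-major order (last axis varies fastest).
    pub fn data(self, size: usize) -> Vec<Vec<f32>> {
        let dim = self.dim();
        (0..self.num_points(size))
            .map(|mut idx| {
                let mut coords = vec![0.0f32; dim];
                for axis in (0..dim).rev() {
                    coords[axis] = (idx % size) as f32;
                    idx /= size;
                }
                coords
            })
            .collect()
    }

    /// Adjacency lists: two points are neighbors if they differ by 1 on exactly one axis.
    pub fn neighbors(self, size: usize) -> Vec<Vec<u32>> {
        (0..self.num_points(size))
            .map(|idx| {
                let mut adj = Vec::new();
                let mut stride = 1;
                for _ in 0..self.dim() {
                    let coord = (idx / stride) % size;
                    if coord > 0 {
                        adj.push((idx - stride) as u32);
                    }
                    if coord + 1 < size {
                        adj.push((idx + stride) as u32);
                    }
                    stride *= size;
                }
                adj
            })
            .collect()
    }
}
```

Returning `Option` from `from_dim` keeps the unsupported-dimension case explicit at the call site, which is one plausible way for migrated callers to fail loudly in tests; the real helper may handle this differently.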
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| diskann/src/graph/test/synthetic.rs | Adds Grid::from_dim() and tests to support dimension→Grid construction for migrated callers. |
| diskann-tools/src/bin/generate_synthetic_labels.rs | Switches the CLI to diskann-tools label generation util and passes a storage provider. |
| diskann-providers/src/utils/mod.rs | Removes public exports for the deleted structured-data + synthetic-label utils. |
| diskann-providers/src/utils/generate_synthetic_labels_utils.rs | Deletes the old label generator implementation (now sourced from diskann-tools). |
| diskann-providers/src/utils/generate_structured_data.rs | Deletes the old grid/circle structured data generator implementation (now sourced from diskann::graph::test::synthetic::Grid). |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates adjacency generation to use Grid::neighbors via async_tests::grid_from_dim. |
| diskann-providers/src/index/diskann_async.rs | Rewrites GenerateGrid impls + adjacency generation to delegate to Grid::data/data_as/neighbors. |
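
The delegation described for diskann_async.rs can be pictured with a small sketch: a per-element-type trait whose impls forward to a shared grid generator instead of carrying their own generation logic. Everything below is hypothetical, including the one-dimensional `Grid` stub, the `generate_grid` method name, and the signatures of `data` and `data_as`; the real trait and `Grid` API are not shown in this PR page.

```rust
/// Hypothetical stub: a 1-D line of `size` points. Stand-in for the shared generator.
pub struct Grid;

impl Grid {
    /// Grid coordinates as `f32` (assumed signature).
    pub fn data(&self, size: usize) -> Vec<f32> {
        (0..size).map(|i| i as f32).collect()
    }

    /// Grid coordinates converted to another element type (assumed signature).
    pub fn data_as<T>(&self, size: usize, convert: impl Fn(f32) -> T) -> Vec<T> {
        self.data(size).into_iter().map(convert).collect()
    }
}

/// Hypothetical trait: each element type knows how to obtain grid data for itself.
pub trait GenerateGrid: Sized {
    fn generate_grid(grid: &Grid, size: usize) -> Vec<Self>;
}

impl GenerateGrid for f32 {
    // The `f32` case forwards directly to the shared generator.
    fn generate_grid(grid: &Grid, size: usize) -> Vec<Self> {
        grid.data(size)
    }
}

impl GenerateGrid for u8 {
    // Other element types forward to the converting variant.
    fn generate_grid(grid: &Grid, size: usize) -> Vec<Self> {
        grid.data_as(size, |x| x as u8)
    }
}
```

With this shape, a generic test can call `T::generate_grid(&grid, size)` for any supported element type while the generation logic lives in one place.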
b6846d0 to 0aa62ba
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff:

| | main | #963 | +/- |
|---|---|---|---|
| Coverage | 89.31% | 90.47% | +1.16% |
| Files | 448 | 446 | -2 |
| Lines | 83329 | 83076 | -253 |
| Hits | 74422 | 75160 | +738 |
| Misses | 8907 | 7916 | -991 |
harsha-simhadri approved these changes Apr 21, 2026
metajack approved these changes Apr 22, 2026
arkrishn94 added a commit that referenced this pull request Apr 22, 2026
Bumping to 0.50.1 to propagate changes to consumers. Changes since previous bump:

## What's Changed
* Add more agentic guard rails by @hildebrandmw in #871
* Cleanup `diskann-benchmark-runner` and friends. by @hildebrandmw in #865
* Use `--all-targets` for the no-default-features CI run. by @hildebrandmw in #874
* Remove unused `normalizing_util.rs` from `diskann-providers` by @Copilot in #902
* Benchmark Support for A/B Tests by @hildebrandmw in #900
* [diskann-garnet] Bump diskann-garnet to 1.0.26 by @tiagonapoli in #925
* Remove the `AdjacencyList` from `diskann-providers` by @hildebrandmw in #915
* [PQ cleanup] Part 1: Move pq_scratch, quantizer_preprocess and pq_dataset to `diskann-disk` by @arkrishn94 in #930
* Forbid Debug in diskann-benchmark by @arrayka in #914
* Remove DebugProvider by @JordanMaples in #923
* [diskann-garnet] Create workflow to publish to nuget by @tiagonapoli in #926
* Move k-means implementation from diskann-providers to diskann-disk by @Copilot in #933
* Inline minmax distance evaluations by @arkrishn94 in #935
* Use `rust-toolchain.toml` in CI by @hildebrandmw in #934
* Add a globally blocking CI gate. by @hildebrandmw in #932
* Remove `utils/math_util.rs` from `diskann-providers` by @Copilot in #921
* Bump rand from 0.9.2 to 0.9.3 by @dependabot[bot] in #945
* Remove OPQ and friends by @arkrishn94 in #947
* Migrate test_flaky_consolidate from diskann_providers to diskann by @JordanMaples in #942
* Remove GraphDataType from diskann-providers by @wuw92 in #950
* Remove unused method extract_best_l_candidates in NeighborPriorityQueue by @doliawu in #951
* Add `Debug` bounds to `VectorRepr`'s distance GATs. by @hildebrandmw in #948
* Add benchmark pipeline with Rust-native A/B validation by @YuanyuanTian-hh in #912
* Remove unnecessary `Default` bound from `Neighbor`'s `VectorIdType` by @doliawu in #956
* Replace `AlignedBoxWithSlice` with plain `Vec` / `Matrix` where alignment is unused by @wuw92 in #955
* [minmax] 8-bit benchmark by @arkrishn94 in #959
* Add `MultiInsertStrategy` implementations for `BfTreeProvider` by @hildebrandmw in #949
* Replace `AlignedBoxWithSlice` with `Vec` in PQScratch and disk fp vector caches by @wuw92 in #960
* Adding unit tests for paged_search by @JordanMaples in #962
* Remove AlignedBoxWithSlice wrapper and add alias to Poly<[T], AlignedAllocator> by @JordanMaples in #965
* Remove synthetic/structured data generation from diskann-providers by @JordanMaples in #963
* added tests and some baselines for range_search by @JordanMaples in #961

## New Contributors
* @JordanMaples made their first contribution in #923
* @wuw92 made their first contribution in #950
* @doliawu made their first contribution in #951
* @YuanyuanTian-hh made their first contribution in #912

**Full Changelog**: v0.50.0...v0.50.1
This is a follow-up to #904, which has an outdated baseline and can be closed now that this PR supersedes it.
Essentially the same work was done here, just without the provider/async_/caching changes. This closes #903.
When #953 merges, the from_dim definition will need to be updated to support the '2' case.