[PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls#976
[PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls#976arkrishn94 wants to merge 5 commits intomainfrom
calculate_chunk_offsets* and remove redundant distance impls#976Conversation
Relocates the object pool module so that it is available to crates that depend on diskann-utils but not diskann (notably diskann-quantization, which will gain pool-aware distance-table allocation in a follow-up). diskann::utils::object_pool stays as a re-export for backwards compatibility. Direct importers in diskann-providers, diskann-disk, and diskann-garnet are switched to use diskann_utils::object_pool directly. Internal diskann users continue to use the re-export.
calculate_chunk_offsets* and remove redundant distance impls and calculate_chunk_offsets* and remove redundant distance impls and
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Relocates PQ-related chunk offset helpers into diskann-quantization, centralizes the cosine→L2 fallback rationale for disk PQ preprocessing, and removes redundant distance trampoline impls by leaning on accessor element refs.
Changes:
- Moved
calculate_chunk_offsets[_auto]fromdiskann-providersintodiskann-quantization::viewsand updated call sites. - Migrated
object_poolusage todiskann-utils::object_poolacross crates and removeddiskann::utils::object_pool. - Deduplicated cosine/L2 fallback commentary and removed redundant distance impls.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/utils/mod.rs | Removes utils::object_pool module exposure. |
| diskann/src/graph/search/scratch.rs | Switches AsPooled import to diskann_utils. |
| diskann/src/graph/index.rs | Switches ObjectPool/PooledRef import to diskann_utils. |
| diskann-utils/src/lib.rs | Exposes object_pool module publicly from diskann-utils. |
| diskann-quantization/src/views.rs | Adds calculate_chunk_offsets[_auto] helpers. |
| diskann-providers/src/model/pq/pq_construction.rs | Removes local chunk-offset helpers and imports from quantization crate. |
| diskann-providers/src/model/pq/mod.rs | Stops re-exporting chunk-offset helpers from providers PQ module. |
| diskann-providers/src/model/pq/distance/test_utils.rs | Updates import path for calculate_chunk_offsets_auto. |
| diskann-providers/src/model/pq/distance/l2.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/pq/distance/innerproduct.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/pq/distance/dynamic.rs | Switches object pool import and removes redundant trait impls. |
| diskann-providers/src/model/pq/distance/common.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/mod.rs | Removes re-export of calculate_chunk_offsets_auto. |
| diskann-providers/src/model/graph/provider/async_/memory_quant_vector_provider.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_quant_vector_provider.rs | Switches object pool import and updates tests for new argument types. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches object pool import to diskann_utils. |
| diskann-garnet/src/provider.rs | Switches object pool imports to diskann_utils. |
| diskann-disk/src/search/provider/disk_provider.rs | Switches object pool imports to diskann_utils. |
| diskann-disk/src/search/pq/quantizer_preprocess.rs | Centralizes cosine→L2 fallback rationale into module docs. |
| diskann-benchmark/src/backend/exhaustive/product.rs | Updates chunk-offset helper call site to diskann-quantization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn calculate_chunk_offsets(dimensions: usize, num_pq_chunks: usize, offsets: &mut [usize]) { | ||
| // Calculate each chunk's offset | ||
| // If we have 8 dimension and 3 chunks then offsets would be [0,3,6,8] | ||
| let mut chunk_offset: usize = 0; | ||
| offsets[0] = chunk_offset; | ||
| for chunk_index in 0..num_pq_chunks { | ||
| chunk_offset += dimensions / num_pq_chunks; | ||
| if chunk_index < (dimensions % num_pq_chunks) { | ||
| chunk_offset += 1; | ||
| } | ||
| offsets[chunk_index + 1] = chunk_offset; | ||
| } | ||
| } |
There was a problem hiding this comment.
As a public helper, this can panic with an unhelpful message when num_pq_chunks == 0 (division/mod by zero) or when offsets.len() != num_pq_chunks + 1 (out-of-bounds on offsets[0] / offsets[chunk_index + 1]). Consider adding an explicit check (e.g., assert!(num_pq_chunks > 0, ...) and assert_eq!(offsets.len(), num_pq_chunks + 1, ...)) or changing the API to return a Result with a clear error.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #976 +/- ##
=======================================
Coverage 89.43% 89.44%
=======================================
Files 449 449
Lines 83779 83755 -24
=======================================
- Hits 74926 74911 -15
+ Misses 8853 8844 -9
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…ng_data to test helper
calculate_chunk_offsets* and remove redundant distance impls and calculate_chunk_offsets* and remove redundant distance impls
| *a += *b; | ||
| }); | ||
| }); | ||
| } |
There was a problem hiding this comment.
Would it make sense to have a map_rows_mut/some generalized method instead of a bespoke accum_row_inplace?
There was a problem hiding this comment.
Yeah good point. On it.
Couple of small independent cleanups for PQ.
1. Move
calculate_chunk_offsets[_auto]todiskann-quantizationThese two functions are pure prefix-sum math over
(dimensions, num_pq_chunks)— no PQ training state, no providers concerns. They belong next toChunkOffsetsBase/ChunkOffsetsViewindiskann-quantization::views. I'm open to the idea that may belong as methods of these structs but, as usual, the refactor ofpq_construction.rsis blocking this.All in-repo call sites have been updated.
2. Remove redundant deref impls in
pq::distance::dynamicQueryComputerandDistanceComputerhad six trampoline impls forwarding&Vec<u8>and&&[u8]arguments to the canonical&[u8]impls.ElementRefin the accessor now allows us to get rid of these!3. Minor changes
accum_row_inplacetodiskann-utils::viewsget_chunk_from_training_datain pq_construction.rs from public API into tests where it is used.