[PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls #976

Open
arkrishn94 wants to merge 5 commits into main from u/adkrishnan/pq-cleanup-3

arkrishn94 commented Apr 24, 2026

Couple of small independent cleanups for PQ.

1. Move calculate_chunk_offsets[_auto] to diskann-quantization

These two functions are pure prefix-sum math over (dimensions, num_pq_chunks): they involve no PQ training state and no provider concerns. They belong next to ChunkOffsetsBase / ChunkOffsetsView in diskann-quantization::views. I'm open to the idea that they may belong as methods of these structs but, as usual, the pending refactor of pq_construction.rs is blocking this.

All in-repo call sites have been updated.

2. Remove redundant deref impls in pq::distance::dynamic

QueryComputer and DistanceComputer had six trampoline impls forwarding &Vec<u8> and &&[u8] arguments to the canonical &[u8] impls. ElementRef in the accessor now allows us to get rid of these!
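For readers unfamiliar with the term, here is a minimal sketch of what a trampoline impl looks like. The `Distance` trait and `Computer` struct are hypothetical stand-ins, not the actual QueryComputer / DistanceComputer API; the point is only that the second impl does nothing except forward to the canonical `&[u8]` one.

```rust
// Hypothetical trait standing in for the PQ distance traits (an assumption,
// not the diskann-providers API).
pub trait Distance<T> {
    fn distance(&self, code: T) -> f32;
}

// Hypothetical computer that looks each code byte up in a flat table.
pub struct Computer {
    pub table: Vec<f32>,
}

// Canonical impl over `&[u8]`: this is where the real work happens.
impl Distance<&[u8]> for Computer {
    fn distance(&self, code: &[u8]) -> f32 {
        code.iter().map(|&c| self.table[c as usize]).sum()
    }
}

// Trampoline impl: exists only so callers can pass `&Vec<u8>` directly.
// It forwards to the canonical impl and adds no behavior of its own.
impl Distance<&Vec<u8>> for Computer {
    fn distance(&self, code: &Vec<u8>) -> f32 {
        <Self as Distance<&[u8]>>::distance(self, code.as_slice())
    }
}
```

If every call site already hands over a `&[u8]` (for example through an element-reference type on the accessor, as described above), impls of the second kind can be deleted without changing behavior.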

3. Minor changes

  • Centralize the CosineNormalized → L2 fallback rationale/comments
  • Move accum_row_inplace to diskann-utils::views
  • Move get_chunk_from_training_data in pq_construction.rs from public API into tests where it is used.

Relocates the object pool module so that it is available to crates that depend on diskann-utils but not diskann (notably diskann-quantization, which will gain pool-aware distance-table allocation in a follow-up). diskann::utils::object_pool stays as a re-export for backwards compatibility.

Direct importers in diskann-providers, diskann-disk, and diskann-garnet are switched to use diskann_utils::object_pool directly. Internal diskann users continue to use the re-export.
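The re-export arrangement described above can be sketched in a single file, with modules standing in for crates. The `ObjectPool` body here is a placeholder invented for illustration; only the `pub use` line reflects the pattern the description names.

```rust
// Stand-in for the `diskann-utils` crate: the new home of the module.
pub mod diskann_utils {
    pub mod object_pool {
        /// Placeholder pool type (an assumption; the real type is richer).
        pub struct ObjectPool<T> {
            items: Vec<T>,
        }

        impl<T> ObjectPool<T> {
            pub fn new() -> Self {
                Self { items: Vec::new() }
            }

            /// Returns an item to the pool.
            pub fn put(&mut self, item: T) {
                self.items.push(item);
            }

            /// Takes an item out of the pool, if one is available.
            pub fn take(&mut self) -> Option<T> {
                self.items.pop()
            }
        }
    }
}

// Stand-in for the `diskann` crate: the old path keeps working because the
// module is re-exported rather than duplicated.
pub mod diskann {
    pub mod utils {
        pub use crate::diskann_utils::object_pool;
    }
}
```

Because `pub use` re-exports the same module, `diskann::utils::object_pool::ObjectPool<T>` and `diskann_utils::object_pool::ObjectPool<T>` name one and the same type, so existing importers keep compiling unchanged.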
arkrishn94 requested review from a team and Copilot on April 24, 2026 18:50
arkrishn94 changed the title from "{PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls and" to "[PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls and" on Apr 24, 2026

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Relocates PQ-related chunk offset helpers into diskann-quantization, centralizes the cosine→L2 fallback rationale for disk PQ preprocessing, and removes redundant distance trampoline impls by leaning on accessor element refs.

Changes:

  • Moved calculate_chunk_offsets[_auto] from diskann-providers into diskann-quantization::views and updated call sites.
  • Migrated object_pool usage to diskann-utils::object_pool across crates and removed diskann::utils::object_pool.
  • Deduplicated cosine/L2 fallback commentary and removed redundant distance impls.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| diskann/src/utils/mod.rs | Removes utils::object_pool module exposure. |
| diskann/src/graph/search/scratch.rs | Switches AsPooled import to diskann_utils. |
| diskann/src/graph/index.rs | Switches ObjectPool/PooledRef import to diskann_utils. |
| diskann-utils/src/lib.rs | Exposes object_pool module publicly from diskann-utils. |
| diskann-quantization/src/views.rs | Adds calculate_chunk_offsets[_auto] helpers. |
| diskann-providers/src/model/pq/pq_construction.rs | Removes local chunk-offset helpers and imports from quantization crate. |
| diskann-providers/src/model/pq/mod.rs | Stops re-exporting chunk-offset helpers from providers PQ module. |
| diskann-providers/src/model/pq/distance/test_utils.rs | Updates import path for calculate_chunk_offsets_auto. |
| diskann-providers/src/model/pq/distance/l2.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/pq/distance/innerproduct.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/pq/distance/dynamic.rs | Switches object pool import and removes redundant trait impls. |
| diskann-providers/src/model/pq/distance/common.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/mod.rs | Removes re-export of calculate_chunk_offsets_auto. |
| diskann-providers/src/model/graph/provider/async_/memory_quant_vector_provider.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_quant_vector_provider.rs | Switches object pool import and updates tests for new argument types. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches object pool import to diskann_utils. |
| diskann-garnet/src/provider.rs | Switches object pool imports to diskann_utils. |
| diskann-disk/src/search/provider/disk_provider.rs | Switches object pool imports to diskann_utils. |
| diskann-disk/src/search/pq/quantizer_preprocess.rs | Centralizes cosine→L2 fallback rationale into module docs. |
| diskann-benchmark/src/backend/exhaustive/product.rs | Updates chunk-offset helper call site to diskann-quantization. |

Comment on lines +219 to +231
pub fn calculate_chunk_offsets(dimensions: usize, num_pq_chunks: usize, offsets: &mut [usize]) {
    // Calculate each chunk's offset
    // If we have 8 dimension and 3 chunks then offsets would be [0,3,6,8]
    let mut chunk_offset: usize = 0;
    offsets[0] = chunk_offset;
    for chunk_index in 0..num_pq_chunks {
        chunk_offset += dimensions / num_pq_chunks;
        if chunk_index < (dimensions % num_pq_chunks) {
            chunk_offset += 1;
        }
        offsets[chunk_index + 1] = chunk_offset;
    }
}

Copilot AI Apr 24, 2026

As a public helper, this can panic with an unhelpful message when num_pq_chunks == 0 (division/mod by zero) or when offsets.len() != num_pq_chunks + 1 (out-of-bounds on offsets[0] / offsets[chunk_index + 1]). Consider adding an explicit check (e.g., assert!(num_pq_chunks > 0, ...) and assert_eq!(offsets.len(), num_pq_chunks + 1, ...)) or changing the API to return a Result with a clear error.
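One way the `Result`-returning option could look is sketched below. The name `calculate_chunk_offsets_checked` and the `String` error type are assumptions made for illustration; the PR may well prefer plain assertions or the crate's own error type.

```rust
/// Hypothetical checked variant of `calculate_chunk_offsets` (name and error
/// type are assumptions). It validates its inputs up front instead of
/// panicking on a division by zero or an out-of-bounds index.
pub fn calculate_chunk_offsets_checked(
    dimensions: usize,
    num_pq_chunks: usize,
    offsets: &mut [usize],
) -> Result<(), String> {
    if num_pq_chunks == 0 {
        return Err("num_pq_chunks must be greater than zero".to_string());
    }
    if offsets.len() != num_pq_chunks + 1 {
        return Err(format!(
            "offsets must have length num_pq_chunks + 1 = {}, got {}",
            num_pq_chunks + 1,
            offsets.len()
        ));
    }

    // Every chunk gets `base` dimensions; the first `remainder` chunks get
    // one extra so that the offsets end exactly at `dimensions`.
    let base = dimensions / num_pq_chunks;
    let remainder = dimensions % num_pq_chunks;

    let mut chunk_offset = 0usize;
    offsets[0] = chunk_offset;
    for chunk_index in 0..num_pq_chunks {
        chunk_offset += base + usize::from(chunk_index < remainder);
        offsets[chunk_index + 1] = chunk_offset;
    }
    Ok(())
}
```

With 8 dimensions and 3 chunks this fills the offsets with [0, 3, 6, 8], matching the comment in the original function, while zero chunks or a wrongly sized buffer yield an `Err` instead of a panic.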

Comment thread diskann/src/utils/mod.rs
Comment thread diskann-providers/src/model/pq/mod.rs
Comment thread diskann-disk/src/search/pq/quantizer_preprocess.rs
Comment thread diskann-quantization/src/views.rs
codecov-commenter commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.44%. Comparing base (3a20042) to head (bf08286).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #976   +/-   ##
=======================================
  Coverage   89.43%   89.44%           
=======================================
  Files         449      449           
  Lines       83779    83755   -24     
=======================================
- Hits        74926    74911   -15     
+ Misses       8853     8844    -9     
| Flag | Coverage Δ |
| --- | --- |
| miri | 89.44% <100.00%> (+<0.01%) ⬆️ |
| unittests | 89.28% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...iskann-benchmark/src/backend/exhaustive/product.rs | 100.00% <ø> (ø) |
| diskann-disk/src/search/pq/quantizer_preprocess.rs | 88.00% <ø> (ø) |
| diskann-disk/src/search/provider/disk_provider.rs | 90.89% <ø> (ø) |
| diskann-disk/src/storage/quant/pq/pq_generation.rs | 93.33% <ø> (ø) |
| diskann-garnet/src/provider.rs | 83.36% <ø> (ø) |
| ...ovider/async_/fast_memory_quant_vector_provider.rs | 98.46% <100.00%> (-0.01%) ⬇️ |
| ...ph/provider/async_/memory_quant_vector_provider.rs | 98.36% <ø> (ø) |
| diskann-providers/src/model/pq/distance/common.rs | 100.00% <ø> (ø) |
| diskann-providers/src/model/pq/distance/dynamic.rs | 94.11% <ø> (+7.20%) ⬆️ |
| ...nn-providers/src/model/pq/distance/innerproduct.rs | 100.00% <ø> (ø) |

... and 9 more
arkrishn94 changed the title from "[PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls and" to "[PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls" on Apr 25, 2026
            *a += *b;
        });
    });
}
Contributor

Would it make sense to have a map_rows_mut/some generalized method instead of a bespoke accum_row_inplace?

Contributor Author

Yeah good point. On it.
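A rough sketch of what the suggested generalization might look like follows. It assumes a row-major matrix stored in a flat slice; the signatures of `map_rows_mut` and of this `accum_row_inplace` are guesses made for illustration, not the diskann-utils::views API.

```rust
/// Hypothetical generalized helper: applies `f` to every row of a row-major
/// matrix stored in a flat slice with `ncols` columns.
pub fn map_rows_mut<T, F>(data: &mut [T], ncols: usize, mut f: F)
where
    F: FnMut(&mut [T]),
{
    assert!(ncols > 0, "ncols must be greater than zero");
    assert_eq!(
        data.len() % ncols,
        0,
        "data length must be a multiple of ncols"
    );
    data.chunks_exact_mut(ncols).for_each(|row| f(row));
}

/// The bespoke helper expressed through the general one: adds `row`
/// element-wise to every row of `data`.
pub fn accum_row_inplace(data: &mut [f32], ncols: usize, row: &[f32]) {
    assert_eq!(row.len(), ncols, "row length must equal ncols");
    map_rows_mut(data, ncols, |dst| {
        dst.iter_mut().zip(row).for_each(|(a, b)| *a += *b);
    });
}
```

Under these assumptions the accumulation becomes a one-line closure, and other per-row transforms (scaling, centering) could reuse the same helper rather than each growing a dedicated function.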

4 participants