[diskann-vector] Support truly unaligned distances.#981

Open
hildebrandmw wants to merge 4 commits into main from mhildebr/super-unaligned
Conversation

hildebrandmw (Contributor) commented Apr 28, 2026

An internal user has a case where full-precision vectors (e.g. f32) are stored in completely unaligned buffers (e.g. alignment of 1), which currently requires copying the data into aligned storage before slices can be safely constructed. However, our distance function implementations use SIMDVector::load_unaligned under the hood, which is compatible with under-aligned pointers, so the copy is not fundamentally necessary.
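To make the motivating case concrete, here is a minimal sketch using only the standard library. It is not code from this PR: `Backing`, `pack_at_offset_one`, `copy_to_aligned`, `sum_of_squares_unaligned`, and `demo` are hypothetical names chosen for illustration. It contrasts the copy-then-slice approach with reading the floats in place through `std::ptr::read_unaligned`.

```rust
use std::ptr;

/// Hypothetical backing storage. It is aligned for `f32`, so byte offset 1
/// is guaranteed to be misaligned for `f32`.
#[repr(C, align(4))]
pub struct Backing(pub [u8; 13]);

/// Packs three floats into a `Backing`, starting at byte offset 1.
pub fn pack_at_offset_one(values: [f32; 3]) -> Backing {
    let mut backing = Backing([0u8; 13]);
    for (i, v) in values.iter().enumerate() {
        let start = 1 + 4 * i;
        backing.0[start..start + 4].copy_from_slice(&v.to_ne_bytes());
    }
    backing
}

/// Option 1: copy the payload into aligned storage, then use ordinary slices.
pub fn copy_to_aligned(payload: &[u8]) -> Vec<f32> {
    payload
        .chunks_exact(4)
        .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Option 2: read the floats in place, without forming a `&[f32]`.
///
/// # Safety
/// `ptr` must be valid for reads of `len * 4` initialized bytes.
/// No alignment is required.
pub unsafe fn sum_of_squares_unaligned(ptr: *const f32, len: usize) -> f32 {
    let mut acc = 0.0f32;
    for i in 0..len {
        // `read_unaligned` places no alignment requirement on the pointer.
        let x = unsafe { ptr::read_unaligned(ptr.add(i)) };
        acc += x * x;
    }
    acc
}

/// Runs both options on the same misaligned payload.
pub fn demo() -> (f32, f32) {
    let backing = pack_at_offset_one([1.0, 2.0, 3.0]);
    let payload = &backing.0[1..];
    let copied: f32 = copy_to_aligned(payload).iter().map(|x| x * x).sum();
    // SAFETY: `payload` holds 12 initialized bytes, enough for three `f32`s.
    let in_place = unsafe { sum_of_squares_unaligned(payload.as_ptr().cast(), 3) };
    (copied, in_place)
}
```

Both options compute the same value; the second avoids the allocation and copy, which is the property the SIMD kernels already rely on.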

This PR exposes a proper API to the DistanceProvider trait (via the Distance type) for invoking the SIMD implementations with unaligned pointers.

Suggested Reviewing Order

  • diskann-wide: The implementations of SIMDVector::load* and SIMDVector::store* already support under-aligned pointers. This PR updates the documentation and restructures the load/store tests to verify this property (we were already relying on it in some of the quantized distance kernels). The new load/store tests pass under Miri.

  • unaligned.rs: A new UnalignedSlice type is added for unaligned slices. This is just a pointer + length pair with some validity requirements but no alignment requirement. Conversions from &[T] and &[T; N] are added, and the trait AsUnaligned replaces the use of AsRef<[T]> and the internal ToSlice traits.

    A test-only Buffer is used to purposely offset simple types to exercise the unaligned cases.

  • distance/simd.rs: The simd_op kernel is tweaked to accept AsUnaligned instead of AsRef. Checks have been added to the existing tests to ensure that the under-aligned versions are both Miri-compatible and yield exactly the same results as their properly aligned counterparts.

  • distance/implementations.rs: The architecture hooks and specialization are changed to use AsUnaligned. I've investigated the code generation, and the checks for impl FTarget<...> for Specialize<N, F> are sufficient to trigger constant propagation and full unrolling of small fixed-size kernels.

  • distance/distance_provider.rs: The Distance type is changed to pass UnalignedSlices across the function pointer boundary rather than raw slices. We can keep the existing API for slices trivially via AsUnaligned.
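The shape described in the list above can be sketched roughly as follows. This is a hypothetical stand-in, not the PR's actual code: the field names, `from_raw_parts`, `get`, the scalar `squared_l2` kernel, and the `Backing`/`example` helpers are all assumptions made for illustration, and the real kernels use SIMD loads instead of a scalar loop.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `UnalignedSlice`: a pointer + length pair
/// with validity requirements but no alignment requirement.
#[derive(Clone, Copy)]
pub struct UnalignedSlice<'a, T> {
    ptr: *const T,
    len: usize,
    _lifetime: PhantomData<&'a [T]>,
}

impl<'a, T: Copy> UnalignedSlice<'a, T> {
    /// # Safety
    /// `ptr` must be valid for reads of `len` initialized `T`s for `'a`.
    /// Alignment is not required.
    pub unsafe fn from_raw_parts(ptr: *const T, len: usize) -> Self {
        Self { ptr, len, _lifetime: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Bounds-checked element read that tolerates any alignment.
    pub fn get(&self, i: usize) -> Option<T> {
        // SAFETY: `i < len`, and the constructor guarantees validity.
        (i < self.len).then(|| unsafe { self.ptr.add(i).read_unaligned() })
    }
}

/// Hypothetical stand-in for `AsUnaligned`, used where `AsRef<[T]>` was.
pub trait AsUnaligned<T> {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T>;
}

impl<T: Copy> AsUnaligned<T> for [T] {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T> {
        UnalignedSlice { ptr: self.as_ptr(), len: self.len(), _lifetime: PhantomData }
    }
}

impl<T: Copy, const N: usize> AsUnaligned<T> for [T; N] {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T> {
        UnalignedSlice { ptr: self.as_ptr(), len: N, _lifetime: PhantomData }
    }
}

impl<T: Copy> AsUnaligned<T> for UnalignedSlice<'_, T> {
    fn as_unaligned(&self) -> UnalignedSlice<'_, T> {
        *self
    }
}

/// Scalar placeholder for a distance kernel that is generic over `AsUnaligned`.
pub fn squared_l2<A, B>(a: &A, b: &B) -> f32
where
    A: AsUnaligned<f32> + ?Sized,
    B: AsUnaligned<f32> + ?Sized,
{
    let (a, b) = (a.as_unaligned(), b.as_unaligned());
    assert_eq!(a.len(), b.len());
    (0..a.len())
        .map(|i| {
            let d = a.get(i).unwrap() - b.get(i).unwrap();
            d * d
        })
        .sum()
}

/// Test-style helper: storage aligned to 4 so that byte offset 1 is misaligned.
#[repr(C, align(4))]
struct Backing([u8; 13]);

/// Returns the distance computed from aligned and from misaligned input.
pub fn example() -> (f32, f32) {
    let x = [1.0f32, 2.0, 3.0];
    let y = [0.0f32, 0.0, 1.0];
    let mut backing = Backing([0; 13]);
    for (i, v) in x.iter().enumerate() {
        backing.0[1 + 4 * i..5 + 4 * i].copy_from_slice(&v.to_ne_bytes());
    }
    let ptr = backing.0[1..].as_ptr().cast::<f32>();
    // SAFETY: 12 initialized bytes follow `ptr`, and `backing` outlives the view.
    let x_unaligned = unsafe { UnalignedSlice::from_raw_parts(ptr, 3) };
    (squared_l2(&x, &y), squared_l2(&x_unaligned, &y))
}
```

Because arrays, slices, and the view itself all implement the trait in this sketch, existing slice-based call sites keep compiling while unaligned callers pass a view directly.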

Code Generation

Unfortunately, the order in which functions are code-generated seems to have changed with this PR. That said, the fixed-sized specializations I have spot-checked result in identical assembly with this PR as with main, which is to be expected.

Copilot AI left a comment

Pull request overview

This PR adds first-class support in diskann-vector for computing SIMD-accelerated distances over truly under-aligned vector buffers (e.g., alignment 1), avoiding the need to copy data just to form &[T].

Changes:

  • Introduces UnalignedSlice + AsUnaligned and re-exports them from the crate root.
  • Updates SIMD distance kernels and specialization/dispatch plumbing to accept AsUnaligned inputs.
  • Extends Distance with call_unaligned and adds tests that exercise intentionally misaligned buffers.
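As a rough illustration of the dispatch change summarized above, the sketch below stores a single function pointer that takes unaligned views and offers both a slice entry point and an unaligned one. Only the names `Distance`, `UnalignedSlice`, and `call_unaligned` come from the PR; every signature, field, and helper here (including `inner_product_kernel` and the `f32`-only view) is an assumption for illustration, not the crate's actual API.

```rust
use std::marker::PhantomData;

/// Simplified, `f32`-only stand-in for the PR's `UnalignedSlice`.
#[derive(Clone, Copy)]
pub struct UnalignedSlice<'a> {
    ptr: *const f32,
    len: usize,
    _lifetime: PhantomData<&'a [f32]>,
}

impl<'a> UnalignedSlice<'a> {
    /// Safe conversion: an ordinary slice is always a valid view.
    pub fn from_slice(s: &'a [f32]) -> Self {
        Self { ptr: s.as_ptr(), len: s.len(), _lifetime: PhantomData }
    }

    /// # Safety
    /// `ptr` must be valid for reads of `len` initialized `f32`s for `'a`.
    /// Alignment is not required.
    pub unsafe fn from_raw_parts(ptr: *const f32, len: usize) -> Self {
        Self { ptr, len, _lifetime: PhantomData }
    }
}

/// The function pointer boundary: both arguments are unaligned views
/// with independent lifetimes.
type Kernel = for<'a, 'b> fn(UnalignedSlice<'a>, UnalignedSlice<'b>) -> f32;

/// Scalar placeholder for a SIMD kernel.
fn inner_product_kernel(a: UnalignedSlice<'_>, b: UnalignedSlice<'_>) -> f32 {
    assert_eq!(a.len, b.len);
    let mut acc = 0.0f32;
    for i in 0..a.len {
        // SAFETY: both views are valid for `len` elements by construction.
        acc += unsafe { a.ptr.add(i).read_unaligned() * b.ptr.add(i).read_unaligned() };
    }
    acc
}

/// Hypothetical stand-in for the `Distance` type.
pub struct Distance {
    kernel: Kernel,
}

impl Distance {
    pub fn inner_product() -> Self {
        Self { kernel: inner_product_kernel }
    }

    /// Existing slice API, kept by converting to views before dispatch.
    pub fn call(&self, a: &[f32], b: &[f32]) -> f32 {
        (self.kernel)(UnalignedSlice::from_slice(a), UnalignedSlice::from_slice(b))
    }

    /// New entry point for callers that only have under-aligned data.
    pub fn call_unaligned(&self, a: UnalignedSlice<'_>, b: UnalignedSlice<'_>) -> f32 {
        (self.kernel)(a, b)
    }
}
```

Routing both entry points through one kernel signature means the aligned and unaligned paths share the same code, which is why the two are expected to produce identical results.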

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| diskann-vector/src/unaligned.rs | Adds UnalignedSlice, AsUnaligned, and a test-only Buffer to create intentionally misaligned data. |
| diskann-vector/src/lib.rs | Exposes the new unaligned APIs from the crate root. |
| diskann-vector/src/test_util.rs | Refactors test harness to accept a &mut dyn DistanceChecker (trait object). |
| diskann-vector/src/distance/simd.rs | Changes simd_op to accept AsUnaligned and adds tests validating unaligned correctness/Miri safety. |
| diskann-vector/src/distance/implementations.rs | Updates architecture hooks and fixed-size specialization to operate on AsUnaligned / UnalignedSlice. |
| diskann-vector/src/distance/distance_provider.rs | Switches dispatched function signature to UnalignedSlice and adds Distance::call_unaligned. |
| diskann-vector/Cargo.toml | Adds bytemuck (dev) and enables half/bytemuck for tests. |
| diskann-providers/src/model/pq/distance/multi.rs | Adjusts reference distance calls to pass slices via explicit deref (&*...). |
| Cargo.lock | Records the new bytemuck dependency resolution. |

Comment thread diskann-vector/src/distance/implementations.rs Outdated
Comment thread diskann-vector/src/distance/simd.rs
codecov-commenter commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 95.16908% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.63%. Comparing base (f458cf6) to head (c3c2f66).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-vector/src/unaligned.rs | 90.90% | 6 Missing ⚠️ |
| diskann-vector/src/distance/distance_provider.rs | 66.66% | 4 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
+ Coverage   89.48%   90.63%   +1.14%     
==========================================
  Files         448      449       +1     
  Lines       84081    84206     +125     
==========================================
+ Hits        75239    76318    +1079     
+ Misses       8842     7888     -954     
| Flag | Coverage Δ | |
| --- | --- | --- |
| miri | 90.63% <95.16%> (+1.14%) | ⬆️ |
| unittests | 90.59% <95.16%> (+1.26%) | ⬆️ |

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| diskann-providers/src/model/pq/distance/multi.rs | 96.11% <100.00%> (ø) | |
| diskann-vector/src/distance/implementations.rs | 96.81% <100.00%> (+0.87%) | ⬆️ |
| diskann-vector/src/distance/simd.rs | 90.14% <100.00%> (+12.92%) | ⬆️ |
| diskann-vector/src/lib.rs | 44.44% <ø> (ø) | |
| diskann-vector/src/test_util.rs | 100.00% <100.00%> (ø) | |
| diskann-vector/src/distance/distance_provider.rs | 98.58% <66.66%> (-1.42%) | ⬇️ |
| diskann-vector/src/unaligned.rs | 90.90% <90.90%> (ø) | |

... and 37 files with indirect coverage changes

Copilot AI left a comment

Comment on lines 58 to +60
  A: Architecture,
- F: for<'a, 'b> diskann_wide::arch::Target2<A, T, &'a [L; N], &'b [R; N]> + Default,
+ F: for<'a, 'b> diskann_wide::arch::Target2<A, T, UnalignedSlice<'a, L>, UnalignedSlice<'b, R>>
+     + Default,
Copilot AI Apr 29, 2026

In the Specialize bound, for<'a, 'b> Target2<..., UnalignedSlice<'a, L>, UnalignedSlice<'a, R>> declares two lifetimes but uses 'a for both arguments (leaving 'b unused). This looks like a typo and unnecessarily couples the left/right lifetimes. Consider changing the second argument to UnalignedSlice<'b, R> (or removing 'b entirely if same-lifetime is intended) to keep the specialization constraints correct and future-proof.

Contributor Author

This is fixed.

Comment thread diskann-vector/src/unaligned.rs
4 participants