[benchmark/filtered-search prep] Make benchmarks stateful#995

Open
hildebrandmw wants to merge 3 commits into main from mhildebr/stateful-benchmarks

@hildebrandmw hildebrandmw commented Apr 29, 2026

A recurring problem with our current benchmark infrastructure is that the SearchPhase enum (which selects what kind of search is conducted) does its job a little too well: every time a new variant is added, we need to either update all users of SearchPhase (bloating compile times) or explicitly opt out of that search phase, which is brittle, especially with respect to keeping Benchmark::try_match consistent.

For example see:

This PR takes the first step towards systematically solving this problem by allowing benchmarks registered with diskann_benchmark_runner::Benchmarks to have state rather than being purely type-level constructs. Stateful benchmarks can have "search plugins" dynamically registered at construction time. These plugins participate in Benchmark::try_match, Benchmark::description, and Benchmark::run, allowing individual benchmarks to opt into new search-phase variants without requiring changes across all benchmarks. See #996 for a follow-up implementing this idea.
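
The plugin idea described above can be sketched as follows. This is a minimal illustration, not the code in this PR or in #996: `SearchRequest`, `SearchPlugin`, `TopKPlugin`, `RangePlugin`, `IndexBenchmark`, and `with_plugin` are hypothetical names, and the method signatures are simplified stand-ins for the real `Benchmark` trait.

```rust
/// Hypothetical stand-in for the kind of search a benchmark input requests.
#[derive(Debug, Clone, PartialEq)]
enum SearchRequest {
    TopK { k: usize },
    Range { radius: f32 },
}

/// Hypothetical plugin interface; the real trait may look different.
trait SearchPlugin {
    fn name(&self) -> &'static str;
    fn matches(&self, request: &SearchRequest) -> bool;
    fn run(&self, request: &SearchRequest) -> String;
}

struct TopKPlugin;

impl SearchPlugin for TopKPlugin {
    fn name(&self) -> &'static str {
        "topk"
    }
    fn matches(&self, request: &SearchRequest) -> bool {
        matches!(request, SearchRequest::TopK { .. })
    }
    fn run(&self, request: &SearchRequest) -> String {
        format!("ran {:?}", request)
    }
}

struct RangePlugin;

impl SearchPlugin for RangePlugin {
    fn name(&self) -> &'static str {
        "range"
    }
    fn matches(&self, request: &SearchRequest) -> bool {
        matches!(request, SearchRequest::Range { .. })
    }
    fn run(&self, request: &SearchRequest) -> String {
        format!("ran {:?}", request)
    }
}

/// A benchmark value that owns the plugins it was constructed with.
struct IndexBenchmark {
    plugins: Vec<Box<dyn SearchPlugin>>,
}

impl IndexBenchmark {
    fn new() -> Self {
        Self { plugins: Vec::new() }
    }

    /// Opt this benchmark instance into one more search kind.
    fn with_plugin(mut self, plugin: impl SearchPlugin + 'static) -> Self {
        self.plugins.push(Box::new(plugin));
        self
    }

    /// Matching is derived from the registered plugins, so it cannot
    /// drift out of sync with what `run` is able to execute.
    fn try_match(&self, request: &SearchRequest) -> bool {
        self.plugins.iter().any(|p| p.matches(request))
    }

    fn description(&self) -> String {
        let names: Vec<&str> = self.plugins.iter().map(|p| p.name()).collect();
        format!("index benchmark supporting: {}", names.join(", "))
    }

    fn run(&self, request: &SearchRequest) -> Option<String> {
        self.plugins
            .iter()
            .find(|p| p.matches(request))
            .map(|p| p.run(request))
    }
}

fn main() {
    // This instance opts into top-k search only.
    let bench = IndexBenchmark::new().with_plugin(TopKPlugin);
    println!("{}", bench.description());
    println!("{:?}", bench.run(&SearchRequest::TopK { k: 10 }));
    println!("{}", bench.try_match(&SearchRequest::Range { radius: 0.5 }));
}
```

In this sketch, supporting a new search kind means writing one more plugin and adding a `with_plugin` call on the benchmarks that want it; benchmarks that do not opt in are left untouched.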

Suggested Reviewing Order

In diskann-benchmark-runner:

  • benchmark.rs: This is the main change. It changes the methods of the Benchmark and Regression traits to take &self.
  • registry.rs: Changes the signatures of Benchmarks::register and Benchmarks::register_regression to take the benchmark by value.
  • The rest of the changes are updates to the test infrastructure.

In diskann-benchmark: The main changes involve cleaning up the 'static hack and removing the BuildAndSearch/BuildAndDynamicRun indirection traits that are no longer necessary.

In diskann-benchmark-simd: Feel free to skip.
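
For orientation while reviewing, the shape of the runner change can be sketched roughly as below. This is a simplified sketch under stated assumptions: the trait methods here take a plain `&str` input and return a `String`, and the `Exhaustive` and `Quantized` types are hypothetical; the real signatures in benchmark.rs and registry.rs are richer.

```rust
/// Simplified stand-in for the runner's trait: every method takes `&self`,
/// so an implementation can consult state held by the benchmark value.
trait Benchmark {
    fn try_match(&self, input: &str) -> bool;
    fn description(&self) -> String;
    fn run(&self, input: &str) -> String;
}

/// Simplified registry: benchmarks are passed by value and stored type-erased.
struct Benchmarks {
    entries: Vec<(String, Box<dyn Benchmark>)>,
}

impl Benchmarks {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    fn register<B>(&mut self, name: &str, benchmark: B)
    where
        B: Benchmark + 'static,
    {
        self.entries.push((name.to_string(), Box::new(benchmark)));
    }

    /// Run the first registered benchmark that accepts `input`.
    fn run(&self, input: &str) -> Option<String> {
        self.entries
            .iter()
            .find(|(_, b)| b.try_match(input))
            .map(|(name, b)| format!("{}: {}", name, b.run(input)))
    }
}

/// A unit-like benchmark: no state, but still registered as a value.
struct Exhaustive;

impl Benchmark for Exhaustive {
    fn try_match(&self, input: &str) -> bool {
        input == "exhaustive"
    }
    fn description(&self) -> String {
        "exhaustive search".to_string()
    }
    fn run(&self, _input: &str) -> String {
        "scanned all vectors".to_string()
    }
}

/// A stateful benchmark: its behavior depends on a field set at construction.
struct Quantized {
    bits: u8,
}

impl Benchmark for Quantized {
    fn try_match(&self, input: &str) -> bool {
        input == format!("quantized-{}", self.bits)
    }
    fn description(&self) -> String {
        format!("{}-bit quantized search", self.bits)
    }
    fn run(&self, _input: &str) -> String {
        format!("searched with {} bits", self.bits)
    }
}

fn build() -> Benchmarks {
    let mut benchmarks = Benchmarks::new();
    benchmarks.register("exhaustive", Exhaustive);
    benchmarks.register("sq4", Quantized { bits: 4 });
    benchmarks.register("sq8", Quantized { bits: 8 });
    benchmarks
}

fn main() {
    let benchmarks = build();
    println!("{:?}", benchmarks.run("quantized-8"));
    println!("{:?}", benchmarks.run("unknown"));
}
```

The same `Quantized` type is registered twice with different state, which a purely type-level registration could not express without a second type.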

@hildebrandmw hildebrandmw marked this pull request as ready for review April 29, 2026 01:59
@hildebrandmw hildebrandmw requested review from a team and Copilot April 29, 2026 01:59
Copilot AI left a comment

Pull request overview

This PR updates the benchmark framework to support stateful benchmarks by switching Benchmark/Regression APIs from type-level (static) methods to instance methods (&self) and updating the registry to register benchmark values (enabling future dynamic “search plugin” registration per benchmark instance).

Changes:

  • Convert Benchmark and Regression trait methods (try_match, description, run, check) to take &self.
  • Update Benchmarks::register / register_regression to accept a benchmark instance by value and store it behind a type-erased wrapper.
  • Refactor benchmark implementations across diskann-benchmark and diskann-benchmark-simd to remove the prior 'static/dispatcher indirection patterns.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| diskann-benchmark/src/utils/mod.rs | Updates stub benchmark registration and trait method signatures to the new &self API. |
| diskann-benchmark/src/backend/index/spherical.rs | Refactors spherical index benchmarks to unit-like/stateful benchmarks and inlines prior dispatch indirection. |
| diskann-benchmark/src/backend/index/scalar.rs | Refactors scalar quantized index benchmarks to constructable/stateful benchmark instances. |
| diskann-benchmark/src/backend/index/product.rs | Refactors PQ index benchmark to a constructable/stateful benchmark instance. |
| diskann-benchmark/src/backend/index/benchmarks.rs | Removes BuildAndSearch/BuildAndDynamicRun indirection and moves logic directly into Benchmark::run(&self, ...). |
| diskann-benchmark/src/backend/filters/benchmark.rs | Converts metadata index benchmark into a stateful/unit-like benchmark and extracts run logic into a free function. |
| diskann-benchmark/src/backend/exhaustive/spherical.rs | Refactors exhaustive spherical benchmarks to unit-like/stateful benchmarks. |
| diskann-benchmark/src/backend/exhaustive/product.rs | Refactors exhaustive product benchmarks to unit-like/stateful benchmarks. |
| diskann-benchmark/src/backend/exhaustive/minmax.rs | Refactors exhaustive minmax benchmarks to unit-like/stateful benchmarks. |
| diskann-benchmark/src/backend/disk_index/benchmarks.rs | Refactors disk index benchmark/regression to stateful benchmarks and updates registration accordingly. |
| diskann-benchmark-simd/src/lib.rs | Updates SIMD regression benchmarks to be instance-based and adjusts kernel execution to pass arch at call time. |
| diskann-benchmark-runner/src/test/typed.rs | Updates typed test benchmarks/regressions to instance-based implementations (adds constructors). |
| diskann-benchmark-runner/src/test/mod.rs | Updates benchmark registration in test harness to pass benchmark instances. |
| diskann-benchmark-runner/src/test/dim.rs | Updates dim test benchmarks/regression to the new &self trait signatures. |
| diskann-benchmark-runner/src/registry.rs | Changes registry APIs to accept benchmark instances and stores them in the wrapper. |
| diskann-benchmark-runner/src/benchmark.rs | Changes core traits to &self methods and updates internal type-erased wrapper and regression plumbing. |

Comment thread diskann-benchmark-runner/src/benchmark.rs
Comment thread diskann-benchmark/src/backend/exhaustive/product.rs Outdated
Comment thread diskann-benchmark/src/backend/index/scalar.rs Outdated
@codecov-commenter

codecov-commenter commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 79.55801% with 74 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.64%. Comparing base (f458cf6) to head (66d599a).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-benchmark/src/backend/index/benchmarks.rs | 45.21% | 63 Missing ⚠️ |
| diskann-benchmark-simd/src/lib.rs | 95.65% | 5 Missing ⚠️ |
| diskann-benchmark/src/backend/filters/benchmark.rs | 91.93% | 5 Missing ⚠️ |
| diskann-benchmark/src/utils/mod.rs | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #995      +/-   ##
==========================================
+ Coverage   89.48%   90.64%   +1.15%     
==========================================
  Files         448      448              
  Lines       84081    84153      +72     
==========================================
+ Hits        75239    76279    +1040     
+ Misses       8842     7874     -968     
| Flag | Coverage Δ |
| --- | --- |
| miri | 90.64% <79.55%> (+1.15%) ⬆️ |
| unittests | 90.60% <79.55%> (+1.27%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-benchmark-runner/src/benchmark.rs | 89.21% <100.00%> (-0.60%) ⬇️ |
| diskann-benchmark-runner/src/registry.rs | 88.26% <100.00%> (+0.26%) ⬆️ |
| diskann-benchmark-runner/src/test/dim.rs | 89.21% <100.00%> (+1.30%) ⬆️ |
| diskann-benchmark-runner/src/test/mod.rs | 100.00% <100.00%> (ø) |
| diskann-benchmark-runner/src/test/typed.rs | 96.47% <100.00%> (+0.51%) ⬆️ |
| diskann-benchmark/src/backend/exhaustive/minmax.rs | 100.00% <ø> (ø) |
| ...iskann-benchmark/src/backend/exhaustive/product.rs | 100.00% <ø> (ø) |
| ...kann-benchmark/src/backend/exhaustive/spherical.rs | 100.00% <ø> (ø) |
| diskann-benchmark/src/backend/index/product.rs | 100.00% <ø> (ø) |
| diskann-benchmark/src/backend/index/scalar.rs | 100.00% <ø> (ø) |

... and 5 more

... and 38 files with indirect coverage changes

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@hildebrandmw hildebrandmw requested a review from Copilot April 29, 2026 17:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Comment on lines +205 to 208
impl<T> Regression for T
where
T: super::Regression,
{

Copilot AI Apr 29, 2026

The blanket implementation impl<T> Regression for T where T: super::Regression is self-referential and will overlap with any concrete impl Regression for SomeType, causing coherence conflicts. Additionally, calling self.check(...) inside this impl risks unbounded recursion because it resolves to the same Regression::check being implemented. Remove this blanket impl; instead, keep regression behavior on the concrete benchmark types and have the internal JSON/check adapter call the benchmark’s check via the wrapped benchmark instance (e.g., wrapper.benchmark.check(...)) or via a dedicated internal adapter type/extension trait that is not an impl of Regression itself.
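
Whether the overlap and recursion concerns apply depends on whether the two `Regression` names refer to the same trait. A minimal sketch of the case where they do not, assuming the blanket impl lives in a nested module and targets a separate internal trait that merely shares the name; the module name `internal`, the method `check_erased`, the `Latency` type, and the simplified `check` signature are all hypothetical and not taken from benchmark.rs:

```rust
/// Stand-in for the public trait referred to as `super::Regression`.
pub trait Regression {
    fn check(&self, before: f64, after: f64) -> bool;
}

mod internal {
    /// A distinct, internal trait that happens to share the name `Regression`.
    pub trait Regression {
        fn check_erased(&self, before: f64, after: f64) -> String;
    }

    /// Blanket adapter from the public trait to the internal one.
    /// The two traits are different items, so this impl does not overlap
    /// with concrete impls of the public trait.
    impl<T> Regression for T
    where
        T: super::Regression,
    {
        fn check_erased(&self, before: f64, after: f64) -> String {
            // Fully qualified call into the public trait, so this cannot
            // resolve back to the method being defined here.
            let ok = <T as super::Regression>::check(self, before, after);
            if ok {
                "pass".to_string()
            } else {
                "fail".to_string()
            }
        }
    }
}

/// A concrete regression check with state (an allowed slowdown fraction).
struct Latency {
    tolerance: f64,
}

impl Regression for Latency {
    fn check(&self, before: f64, after: f64) -> bool {
        after <= before * (1.0 + self.tolerance)
    }
}

/// Route a check through the internal adapter.
fn verdict(r: &Latency, before: f64, after: f64) -> String {
    internal::Regression::check_erased(r, before, after)
}

fn main() {
    let r = Latency { tolerance: 0.05 };
    println!("{}", verdict(&r, 100.0, 104.0));
    println!("{}", verdict(&r, 100.0, 120.0));
}
```

If instead both names resolved to one trait, the impl would be self-referential as described; the sketch only shows that the same-looking header is well-formed when the traits are distinct.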

  input: &IndexOperation,
  checkpoint: Checkpoint<'_>,
- output: &mut dyn Output,
+ mut output: &mut dyn Output,

Copilot AI Apr 29, 2026

mut output: &mut dyn Output makes the binding mutable, but the binding doesn't appear to be reassigned in this function. This commonly triggers unused_mut warnings (including on parameters) and provides no benefit, because the referent can already be mutated through the &mut reference. Drop the mut on the parameter unless you actually rebind output or take a mutable borrow of the binding itself.

Suggested change
- mut output: &mut dyn Output,
+ output: &mut dyn Output,
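
Whether the `mut` is needed depends on how the parameter is used in the body. A small sketch of both cases, using `std::io::Write` as a stand-in because the real `Output` trait is not shown here; `write_plain`, `write_nested`, and `helper` are hypothetical names:

```rust
use std::io::Write;

/// No `mut` on the binding: writing through a `&mut` reference only
/// needs the reference, not a mutable binding.
fn write_plain(out: &mut dyn Write) -> std::io::Result<()> {
    writeln!(out, "plain")
}

/// Generic helper that wants `&mut W` for some sized `W: Write`.
fn helper<W: Write>(w: &mut W) -> std::io::Result<()> {
    writeln!(w, "nested")
}

/// `mut` on the binding is required here, because `&mut out` takes a
/// mutable borrow of the binding itself (here `W` is `&mut dyn Write`).
fn write_nested(mut out: &mut dyn Write) -> std::io::Result<()> {
    helper(&mut out)
}

/// Run both variants against an in-memory buffer.
fn collect() -> String {
    let mut buf: Vec<u8> = Vec::new();
    write_plain(&mut buf).unwrap();
    write_nested(&mut buf).unwrap();
    String::from_utf8(buf).unwrap()
}

fn main() {
    print!("{}", collect());
}
```

If the function body looks like `write_plain`, the suggestion applies; if it looks like `write_nested`, removing `mut` would fail to compile.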

Comment on lines +308 to +313
#[cfg(target_arch = "x86_64")]
{
dispatcher.register_regression(
"simd-op-f32xf32-x86_64_V4",
Kernel::<diskann_wide::arch::x86_64::V4, f32, f32>::new(),
);

Copilot AI Apr 29, 2026

The change removes the small register! macro and introduces repeated #[cfg(target_arch = ...)] blocks with mostly identical registration calls. This increases the chance of drift (e.g., missing an arch variant or name mismatch) when updating the registration list. Consider reinstating a small helper macro/function for arch-gated registration to keep the list declarative while still passing benchmark instances (e.g., macro that expands to dispatcher.register_regression($name, Kernel::<...>::new()) under the appropriate cfg).
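
One possible shape for such a helper is sketched below. It is an illustration only: `register_for_arch!`, the `Scalar`, `V4`, and `Neon` marker types, the simplified `Regression` trait, and the `Dispatcher` type are hypothetical stand-ins, and the benchmark names other than "simd-op-f32xf32-x86_64_V4" are invented for the example.

```rust
use std::marker::PhantomData;

/// Simplified stand-in for the runner's regression trait.
trait Regression {
    fn describe(&self) -> String;
}

// Hypothetical architecture markers standing in for the real arch types.
#[allow(dead_code)]
struct Scalar;
#[allow(dead_code)]
struct V4;
#[allow(dead_code)]
struct Neon;

/// Hypothetical kernel benchmark, parameterized like the one in the diff.
struct Kernel<A, L, R> {
    _marker: PhantomData<(A, L, R)>,
}

impl<A, L, R> Kernel<A, L, R> {
    fn new() -> Self {
        Self { _marker: PhantomData }
    }
}

impl<A, L, R> Regression for Kernel<A, L, R> {
    fn describe(&self) -> String {
        format!("kernel over {}", std::any::type_name::<(L, R)>())
    }
}

/// Simplified registry that stores benchmark values type-erased.
#[derive(Default)]
struct Dispatcher {
    entries: Vec<(String, Box<dyn Regression>)>,
}

impl Dispatcher {
    fn register_regression<B: Regression + 'static>(&mut self, name: &str, benchmark: B) {
        self.entries.push((name.to_string(), Box::new(benchmark)));
    }

    fn names(&self) -> Vec<&str> {
        self.entries.iter().map(|(name, _)| name.as_str()).collect()
    }
}

/// One declarative line per kernel; the cfg gate is written once, here.
macro_rules! register_for_arch {
    ($dispatcher:ident, $target:tt, $name:literal, $arch:ty, $left:ty, $right:ty) => {
        #[cfg(target_arch = $target)]
        {
            $dispatcher.register_regression($name, Kernel::<$arch, $left, $right>::new());
        }
    };
}

fn build() -> Dispatcher {
    let mut dispatcher = Dispatcher::default();
    // Always available, so registered without a gate.
    dispatcher.register_regression("simd-op-f32xf32-scalar", Kernel::<Scalar, f32, f32>::new());
    register_for_arch!(dispatcher, "x86_64", "simd-op-f32xf32-x86_64_V4", V4, f32, f32);
    register_for_arch!(dispatcher, "aarch64", "simd-op-f32xf32-aarch64_neon", Neon, f32, f32);
    dispatcher
}

fn main() {
    let dispatcher = build();
    for (name, benchmark) in &dispatcher.entries {
        println!("{}: {}", name, benchmark.describe());
    }
}
```

The macro still passes a benchmark instance to `register_regression`, so it is compatible with by-value registration; which entries are present at run time depends on the compilation target.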

3 participants