Skip to content

FIX: Hnsw single thread build (port to 4.0 dev)#25160

Closed
cpegeric wants to merge 2 commits into
matrixorigin:4.0-devfrom
cpegeric:hnsw_fix_build_4.0-dev
Closed

FIX: Hnsw single thread build (port to 4.0 dev)#25160
cpegeric wants to merge 2 commits into
matrixorigin:4.0-devfrom
cpegeric:hnsw_fix_build_4.0-dev

Conversation

@cpegeric

Copy link
Copy Markdown
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #24977

What this PR does / why we need it:

fix the bug to keep ThreadsBuild to 1 in all build path. cherry pick from main

cpegeric added 2 commits June 25, 2026 10:37
…earch matrixorigin#735) (matrixorigin#24867)

set number of thread for build to 1.

this is a known issue in usearch. unum-cloud/USearch#735

Approved by: @fengttt, @XuPeng-SH

(cherry picked from commit 57a991e)
…ixorigin#24993)

fix the bug to keep ThreadsBuild to 1 in all build path.

Approved by: @XuPeng-SH

(cherry picked from commit 9b4b808)
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked the current head. I do not see a new logic bug in the single-thread workaround itself, but I still think the regression coverage is below the bar for the exact behavior this PR is fixing.

Right now the only test that actually demonstrates the underlying USearch orphan-node problem is pkg/vectorindex/hnsw/zz_orphan_test.go, and it is still permanently skipped. The normal CI-visible tests do not really pin the fix:

  • the build test only requires recall to stay above a broad threshold
  • the sync tests mainly assert RunOnce() returns nil
  • there is no deterministic assertion that the effective build concurrency is forced to 1 on the relevant paths

That means reverting build.go / sync.go back to multi-threaded add could still slip through CI while silently reintroducing the exact-match reachability bug this PR is supposed to prevent.

Concrete suggestions

  1. Add a non-skipped unit test around the actual HNSW build/sync entry points that asserts the effective build/add concurrency is forced to 1 even when config asks for more.
  2. Add a deterministic exact-self-match / reachability assertion in a CI-stable test if you have a small enough synthetic dataset for it.
  3. At minimum, add focused tests for:
    • GetConcurrencyForSingleThreadBuild()
    • NewHnswBuild(...) choosing single-thread mode
    • NewHnswSync(...) overriding hnsw_threads_build to 1 on the CDC/sync path

I would keep this at request changes until the core workaround is protected by a non-skipped automated regression test, not only by a manual repro file.

@cpegeric cpegeric closed this Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants