FIX: Hnsw single thread build (port to 4.0 dev)#25160
Conversation
…earch matrixorigin#735) (matrixorigin#24867) set number of thread for build to 1. this is a known issue in usearch. unum-cloud/USearch#735 Approved by: @fengttt, @XuPeng-SH (cherry picked from commit 57a991e)
…ixorigin#24993) fix the bug to keep ThreadsBuild to 1 in all build path. Approved by: @XuPeng-SH (cherry picked from commit 9b4b808)
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
I re-checked the current head. I do not see a new logic bug in the single-thread workaround itself, but I still think the regression coverage is below the bar for the exact behavior this PR is fixing.
Right now the only test that actually demonstrates the underlying USearch orphan-node problem is pkg/vectorindex/hnsw/zz_orphan_test.go, and it is still permanently skipped. The normal CI-visible tests do not really pin the fix:
- the build test only requires recall to stay above a broad threshold
- the sync tests mainly assert
RunOnce()returns nil - there is no deterministic assertion that the effective build concurrency is forced to 1 on the relevant paths
That means reverting build.go / sync.go back to multi-threaded add could still slip through CI while silently reintroducing the exact-match reachability bug this PR is supposed to prevent.
Concrete suggestions
- Add a non-skipped unit test around the actual HNSW build/sync entry points that asserts the effective build/add concurrency is forced to 1 even when config asks for more.
- Add a deterministic exact-self-match / reachability assertion in a CI-stable test if you have a small enough synthetic dataset for it.
- At minimum, add focused tests for:
GetConcurrencyForSingleThreadBuild()NewHnswBuild(...)choosing single-thread modeNewHnswSync(...)overridinghnsw_threads_buildto 1 on the CDC/sync path
I would keep this at request changes until the core workaround is protected by a non-skipped automated regression test, not only by a manual repro file.
What type of PR is this?
Which issue(s) this PR fixes:
issue #24977
What this PR does / why we need it:
fix the bug to keep ThreadsBuild to 1 in all build path. cherry pick from main