Open
881 commits
8c20654
better plugin integration
cpegeric May 19, 2026
064e545
AlterTableCloneBehavior
cpegeric May 19, 2026
52fea2d
iscp plugin
cpegeric May 19, 2026
90771d8
ivfpq and cagra plugin iscp integration
cpegeric May 19, 2026
618cec0
cuvs sync
cpegeric May 19, 2026
07a30c8
add tests
cpegeric May 19, 2026
c95e057
force sync with ivfpq/cagra and add hook for AlterReIndex
cpegeric May 19, 2026
aba3b13
idxcron integration
cpegeric May 19, 2026
9d7ed3a
idxcron hook
cpegeric May 20, 2026
392fcbe
move cuvs CDC framing into its own pkg/vectorindex/cuvs
cpegeric May 20, 2026
505a188
Merge branch 'gpu_cdc' into gpu_plugin_cuvs
cpegeric May 20, 2026
c0c246a
migrate cagra_ivfpq tablefunc tests to plugin sub-packages
cpegeric May 20, 2026
d6e7b92
migrate IVF-FLAT idxcron body to its plugin sub-package
cpegeric May 20, 2026
10b8b5a
fix(iscp): attach default ResolveVariable to ProcessInitSQL
cpegeric May 20, 2026
ecb1845
Merge branch 'iscp_resolve_variable' into gpu_plugin_cuvs
cpegeric May 20, 2026
14ad4f8
feat(cuvs): small-tail CDC fallback for cagra_create / ivfpq_create
cpegeric May 20, 2026
acffa28
test(idxcron): cover per-algo Updatable wrappers
cpegeric May 20, 2026
f28f73d
feat(cuvs): tag=1 CdcOpHeader for small-data-only search
cpegeric May 20, 2026
ced8132
refactor(cuvs): move colMetaJSON from record to chunk frame header
cpegeric May 20, 2026
6e61f2d
refactor(cuvs): standardize colMetaJSON producer via MarshalColMetaJSON
cpegeric May 20, 2026
2c8a559
feat(executor): explicit IsFrontend signal on proc + Options
cpegeric May 20, 2026
df7ffbd
docs(executor): fix stale IsBackground comment references
cpegeric May 20, 2026
d78a834
chore(plugin): milestone logging across plugin Hook surface
cpegeric May 20, 2026
2a62d76
ivfpq_create sync
cpegeric May 21, 2026
92ab176
fix(compile): propagate IsFrontend through sub-Compiles + relocate De…
cpegeric May 21, 2026
54dbeb8
feat(gpumode): runtime gpu_mode toggle for vector-index dispatch
cpegeric May 21, 2026
ab48d0c
fix(idxcron): skip nil sysvar values in BuildIdxcronMetadata
cpegeric May 21, 2026
33b08ce
fix(idxcron): restore FrontendProbeVar as second-level gate
cpegeric May 21, 2026
cefa641
fix(compile): resolveVariableOrDefault falls back on per-session nil
cpegeric May 21, 2026
bb80c0d
Merge branch 'gpu_async_search' into gpu_plugin_all
cpegeric May 21, 2026
49c59e7
fix(frontend): GetSessionSysVar falls back to gSysVarsDefs default on…
cpegeric May 21, 2026
a3f1300
Merge branch 'fix_session_sysvar_default' into gpu_plugin_all
cpegeric May 21, 2026
d4606b2
test(iscp): remove unused newTestCuvsConsumerInfo helper
cpegeric May 21, 2026
e83bdd3
Merge branch 'gpu_plugin_all' into gpu_plugin_logging
cpegeric May 21, 2026
3c9f93c
more log
cpegeric May 21, 2026
de7321c
bug fix cuvs cdc
cpegeric May 21, 2026
0765e3a
Merge branch 'main' into gpu_async_search
cpegeric May 21, 2026
b035180
Merge branch 'gpu_async_search' into gpu_plugin_all
cpegeric May 21, 2026
005993b
max overflow size for effective brute force index
cpegeric May 21, 2026
adcf739
fix UT
cpegeric May 22, 2026
d7df4c7
fix: HLC is ahead of wall clock. iscp failed to update
cpegeric May 22, 2026
a778880
fix avoid neighbour MAX_INT32 junk and return -1 for invalid neighbou…
cpegeric May 22, 2026
73c9720
bug fix REINDEX with options
cpegeric May 22, 2026
e3d5984
fix ivfpq/cagra idxcron
cpegeric May 22, 2026
a0561f3
merge fix
cpegeric May 28, 2026
a3f1b3b
merge fix
cpegeric May 28, 2026
50c6b3c
gofmt
cpegeric May 28, 2026
3b2193c
revert merge fix
cpegeric May 28, 2026
9eb8ebe
update usearch
cpegeric May 28, 2026
2d2ca92
merge fix - update usearch
cpegeric May 28, 2026
9238a33
merge main fix
cpegeric May 28, 2026
c7b7bf2
fix avoid neighbour MAX_INT32 junk and return -1 for invalid neighbou…
cpegeric May 22, 2026
bfd0c71
support gojieba parser
cpegeric May 28, 2026
b793ffe
remove stringzilla
cpegeric May 28, 2026
7c79170
add stringzilla
cpegeric May 28, 2026
dec8d9c
remove stringzilla
cpegeric May 28, 2026
6d14459
cleanup for final review
cpegeric May 28, 2026
fce1151
exception handling and fix gen_code sm_120 for RTX5070
cpegeric May 28, 2026
22ffc21
dim mismatch guard and cleanup
cpegeric May 28, 2026
1842f40
Merge branch 'gpu_async_search' into gpu_plugin_all
cpegeric May 28, 2026
381ae74
merge fix sql parser
cpegeric May 28, 2026
8658559
merge fix sql parser
cpegeric May 28, 2026
f1a1dd1
Merge branch 'main' into gpu_async_search
cpegeric May 29, 2026
b125361
merge fix
cpegeric May 29, 2026
37d4131
bug fix: index search failed when topk (limit) > index size
cpegeric May 29, 2026
832ffb5
Merge branch 'gpu_async_search' into gpu_plugin_all
cpegeric May 29, 2026
c9be49b
merge fix
cpegeric Jun 2, 2026
c39b380
Merge branch 'main' into gpu_plugin_all
cpegeric Jun 3, 2026
16d77b2
fix: GPU index Load resource leak + concurrent-SQL race on one txn
cpegeric Jun 3, 2026
7cab47e
fix: error out on oversized CDC record instead of silently dropping rows
cpegeric Jun 3, 2026
217a05f
merge fix
cpegeric Jun 3, 2026
4664f5b
merge fix
cpegeric Jun 4, 2026
6e07021
bvt gpu cases
cpegeric Jun 4, 2026
fa4057a
add ut tests
cpegeric Jun 4, 2026
d4cafed
fix concurrent index build/extend & multi gpu simulation
cpegeric Jun 4, 2026
c85d254
merge fix sql parser
cpegeric Jun 4, 2026
8024889
Merge branch 'gpu_plugin_all' into gpu_multi_simulate
cpegeric Jun 4, 2026
e177d04
more tests for DDL
cpegeric Jun 4, 2026
fd54044
fix(cgo/cuvs): don't install the RMM pool as the per-device global de…
cpegeric Jun 5, 2026
2f398e4
feat(indexplugin): catalog hooks for supported vector / PK / include-…
cpegeric Jun 5, 2026
92e4e18
test(cuvs): GPU metric-support test for CAGRA / IVF-PQ
cpegeric Jun 5, 2026
7de3a23
merge fix
cpegeric Jun 5, 2026
212c29d
fix(gpu): plug GPU index leak on commit failure + surface unsupported…
cpegeric Jun 5, 2026
8c30302
cleanup(gpu): drop dead record-shape derivation + refresh plan-C / sn…
cpegeric Jun 5, 2026
219636c
Merge branch 'gpu_plugin_all' into gpu_multi_simulate
cpegeric Jun 5, 2026
32aa19c
fix(cgo/cuvs): key SHARDED delete-bitset cache by rank, not device_id
cpegeric Jun 5, 2026
3ab2acd
chore(cgo/cuvs): serialize brute_force build per device + fix rank-re…
cpegeric Jun 5, 2026
8789d7e
Merge branch 'main' into gpu_multi_simulate
mergify[bot] Jun 8, 2026
1b89044
fix unittest with SupportIncludeColumnType
cpegeric Jun 8, 2026
8b5bce0
Merge branch 'main' into gpu_plugin_all
cpegeric Jun 8, 2026
5c1abc8
Merge branch 'gpu_plugin_all' into gpu_multi_simulate
cpegeric Jun 8, 2026
825aaa2
disable omp for bitmap computation to push QPS
Jun 8, 2026
9b7c74d
Merge branch 'gpu_multi_simulate' of github.com:cpegeric/matrixone in…
Jun 8, 2026
f02e10a
fix: base cuVS small-tail cutoff on non-NULL vector count
cpegeric Jun 8, 2026
171ec8d
fix: make whole-index clone skip an explicit plugin policy
cpegeric Jun 8, 2026
7cb86f1
refactor: route index-plugin SQL through shared sqlquote helper
cpegeric Jun 8, 2026
c3e8c16
fix: CuvsCdcWriter errors on wrong-type vectors instead of silent DELETE
cpegeric Jun 8, 2026
4e4a246
test: CPU rejection of GPU-only vector indexes (IVF-PQ, CAGRA)
cpegeric Jun 8, 2026
91bd412
test: remove build-specific ivfpq/cagra CPU-rejection case
cpegeric Jun 8, 2026
ddc4e9a
test: CPU coverage for index-plugin dispatch + experimental flags
cpegeric Jun 8, 2026
7a80cb9
test: GPU snapshot/restore coverage for IVF-PQ and CAGRA indexes
cpegeric Jun 8, 2026
4d3fc8b
test: account-level RESTORE coverage for IVF-PQ / CAGRA indexes
cpegeric Jun 8, 2026
fe319e6
index plugin framework with Restore
cpegeric Jun 8, 2026
154ec00
fix sca missing imterface
cpegeric Jun 9, 2026
6da8975
merge fix sql parser
cpegeric Jun 9, 2026
07a7f2c
restore: rebuild indexes with create-time session vars + plugin Resto…
cpegeric Jun 9, 2026
a539c70
vectorindex: promote kmeans/max_index_capacity to CREATE INDEX params
cpegeric Jun 10, 2026
c2e3143
vectorindex/idxcron: read build params from algo_params, not Metadata
cpegeric Jun 10, 2026
469a0e4
vectorindex: stop capturing experimental_*_index for background rebuilds
cpegeric Jun 10, 2026
9432ba0
merge fix remove index capacity
cpegeric Jun 10, 2026
5e7ef74
fix bvt test and gofmt
cpegeric Jun 10, 2026
1b51f3e
merge fix sql parser
cpegeric Jun 10, 2026
52d09b1
sql/plan: remove dead cagra/ivfpq table-function builders
cpegeric Jun 10, 2026
e2ca8ee
test(idxcron): prove clone/restore registers mo_index_update row
cpegeric Jun 11, 2026
f7c2a37
feat(vector): add vecbf16/vecf16/vecint8 narrow vector column types
cpegeric Jun 11, 2026
bb33a5e
Merge origin/archsimd into vecf16 (HEAD-preferred; keep SIMD only)
cpegeric Jun 11, 2026
efee5c1
Makefile: thread $(GO) and $(GOEXPERIMENT_OPT) for the archsimd SIMD …
cpegeric Jun 11, 2026
44c6705
metric: restore distance_func_f32_test.go from archsimd
cpegeric Jun 11, 2026
587a731
metric: extract resolvers + GoPairWiseDistance into untagged resolve.go
cpegeric Jun 11, 2026
90785ed
add go tag
cpegeric Jun 11, 2026
984508f
metric: pure-Go narrow distance kernels (bf16/f16/int8)
cpegeric Jun 11, 2026
d7a0b82
metric: benchmark narrow kernels vs f32/f64
cpegeric Jun 11, 2026
64f49e3
metric: concrete bf16/f16 kernels (inline decode, ~3.5x faster)
cpegeric Jun 11, 2026
4b3ac13
metric: branchless f16 decode (f16fast), exhaustively verified
cpegeric Jun 11, 2026
1cc58ee
metric: AVX-512 SIMD kernels for bf16 / int8 narrow distance (f16 mea…
cpegeric Jun 11, 2026
98ee77d
ivfflat: support narrow vector types (bf16/f16/int8), f32 centroids
cpegeric Jun 11, 2026
6ebfcee
Merge branch 'vecf16' of github.com:cpegeric/matrixone into vecf16
cpegeric Jun 11, 2026
d1c9644
metric: AVX-512 SIMD kernels for bf16 / f16 / int8 narrow distance
cpegeric Jun 11, 2026
b596836
Merge branch 'vecf16' of github.com:cpegeric/matrixone into vecf16
cpegeric Jun 11, 2026
1652497
Merge branch 'vecf16' of github.com:cpegeric/matrixone into vecf16
cpegeric Jun 11, 2026
790b209
metric: AVX2 fallback tier for narrow distance kernels (bf16/f16/int8)
cpegeric Jun 11, 2026
c261aa8
Merge branch 'vecf16' of github.com:cpegeric/matrixone into vecf16
cpegeric Jun 11, 2026
629dc5f
ivfflat: QUANTIZATION= option (int8/float16 entries), narrow vector t…
cpegeric Jun 11, 2026
0b1d34b
fix: materialize narrow vector constants (bf16/f16/int8) in const exe…
cpegeric Jun 11, 2026
1269d26
ivfflat: int8 scalar quantizer (trained global symmetric scale)
cpegeric Jun 11, 2026
034decc
ivfflat: int8 quantizer -> cuVS-style asymmetric (min,max), full int8…
cpegeric Jun 11, 2026
ef2e813
ivfflat: QUANTIZATION supports any base (incl. f64) down-cast to f32/…
cpegeric Jun 11, 2026
1a79948
test: unit tests for quantization helpers, ParamsFromTree, trainInt8M…
cpegeric Jun 11, 2026
132987e
test(bvt): ivfflat narrow base + QUANTIZATION down-cast end-to-end
cpegeric Jun 11, 2026
7831616
test: narrow vector top-k scan + constant-fold unit coverage
cpegeric Jun 11, 2026
508f2df
test: narrow kernel edge cases + mo-tester-verified BVT result
cpegeric Jun 11, 2026
82b0892
test: more quantizer edge cases (param ranges, trainInt8MinMax)
cpegeric Jun 11, 2026
739c9ef
test: complete narrow-array support in function-UT framework + VecFro…
cpegeric Jun 11, 2026
f1fa2a7
narrow vec: enable ORDER BY / GROUP BY (partition + group key width) …
cpegeric Jun 11, 2026
27cc69b
fix(ivf-quant): adversarial-review fixes for int8 quantization
cpegeric Jun 12, 2026
6a8c1e1
test(ivf-quant): DDL/maintenance matrix BVT for int8 quantization
cpegeric Jun 12, 2026
67f8014
test(ivf-quant): async ISCP/CDC BVT for int8 quantization
cpegeric Jun 12, 2026
35b55a1
refactor(ivf-quant): extract quantizer into pkg/vectorindex/quantizer
cpegeric Jun 12, 2026
5e2b1a7
types: named constants for the vec* SQL type names
cpegeric Jun 12, 2026
47b18a0
fix(show-create): include dimension for vecbf16/vecf16/vecint8 columns
cpegeric Jun 12, 2026
edd8bfb
test(ivf-quant): cover bf16 and float16 in the async ISCP BVT
cpegeric Jun 12, 2026
99244ad
feat(vector): add vecuint8 narrow type + uint8 ivfflat quantization
cpegeric Jun 12, 2026
500b39c
test(ivf-quant): cover uint8 in the async ISCP BVT
cpegeric Jun 12, 2026
dedb12d
test(vecuint8): parser round-trip + SHOW CREATE dimension
cpegeric Jun 12, 2026
142dae6
fix(vecuint8): handle uint8 in VecFromBase64 (divide-by-zero panic)
cpegeric Jun 12, 2026
827f24b
test(vecuint8): uint8 unit coverage + make.go const-expr helper
cpegeric Jun 12, 2026
6cf5836
test(ivf-quant): drop ORDER BY tiebreaker so the ivf index pushdown f…
cpegeric Jun 12, 2026
00aa210
fix(ivf-pushdown): allow narrow-base columns to use the ivf index
cpegeric Jun 12, 2026
6d9bb38
feat(vecuint8): archsimd AVX-512/AVX2 distance kernels for vecuint8
cpegeric Jun 12, 2026
b03d472
fix: kmean concurrent fit() crashed.
cpegeric Jun 12, 2026
f64f794
fix(vecnarrow): support narrow vector types in LOAD/external import p…
cpegeric Jun 12, 2026
641f286
Merge branch 'vecf16' of github.com:cpegeric/matrixone into vecf16
cpegeric Jun 12, 2026
a963208
fix(vecnarrow): support narrow vector types in LOAD/external import p…
cpegeric Jun 12, 2026
5306cf0
Merge branch 'vecf16' of github.com:cpegeric/matrixone into vecf16
cpegeric Jun 12, 2026
9f55e65
refactor(vectorindex): widen BruteForce + Pairwise to ArrayElement
cpegeric Jun 12, 2026
55e95af
fix(gpu-kmeans): always cluster in L2, not the search metric
cpegeric Jun 12, 2026
0d491f2
test(ivfpq-async): expect async keyword in SHOW CREATE baseline
cpegeric Jun 12, 2026
46bcabc
fix(metric): clamp cosine similarity to [-1,1] in SIMD kernels
cpegeric Jun 12, 2026
500b466
fix(frontend): render narrow vectors correctly on the export/GetValue…
cpegeric Jun 12, 2026
b532610
test(restore): execution-level clone/restore BVTs for ivfflat + fulltext
cpegeric Jun 15, 2026
e02718b
fix(clone): clone all hidden tables of a multi-table index
cpegeric Jun 15, 2026
59ea7b8
wiki_all 1M benchmark
cpegeric Jun 15, 2026
a97ba76
perf(fileservice): coalesce FileWithChecksum.ReadAt into 128KB reads
cpegeric Jun 15, 2026
49b7a0e
perf(ivfflat): skip memory-cache writes during index build source scan
cpegeric Jun 15, 2026
b89d04e
perf(vector): batch varlena materialization in UnionBatch/union paths
cpegeric Jun 16, 2026
4844e53
perf(function): route narrow int8/uint8 SQL distance through native m…
cpegeric Jun 16, 2026
25499af
feat(ivfflat): reject upcasting QUANTIZATION on narrow base columns
cpegeric Jun 16, 2026
beffb92
perf(vector): extend UnionBatch full-append fast path to null/groupin…
cpegeric Jun 16, 2026
79a567f
merge fix
cpegeric Jun 16, 2026
7c7b274
Merge gpu_plugin_all into vecf16
cpegeric Jun 16, 2026
11a537d
perf(fileservice): coalesce FileWithChecksum.ReadAt into 128KB reads
cpegeric Jun 15, 2026
6b0bfef
perf(vector): batch varlena materialization in UnionBatch/union paths
cpegeric Jun 16, 2026
171c935
perf(vector): extend UnionBatch full-append fast path to null/groupin…
cpegeric Jun 16, 2026
637827e
update doc
cpegeric Jun 16, 2026
7a3d835
fix(vector): bound UnionBatch fast-path bitmap propagation to [0,cnt)
cpegeric Jun 16, 2026
209d42b
Merge refactor_vector_fs into vecf16
cpegeric Jun 16, 2026
f45d61c
fix(function): support vector_dims on narrow vector types (bf16/f16/i…
cpegeric Jun 16, 2026
a7925f6
test(vector): BVT for vector_dims on narrow types + QUANTIZATION upca…
cpegeric Jun 16, 2026
33daf64
docs(vecf16): corrected 4-quadrant benchmark + cold-scan profile
cpegeric Jun 16, 2026
9f6daf0
go fmt
cpegeric Jun 16, 2026
5626bfe
fix(vector): exclude null rows from varlena pre-grow reservation
cpegeric Jun 16, 2026
678dac5
feat(reindex): honor per-index build options on ALTER ... REINDEX
cpegeric Jun 17, 2026
5309da5
Merge branch 'refactor_vector_fs' into vecf16
cpegeric Jun 17, 2026
a5d7882
Merge branch 'gpu_plugin_all' into vecf16
cpegeric Jun 17, 2026
305370f
feat(reindex): support quantization on ALTER ... REINDEX (per-backend)
cpegeric Jun 17, 2026
7fbf2fa
Merge branch 'main' into vecf16
mergify[bot] Jun 17, 2026
fc07b16
style: gofmt 3 narrow/bf16 test files (fix CI lint)
cpegeric Jun 17, 2026
bac1a26
Merge branch 'vecf16' of github.com:cpegeric/matrixone into vecf16
cpegeric Jun 17, 2026
2693a8f
test(vecnarrow): stabilize narrow-vector distance tests across platforms
cpegeric Jun 17, 2026
a75fa68
feat(cuvs/cpp): native f16 (half) + half-source quantize for ivf_pq/c…
cpegeric Jun 17, 2026
e00b68d
feat(cuvs/go): half-quantize bindings + SearchQuantizeHalf
cpegeric Jun 17, 2026
b5a7d91
feat(ivfpq,cagra): accept vecf16 base column + downcast-only QUANTIZA…
cpegeric Jun 17, 2026
be68c6b
feat(ivfpq,cagra): native f16 build wrappers (Add / AddChunk / AddQua…
cpegeric Jun 17, 2026
fb081c3
feat(ivfpq,cagra): f16 search — native half + half-source query quantize
cpegeric Jun 17, 2026
2e4b4e2
feat(table_function): vecf16 ivfpq/cagra create + search wiring
cpegeric Jun 17, 2026
1ae02a1
docs(cuvs_float16): plan for f16 base + native int8/uint8 quantization
cpegeric Jun 17, 2026
006e127
feat(ivfpq,cagra): [B,Q] base-typed overflow for quantized indexes (4c)
cpegeric Jun 18, 2026
6561e78
fix(ivfpq,cagra): emit parttype in search tblcfg for f16 base dispatch
cpegeric Jun 18, 2026
bbee610
fix(vectorindex): feed half overflow natively + guard empty bf query
cpegeric Jun 18, 2026
65ca6d7
Merge branch 'main' into vecf16
cpegeric Jun 18, 2026
13e6d35
feat(ivfpq,cagra): native base-type CDC overflow (no f32 detour)
cpegeric Jun 18, 2026
f0d1023
Merge branch 'vecf16' into cuvs_quantize
cpegeric Jun 18, 2026
807715a
test(ivfpq,cagra): f16 base DDL validation + GPU functional BVT
cpegeric Jun 18, 2026
1a42933
refactor(cuvs): unify base/storage into [B,Q] template, collapse dual…
cpegeric Jun 18, 2026
6d81fd4
fix(cuvs): train scalar quantizer on a strided sample across all rows
cpegeric Jun 18, 2026
ea2b9d8
test(gpu): refresh gpu_cases algo_params .result for narrowed session…
cpegeric Jun 18, 2026
863a207
Merge branch 'vecf16' into cuvs_quantize
cpegeric Jun 18, 2026
d17876c
perf(cuvs): CPU scalar quantization for int8/uint8 IVF-PQ build
cpegeric Jun 19, 2026
e134e61
refactor(cuvs): wire MultiGpuIvfFlat to [B,Q] + doc Search overflow c…
cpegeric Jun 19, 2026
83c12ef
refactor(cuvs): ivf_flat [B,Q] template — base type + storage type
cpegeric Jun 19, 2026
5a8dd21
refactor(cuvs/go): GpuIvfFlat[B,Q] bindings — base + storage type
cpegeric Jun 19, 2026
3d1776a
refactor(cuvs): base-typed search/add (search_quantize) + [B,Q] acros…
cpegeric Jun 22, 2026
9b0631d
feat(cuvs): vecf16 base CDC/incremental ingestion (ISCP) + BVT tests
cpegeric Jun 22, 2026
a92b46e
fix(ivfflat/iscp): native narrow base columns over the CDC delta path
cpegeric Jun 22, 2026
d2cb27d
fix(engine): handle narrow vector arrays in TAE value/search paths + UTs
cpegeric Jun 22, 2026
3e91521
fix(cuvs): reject QUANTIZATION 'bf16' instead of silently building f32
cpegeric Jun 22, 2026
18d1a1e
refactor(cuvs): remove dead SearchFloat32 / SearchFloat32Async chain
cpegeric Jun 22, 2026
b5bc71a
refactor(cuvs): unify non-filter search to base-typed SearchQuantize(…
cpegeric Jun 22, 2026
2651b38
refactor(cuvs): base-typed add/search unification + allocation-free A…
cpegeric Jun 22, 2026
384d711
fix(cuvs): break-review bugs for f16/quantized GPU indexes + BVTs
cpegeric Jun 22, 2026
2f66dfd
refactor(indexplugin): one home for (quantization, op_type) validation
cpegeric Jun 22, 2026
9d1fea4
reduce the rmm pool to 2%
cpegeric Jun 23, 2026
6548be8
Merge branch 'main' into cuvs_quantize
cpegeric Jun 23, 2026
4da9dac
Merge branch 'main' into cuvs_quantize
mergify[bot] Jun 23, 2026
b199423
gofmt
cpegeric Jun 23, 2026
5bda008
Merge branch 'cuvs_quantize' of github.com:cpegeric/matrixone into cu…
cpegeric Jun 23, 2026
54214e5
test(iscp): update RejectsVecF64 assertion to new vecf32/vecf16 message
cpegeric Jun 23, 2026
fa51e52
fix(index): evict vector-index search cache on DROP INDEX
cpegeric Jun 24, 2026
a1f5cf1
Merge branch 'main' into cuvs_quantize
cpegeric Jun 24, 2026
a306108
fix(index): evict HNSW search cache on DROP INDEX too
cpegeric Jun 24, 2026
1f229b0
fix(plan): allow vector columns in vector indexes via plugin hook (fi…
cpegeric Jun 24, 2026
f069bf9
test(gpu): cover ivfpq/cagra post-filter on a NON-INCLUDE column
cpegeric Jun 25, 2026
5950e59
fix(index): never auto-train SQ quantizer from flattened_host_dataset
cpegeric Jun 25, 2026
296ee34
fix(hnsw): restore multi-threaded build; vendor USearch #735 fix
cpegeric Jun 25, 2026
eea1365
fix(hnsw): restore multi-threaded build; vendor USearch #735 fix
cpegeric Jun 25, 2026
ff5ae42
Merge branch 'usearch_build_fix' of github.com:cpegeric/matrixone int…
cpegeric Jun 25, 2026
24afe65
Merge branch 'main' into usearch_build_fix
mergify[bot] Jun 25, 2026
12e13a1
Merge branch 'usearch_build_fix' of github.com:cpegeric/matrixone int…
cpegeric Jun 25, 2026
2a267aa
Merge branch 'usearch_build_fix' of github.com:cpegeric/matrixone int…
cpegeric Jun 25, 2026
0bc7f86
Merge branch 'usearch_build_fix' into cuvs_quantize
cpegeric Jun 25, 2026
22 changes: 19 additions & 3 deletions Makefile
@@ -48,6 +48,12 @@
 # % cd matrixone
 # % MO_CL_CUDA=1 make

+# Go toolchain (override with `make GO=/path/to/go1.26rc1 ...` for the archsimd
+# SIMD build); defaults to `go`.
+ifeq ($(GO),)
+GO=go
+endif
+
 # where am I
 ROOT_DIR = $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
 BIN_NAME := mo-service
@@ -185,6 +191,16 @@ DEBUG_OPT :=
 CGO_DEBUG_OPT :=
 TAGS :=

+# Env-var prefix for the build command. Pass GOEXPERIMENT/GOAMD64 here for the
+# archsimd SIMD build, e.g.:
+# make GO=/path/go1.26rc1 GOEXPERIMENT_OPT="GOEXPERIMENT=simd" GOAMD64=v3 build
+GOEXPERIMENT_OPT ?=
+ifeq ("$(UNAME_M)", "x86_64")
+ifneq ($(GOAMD64),)
+GOEXPERIMENT_OPT += GOAMD64=$(GOAMD64)
+endif
+endif
+
 ifeq ($(MO_CL_CUDA),1)
 ifeq ($(CONDA_PREFIX),)
 $(error CONDA_PREFIX env variable not found.)
@@ -235,7 +251,7 @@ jieba-dict:
 .PHONY: build
 build: config cgo thirdparties jieba-dict
 $(info [Build binary])
-$(CGO_OPTS) go build $(TAGS) $(RACE_OPT) $(GOLDFLAGS) $(DEBUG_OPT) $(GOBUILD_OPT) -o $(BIN_NAME) ./cmd/mo-service
+$(GOEXPERIMENT_OPT) $(CGO_OPTS) $(GO) build $(TAGS) $(RACE_OPT) $(GOLDFLAGS) $(DEBUG_OPT) $(GOBUILD_OPT) -o $(BIN_NAME) ./cmd/mo-service

 # https://wiki.musl-libc.org/getting-started.html
 # https://musl.cc/
@@ -265,13 +281,13 @@ musl: override TAGS := -tags musl
 musl: musl-install musl-cgo config musl-thirdparties jieba-dict
 musl:
 $(info [Build binary(musl)])
-$(CGO_OPTS) go build $(TAGS) $(RACE_OPT) $(GOLDFLAGS) $(DEBUG_OPT) $(GOBUILD_OPT) -o $(BIN_NAME) ./cmd/mo-service
+$(GOEXPERIMENT_OPT) $(CGO_OPTS) $(GO) build $(TAGS) $(RACE_OPT) $(GOLDFLAGS) $(DEBUG_OPT) $(GOBUILD_OPT) -o $(BIN_NAME) ./cmd/mo-service

 # build mo-tool
 .PHONY: mo-tool
 mo-tool: config cgo thirdparties
 $(info [Build mo-tool tool])
-$(CGO_OPTS) go build $(GOLDFLAGS) -o mo-tool ./cmd/mo-tool
+$(GOEXPERIMENT_OPT) $(CGO_OPTS) $(GO) build $(GOLDFLAGS) -o mo-tool ./cmd/mo-tool

 # build mo-service binary for debugging with go's race detector enabled
 # produced executable is 10x slower and consumes much more memory
13 changes: 13 additions & 0 deletions cgo/cuvs/Makefile
@@ -128,6 +128,19 @@ test_kmeans: obj/test/test_kmeans.o $(OBJS)
 @echo "Linking $@"
 $(NVCC) $(LDFLAGS) $^ $(LIBS) -o $@

+# Standalone reproducer for the uint8 quantization recall collapse
+# (f32->uint8 and f16->uint8). Has its own main(); runs in isolation
+# rather than as part of the full test_cuvs_worker suite.
+uint8_quant_bug: obj/test/uint8_quant_bug.o $(OBJS)
+@echo "Linking $@"
+$(NVCC) $(LDFLAGS) $^ $(LIBS) -o $@
+
+# Real-dataset (wiki_all_1M) version of the uint8 collapse reproducer. Loads the
+# .fbin base/queries + .ibin ground truth and grades recall@k at 1M scale.
+wiki1m_uint8_bug: obj/test/wiki1m_uint8_bug.o $(OBJS)
+@echo "Linking $@"
+$(NVCC) $(LDFLAGS) $^ $(LIBS) -o $@
+
 # Standalone reproducer for the cuvs::neighbors::dynamic_batching deadlock
 # (conservative_dispatch=true). Intentionally depends on nothing in this
 # project — links only against the cuVS / RAFT / RMM libraries we already
89 changes: 51 additions & 38 deletions cgo/cuvs/brute_force.hpp
@@ -102,7 +102,7 @@ namespace matrixone {
 // is_loaded_=true, then clears flattened_host_dataset to free memory.
 // The built index holds a device pointer to the dataset via
 // dataset_device_ptr_ (shared_ptr kept alive by index_).
-// 4. search() / search_float() — dispatched via submit() (round-robin, but with
+// 4. search() / search_quantize() — dispatched via submit() (round-robin, but with
 // SINGLE_GPU there is only one device).
 // 5. Destructor calls destroy() which stops the worker and resets index_.
 //
@@ -125,7 +125,7 @@ namespace matrixone {
 // search_internal() holds a shared_lock during the GPU search call (read-only
 // access to index_). This is fine because brute_force is SINGLE_GPU and has no
 // concurrent extend path.
-// search_float_internal() converts float queries to T on the device before
+// search_quantize_internal() converts base (B) queries to T on the device before
 // searching (quantize for 1-byte T, half-cast for T=half, direct for T=float).
 //
 // SOFT-DELETE BITSET
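Note on the hunk above: the comment describes three conversion paths from the query type to the storage type (direct copy, cast, and quantization for 1-byte storage). The device-side implementation is not shown in this excerpt, so the sketch below only illustrates the compile-time dispatch on the host. `convert_query`, `sq_params_t`, and the affine (min, max) quantizer are hypothetical stand-ins, and `double -> float` is used in place of `float -> half` so the example builds without CUDA headers.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical scalar-quantizer parameters, assumed to be trained elsewhere
// and to satisfy min <= max.
struct sq_params_t {
    float min;
    float max;
};

// Host-side sketch: convert a base-typed (B) query into storage type T.
template <typename B, typename T>
std::vector<T> convert_query(const std::vector<B>& q, const sq_params_t& sq) {
    if constexpr (std::is_same_v<B, T>) {
        (void)sq;
        return q;  // B == T: nothing to convert
    } else if constexpr (sizeof(T) == 1) {
        // 1-byte storage: map [min, max] onto the 256 levels of int8 / uint8.
        constexpr float lo = std::is_signed_v<T> ? -128.0f : 0.0f;
        const float range = sq.max - sq.min;
        const float scale = range > 0.0f ? 255.0f / range : 0.0f;
        std::vector<T> out(q.size());
        for (std::size_t i = 0; i < q.size(); ++i) {
            const float c = std::clamp(static_cast<float>(q[i]), sq.min, sq.max);
            out[i] = static_cast<T>(std::lround((c - sq.min) * scale + lo));
        }
        return out;
    } else {
        (void)sq;
        // Remaining case: a plain narrowing cast (stand-in for f32 -> f16).
        std::vector<T> out(q.size());
        for (std::size_t i = 0; i < q.size(); ++i) out[i] = static_cast<T>(q[i]);
        return out;
    }
}
```

Because the branch is chosen with `if constexpr`, only the path matching a given `(B, T)` pair is instantiated, which is roughly why a single entry point can serve the unquantized and quantized cases.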
@@ -160,15 +160,21 @@ struct brute_force_search_result_t {
 /**
 * @brief gpu_brute_force_t implements a Brute Force index that can run on a single GPU.
 */
-template <typename T>
-class gpu_brute_force_t : public gpu_index_base_t<T, brute_force_build_params_t, int64_t> {
+// [B,Q] design (B = base/query element type, T = storage element type), mirroring
+// gpu_cagra_t / gpu_ivf_pq_t. For the unquantized cases B==T; the overflow of a
+// quantized index stores T (e.g. half) while the base/query is B (e.g. float),
+// so search_quantize() converts B -> T (cast for f32->f16, learned SQ for 1-byte).
+template <typename B, typename T>
+class gpu_brute_force_t : public gpu_index_base_t<B, T, brute_force_build_params_t, int64_t> {
 public:
+using base_type = B;
+using storage_type = T;
 // We force DistT=float for all our indices to avoid template bloat and satisfy cuVS
 using brute_force_index = cuvs::neighbors::brute_force::index<T, float>;
 using search_result_t = brute_force_search_result_t;
 // Inherited dependent type — bring into scope so search_internal can take a
 // const host_mask_bundle_t* parameter without `typename Base::...` everywhere.
-using host_mask_bundle_t = typename gpu_index_base_t<T, brute_force_build_params_t, int64_t>::host_mask_bundle_t;
+using host_mask_bundle_t = typename gpu_index_base_t<B, T, brute_force_build_params_t, int64_t>::host_mask_bundle_t;

 // Internal index storage
 std::unique_ptr<brute_force_index> index_;
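Note on the hunk above: the class moves from one type parameter to a base/storage pair and re-exports a nested type of its dependent base class. The sketch below shows that pattern in isolation. `index_base` and `brute_force` are simplified hypothetical stand-ins for `gpu_index_base_t` and `gpu_brute_force_t`, and `is_quantized` and `mask_words` are illustrative additions that do not appear in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the base class: B = base/query type, T = storage type.
template <typename B, typename T>
struct index_base {
    struct host_mask_bundle_t {
        std::vector<uint32_t> words;
    };
    uint32_t dimension = 0;
};

template <typename B, typename T>
struct brute_force : index_base<B, T> {
    using base_type = B;
    using storage_type = T;

    // The base depends on B and T, so its nested type is a dependent name.
    // Re-exporting it once avoids writing `typename index_base<B, T>::...`
    // in every member signature.
    using host_mask_bundle_t = typename index_base<B, T>::host_mask_bundle_t;

    // Illustrative flag: storage differs from the base type.
    static constexpr bool is_quantized = !std::is_same_v<B, T>;

    // Unqualified use of the inherited nested type in a signature.
    static std::size_t mask_words(const host_mask_bundle_t* m) {
        return m ? m->words.size() : 0;
    }
};
```

With this shape, `brute_force<float, float>` covers the unquantized case and `brute_force<float, int8_t>` a quantized one, without a separate class per storage type.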
@@ -536,87 +542,93 @@ class gpu_brute_force_t : public gpu_index_base_t<T, brute_force_build_params_t,
 return search_res;
 }

-// Sync float entry — wraps search_float_async + search_wait.
-search_result_t search_float(const float* queries_data, uint64_t num_queries, uint32_t query_dimension, uint32_t limit, const brute_force_search_params_t& sp) {
-uint64_t job_id = this->search_float_async(queries_data, num_queries, query_dimension, limit, sp);
+// Sync quantize entry — wraps search_quantize_async + search_wait. The query
+// is the BASE type B; search_quantize_internal converts it to storage T.
+search_result_t search_quantize(const B* queries_data, uint64_t num_queries, uint32_t query_dimension, uint32_t limit, const brute_force_search_params_t& sp) {
+uint64_t job_id = this->search_quantize_async(queries_data, num_queries, query_dimension, limit, sp);
 return this->search_wait(job_id);
 }

-uint64_t search_float_async(const float* queries_data, uint64_t num_queries, uint32_t query_dimension, uint32_t limit, const brute_force_search_params_t& sp) {
-if constexpr (std::is_same_v<T, float>) return search_async(queries_data, num_queries, query_dimension, limit, sp);
+uint64_t search_quantize_async(const B* queries_data, uint64_t num_queries, uint32_t query_dimension, uint32_t limit, const brute_force_search_params_t& sp) {
+if constexpr (std::is_same_v<T, B>) return search_async(queries_data, num_queries, query_dimension, limit, sp);
 if (!queries_data) throw std::invalid_argument("search_async: queries_data is null");
 if (num_queries == 0) throw std::invalid_argument("search_async: num_queries is 0");
 if (this->dimension == 0) throw std::runtime_error("search_async: index dimension is 0");
 // Reject a mismatched caller dim instead of silently coercing to
-// this->dimension. search_float_internal sizes its H2D extent by
+// this->dimension. search_quantize_internal sizes its H2D extent by
 // this->dimension; if caller's query_dimension differed we'd either
 // OOB-read or under-copy host queries. Fail loudly so the caller bug
 // surfaces here rather than as wrong search results.
 if (query_dimension != this->dimension) {
 throw std::invalid_argument(
-"search_float_async: query_dimension (" + std::to_string(query_dimension) +
+"search_quantize_async: query_dimension (" + std::to_string(query_dimension) +
 ") does not match index dimension (" + std::to_string(this->dimension) + ")");
 }
 if (!this->is_loaded_ || !index_) throw std::runtime_error("search_async: index not loaded");

-auto queries_copy = std::make_shared<std::vector<float>>(queries_data, queries_data + num_queries * query_dimension);
+auto queries_copy = std::make_shared<std::vector<B>>(queries_data, queries_data + num_queries * query_dimension);

 auto task = [this, num_queries, query_dimension, limit, sp, queries_copy](raft_handle_wrapper_t& handle) -> std::any {
-return this->search_float_internal(handle, queries_copy->data(), num_queries, query_dimension, limit, sp);
+return this->search_quantize_internal(handle, queries_copy->data(), num_queries, query_dimension, limit, sp);
 };
 return this->worker->submit(task);
 }

-// Sync float filtered entry — wraps search_float_with_filter_async + search_wait.
-search_result_t search_float_with_filter(const float* queries_data, uint64_t num_queries,
+// Sync quantize filtered entry — wraps search_quantize_with_filter_async + search_wait.
+search_result_t search_quantize_with_filter(const B* queries_data, uint64_t num_queries,
 uint32_t query_dimension, uint32_t limit,
 const brute_force_search_params_t& sp,
 const std::string& preds_json) {
-uint64_t job_id = this->search_float_with_filter_async(queries_data, num_queries, query_dimension, limit, sp, preds_json);
+uint64_t job_id = this->search_quantize_with_filter_async(queries_data, num_queries, query_dimension, limit, sp, preds_json);
 return this->search_wait(job_id);
 }

-// Async variant of search_float_with_filter. Brute force is single-GPU
+// Async variant of search_quantize_with_filter. Brute force is single-GPU
 // only, so the bitmap eval stays on the calling thread and the GPU
 // search goes through worker->submit so concurrent calls can be
 // auto-batched in the device queue. Used by the multi-index brute-force
 // fallback so it dispatches in parallel with the primary IVF/CAGRA shards.
-uint64_t search_float_with_filter_async(const float* queries_data, uint64_t num_queries,
+// The query is the BASE type B; search_quantize_internal converts it to T.
+uint64_t search_quantize_with_filter_async(const B* queries_data, uint64_t num_queries,
 uint32_t query_dimension, uint32_t limit,
 const brute_force_search_params_t& sp,
 const std::string& preds_json) {
 if (!queries_data) throw std::invalid_argument("search_async: queries_data is null");
 if (num_queries == 0) throw std::invalid_argument("search_async: num_queries is 0");
 if (this->dimension == 0) throw std::runtime_error("search_async: index dimension is 0");
-// See search_float_async() above for the rationale.
+// See search_quantize_async() above for the rationale.
 if (query_dimension != this->dimension) {
 throw std::invalid_argument(
-"search_float_with_filter_async: query_dimension (" + std::to_string(query_dimension) +
+"search_quantize_with_filter_async: query_dimension (" + std::to_string(query_dimension) +
 ") does not match index dimension (" + std::to_string(this->dimension) + ")");
 }
 if (!this->is_loaded_ || !index_) throw std::runtime_error("search_async: index not loaded");
 if (!this->worker) throw std::runtime_error("Worker not initialized");

-auto queries_copy = std::make_shared<std::vector<float>>(queries_data, queries_data + num_queries * query_dimension);
+auto queries_copy = std::make_shared<std::vector<B>>(queries_data, queries_data + num_queries * query_dimension);
 auto mask = this->build_filter_single_mask(preds_json);
 auto task = [this, num_queries, query_dimension, limit, sp, queries_copy, mask](raft_handle_wrapper_t& handle) -> std::any {
-return this->search_float_internal(handle, queries_copy->data(), num_queries, query_dimension, limit, sp, /*preds_json=*/"", mask.get());
+return this->search_quantize_internal(handle, queries_copy->data(), num_queries, query_dimension, limit, sp, /*preds_json=*/"", mask.get());
 };
 return this->worker->submit(task);
 }

 // See search_internal() above for the prebuilt-bundle contract; identical
 // semantics here (off-worker CPU mask eval, skip queries-H2D sync_stream
 // when prebuilt is non-null).
-search_result_t search_float_internal(raft_handle_wrapper_t& handle, const float* queries_data, uint64_t num_queries, uint32_t /*query_dimension*/, uint32_t limit, const brute_force_search_params_t& /*sp*/, const std::string& /*preds_json*/ = "", const host_mask_bundle_t* prebuilt = nullptr) {
+// Converts the BASE-typed (B) query to storage T on-device, then runs the
+// exact brute-force search — see the cagra search_quantize_internal comment.
+// B==T copies straight; sizeof(T)==1 quantizes B -> int8/uint8; the remaining
+// (B=float, T=half) instantiation casts f32 -> f16.
search_result_t search_quantize_internal(raft_handle_wrapper_t& handle, const B* queries_data, uint64_t num_queries, uint32_t /*query_dimension*/, uint32_t limit, const brute_force_search_params_t& /*sp*/, const std::string& /*preds_json*/ = "", const host_mask_bundle_t* prebuilt = nullptr) {
// Same snapshot pattern as search_internal — see comment there.
const brute_force_index* local_index = nullptr;
uint64_t local_count = 0;
uint64_t local_deleted_count = 0;
{
std::shared_lock<std::shared_mutex> lock(this->mutex_);
if (!this->is_loaded_ || !this->index_) {
throw std::runtime_error("search_float_internal: index not loaded");
throw std::runtime_error("search_quantize_internal: index not loaded");
}
local_index = this->index_.get();
local_count = this->count;
@@ -627,19 +639,20 @@ class gpu_brute_force_t : public gpu_index_base_t<T, brute_force_build_params_t,

auto q_dev_t = raft::make_device_matrix<T, int64_t>(*res, num_queries, this->dimension);

if constexpr (std::is_same_v<T, float>) {
raft::copy(*res, q_dev_t.view(), raft::make_host_matrix_view<const float, int64_t>(queries_data, num_queries, this->dimension));
if constexpr (std::is_same_v<T, B>) {
// B == T: no conversion.
raft::copy(*res, q_dev_t.view(), raft::make_host_matrix_view<const T, int64_t>(queries_data, num_queries, this->dimension));
} else if constexpr (sizeof(T) == 1) {
// sizeof(T) == 1: quantize the base-typed query B -> int8/uint8.
if (!this->quantizer_.is_trained()) throw std::runtime_error("Quantizer not trained");
auto q_dev_b = raft::make_device_matrix<B, int64_t>(*res, num_queries, this->dimension);
raft::copy(*res, q_dev_b.view(), raft::make_host_matrix_view<const B, int64_t>(queries_data, num_queries, this->dimension));
this->quantizer_.template transform<T>(*res, q_dev_b.view(), q_dev_t.data_handle(), true);
} else {
auto q_dev_f = raft::make_device_matrix<float, int64_t>(*res, num_queries, this->dimension);
raft::copy(*res, q_dev_f.view(), raft::make_host_matrix_view<const float, int64_t>(queries_data, num_queries, this->dimension));

if constexpr (sizeof(T) == 1) {
if (!this->quantizer_.is_trained()) throw std::runtime_error("Quantizer not trained");
this->quantizer_.template transform<T>(*res, q_dev_f.view(), q_dev_t.data_handle(), true);
} else {
// T is half
raft::copy(*res, q_dev_t.view(), q_dev_f.view());
}
// B != T and sizeof(T) != 1: (B=float, T=half) — cast f32 -> f16 on-device.
auto q_dev_b = raft::make_device_matrix<B, int64_t>(*res, num_queries, this->dimension);
raft::copy(*res, q_dev_b.view(), raft::make_host_matrix_view<const B, int64_t>(queries_data, num_queries, this->dimension));
raft::copy(*res, q_dev_t.view(), q_dev_b.view());
}
// Legacy path syncs so the deletes-only sync_device_bitset below can
// drain on the same stream. Prebuilt path skips: bitset H2D queues
@@ -729,7 +742,7 @@ class gpu_brute_force_t : public gpu_index_base_t<T, brute_force_build_params_t,
}

std::string info() const override {
std::string json = gpu_index_base_t<T, brute_force_build_params_t, int64_t>::info();
std::string json = gpu_index_base_t<B, T, brute_force_build_params_t, int64_t>::info();
json += ", \"type\": \"Brute-Force\", \"brute_force\": {";
if (index_) json += "\"built\": true";
else json += "\"built\": false";
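
Note on the conversion hunk above: the new three-way `if constexpr` dispatch in `search_quantize_internal` (B == T copies, `sizeof(T) == 1` quantizes, otherwise a narrowing cast) can be sketched on the host without RAFT. This is only an illustration under stated assumptions. `convert_query` and `toy_quantizer_t` are hypothetical names that do not appear in this PR, the symmetric int8 mapping is a stand-in for the real trained quantizer, and `double -> float` stands in for the f32 -> f16 cast because standard C++17 has no half type.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the trained quantizer: maps [-abs_max, abs_max]
// symmetrically onto [-127, 127]. Only the int8 case is modeled here.
struct toy_quantizer_t {
    float abs_max = 1.0f;

    template <typename T>
    T transform(float v) const {
        assert(abs_max > 0.0f);
        const float clamped = std::clamp(v, -abs_max, abs_max);
        return static_cast<T>(std::lround(clamped / abs_max * 127.0f));
    }
};

// Converts a base-typed (B) query to the storage type T, mirroring the
// branch order of the diff: same type, 1-byte storage, everything else.
template <typename B, typename T>
std::vector<T> convert_query(const std::vector<B>& query,
                             [[maybe_unused]] const toy_quantizer_t& quantizer) {
    std::vector<T> out(query.size());
    if constexpr (std::is_same_v<T, B>) {
        // B == T: no conversion, plain copy.
        std::copy(query.begin(), query.end(), out.begin());
    } else if constexpr (sizeof(T) == 1) {
        // 1-byte storage: quantize each base-typed element.
        for (std::size_t i = 0; i < query.size(); ++i) {
            out[i] = quantizer.transform<T>(static_cast<float>(query[i]));
        }
    } else {
        // Remaining case: element-wise narrowing cast (f32 -> f16 in the PR).
        for (std::size_t i = 0; i < query.size(); ++i) {
            out[i] = static_cast<T>(query[i]);
        }
    }
    return out;
}
```

Because the branches are selected at compile time, each `(B, T)` instantiation contains only one of the three paths, which is why the real code can call `quantizer_.transform` only where `sizeof(T) == 1`.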