IVFFLAT/IVFPQ/CAGRA support bf16, float16, int8 and uint8 quantization by cpegeric · Pull Request #25095 · matrixorigin/matrixone

cpegeric · 2026-06-23T14:42:27Z

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

- IVFFLAT supports bf16, float16, int8 and uint8 quantization.
- Bug fix: GPU concurrent k-means clustering.
- IVFPQ/CAGRA support float32 and float16 as base types, with int8 and uint8 quantization.
- Bug fix: int8 quantization was slow on the GPU; the computation moved to the CPU.
- Basic array functions for bf16, float16, int8 and uint8.

The CDC chunk framing, event-record codec, and replay helpers shipped in pkg/vectorindex/cuvs_cdc.go are cuvs-specific (CAGRA + IVF-PQ); no general vectorindex code uses them. Lift the file into pkg/vectorindex/cuvs/ so future cuvs-shared helpers have a natural home and so the parent vectorindex surface narrows. Pure code-motion: cagra/ivfpq sync/search/cdc-load now reach the symbols via the cuvscdc alias (renaming to dodge the existing pkg/cuvs GPU-bindings package).

Resolves rename collisions: gpu_cdc moves cuvs_cdc.go into the new pkg/vectorindex/cuvs sub-package (cuvscdc alias), while gpu_plugin_cuvs added CagraSync/IvfpqSync.AppendRecords and the iscp CuvsCdcWriter + cuvs/idxcron.CuvsUpdatable, all of which referenced the old vectorindex.X symbols. Updated those call sites to the new alias.

The plugin refactor lifted getCagraParams / getIvfpqParams / buildCagraCreate / buildCagraSearch / buildIvfpqCreate / buildIvfpqSearch out of (*QueryBuilder) onto package-level functions in pkg/vectorindex/{cagra,ivfpq}/plugin/plan/tablefunc.go, but pkg/sql/plan/cagra_ivfpq_test.go was left calling the old methods — breaking GPU vet on pkg/sql/plan/... Port the suite to both plugin sub-packages using a minimal planplugin.PlanBuilder stub. The build* paths only consult GetContext / GenNewBindTag / AppendNode, so the per-algo ApplyForSort / CanApply redirects panic in the stub. Each test file also wires planplugin.DeepCopyColDefList as a shallow pass-through (production wires it from pkg/sql/plan, which can't be imported here without a cycle).

Widen the Updatable hook contract to take an UpdatableInput struct (sqlproc, tableDef, indexName, metadata, createdAt, lastUpdateAt, interval) so per-algo hooks can own their full rebuild gate. Move IVF-FLAT's lists/nsample heuristic + kmeans_train_percent mutation out of (*IndexUpdateTaskInfo).checkIndexUpdatable into the plugin hook; the executor's universal pre-checks (auto_update on, hour matches, createdAt + interval elapsed) stay in place but every algorithm-specific decision now lives with the algorithm. CuvsUpdatable picks up the lastUpdateAt + interval cadence the executor's listsAware=false branch used to enforce, so CAGRA / IVF-PQ behaviour is unchanged. The trivial HNSW / fulltext hooks just rename their parameter. Tests for the IVF-FLAT body move into the new home (pkg/vectorindex/ivfflat/plugin/idxcron/idxcron_test.go) and stub RunGetCountSql there instead of the executor-side runGetCountSql. The executor-side integration tests (TestIvfflatReindex, TestExecutorRunFakeTasks) now construct mockReindexAlgoPlugin with ivfflatidxcron.Hooks{} as the real idxcron implementation so the end-to-end cron flow is still exercised.

ProcessInitSQL runs the per-job InitSQL on a *process.Process that has no frontend session, so its ResolveVariableFunc lands as nil. Table functions consumed by InitSQL (e.g. ivfpq_create_gpu.go:236 reading kmeans_train_percent) silently skip their session-variable reads and build with degenerate config. Inline ProcessInitSQL's executor invocation so it can attach WithResolveVariableFunc(iscp.DefaultResolveVariable) — the hook is populated by pkg/frontend's init() with a closure that reads gSysVarsDefs[name].Default. Defaults-only by design: per-index admin-tuned values still flow through the captured-vars Metadata the idxcron task carries. Nil-safe: tests that don't blank-import pkg/frontend see the hook as nil and ProcessInitSQL keeps today's nil-resolver behaviour. ExecWithResult is left unchanged so the other ISCP call sites (executor.go, consumer_entry.go, data_retriever.go) are not affected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cuvs CAGRA needs at least intermediate_graph_degree rows per sub-index (default 128); IVF-PQ k-means needs at least `lists` rows. When the source has a partial trailing chunk (`total % IndexCapacity`) below the cuvs minimum, or the whole dataset is too small, the build would error. Pre-count source rows up front and compute cdcCutoff via the formula

    cdcCutoff = total - lastChunkSize   when lastChunkSize < threshold
              = total                   otherwise

Rows < cdcCutoff still feed the cuvs builder as today; the trailing rows buffer into a per-(table, index) PendingRecord slice and end() emits them as tag=1 CDC records under vectorindex.CdcTailId via the new cuvs.SaveSmallTailAsCdc helper. Search-side brute-force replay already serves tag=1 records when no tag=0 model exists for that slice, so queries keep working until a future rebuild lifts the tail back above threshold.

Empty source is now a clean no-op (was: "source table is empty; cannot determine index capacity" error): the auto-detect / cutoff branch sets srcEmpty=true and per-row / end() short-circuit.

The CDC bytes layout reuses the existing cuvscdc.EncodeEventRecord + FrameCdcChunk + CdcAppendEventsSql primitives so replay decodes identically. INCLUDE-column bytes are produced by a new encodeIncludeRowFromArgVecs sibling next to appendFilterRow, matching the cuvscdc.EncodeIncludeRow on-wire layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
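The cutoff rule in this commit message can be sketched as below. This is a minimal illustration rather than the PR's code: the function `cdcCutoff`, its parameter names, and the choice of `lastChunkSize = total % indexCapacity` as a plain remainder are assumptions taken from the description, and the empty-source branch simply mirrors the "clean no-op" behaviour mentioned above.

```go
package main

import "fmt"

// cdcCutoff returns how many leading rows feed the cuvs builder.
// Rows at positions >= the cutoff would be buffered as the small tail.
// Hypothetical helper: name and signature are assumptions, not the PR's API.
func cdcCutoff(total, indexCapacity, threshold int64) int64 {
	if total <= 0 || indexCapacity <= 0 {
		return 0 // empty source: nothing to build and nothing to buffer
	}
	lastChunkSize := total % indexCapacity
	if lastChunkSize < threshold {
		return total - lastChunkSize
	}
	return total
}

func main() {
	// Capacity of 1000 rows per sub-index, minimum of 128 rows per build.
	fmt.Println(cdcCutoff(2050, 1000, 128)) // tail of 50 rows is below the minimum
	fmt.Println(cdcCutoff(2500, 1000, 128)) // tail of 500 rows is large enough
	fmt.Println(cdcCutoff(40, 1000, 128))   // whole dataset is too small
}
```

In the third call every row falls after the cutoff, which corresponds to the case where the index consists of tag=1 CDC records only.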

The CAGRA / IVF-PQ idxcron Hooks are thin wrappers that delegate to cuvsidxcron.CuvsUpdatable with a per-algo CuvsUpdatableSpec. Add focused tests that drive each wrapper through the IndexDef-missing and threshold-missing paths of the shared body — the error message and skip reason name the storage-table-type and threshold-param the spec asked about, so a regression to the wrong constant surfaces immediately. HNSW and fulltext don't participate in scheduled rebuilds; cover their trivial-true contract too so any future wiring keeps the "don't surprise-skip" guarantee. IVF-FLAT's full nsample/lists body suite already exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When the small-tail fallback writes all source rows to CDC tag=1 records without producing a tag=0 sub-index, the search-side loadCdcTail used to short-circuit ("cdc_tail data is moot without a main index") and ignore those records. Filtered queries against small-data-only indexes therefore returned empty results.

Persist the INCLUDE-column layout in a self-describing record at the start of chunk_id=0:

    CdcOpHeader (1) | payload_len (uint32 LE) | colMetaJSON

SaveSmallTailAsCdc prepends this header when colMetaJSON is non-empty (computed via the new colMetaJSONFromCols helper from the table-function's resolved []cuvsfilter.ColumnMeta). The header's self-describing length lets DecodeEventRecord skip past it without knowing includeBytesPerRow, and PeekColMetaJSON recovers the JSON without committing to dim/ibpr.

CagraSearch.loadCdcTail and IvfpqSearch.loadCdcTail no longer return early when no sub-index has loaded. They peek the header, derive includeBytesPerRow via cuvscdc.CdcIncludeBytesPerRow, replay the tag=1 events into a synthetic model, and stash the colMetaJSON on a new OverflowColMetaJSON field. buildOverflow falls back to that field when no main index has a GetFilterColMetaJSON() to offer, so the brute-force FilterStore gets wired with INCLUDE-column metadata and filtered prefilter still works on small-data-only indexes.

ReplayEventLog also captures the header into ReplayState.ColMetaJSON for callers that prefer the unified result struct over the peek helper.

Empty-result invariant preserved: a header-only chunk with no event records produces no overflow → buildOverflow leaves s.Overflow nil → buildMultiIndex returns nil → Search returns []int64{}, []float64{}, nil. Both buildMultiIndex docstrings call out that this is the load-bearing path for "no main index + no brute-force → empty result" and that TestCagraSearchEmpty / TestIvfpqSearchEmpty pin it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the CdcOpHeader record introduced in f28f73d with a dedicated header section in every chunk's frame. Frame format bumped to version 2:

    magic_start | version | payload_len | header_len | header | records | crc | reserved | reserved | magic_end

The header section carries colMetaJSON when the index has INCLUDE columns; payload_len covers only the event records (Delete/Insert, unchanged shape). header_len = 0 collapses the new section to nothing, matching the original 32-byte overhead.

Why the shape change:
- Records stay pure event payloads: no CdcOpHeader op, no special case in DecodeEventRecord / ReplayEventLog. Decoders treat headers as frame metadata, not as records to skip.
- Every chunk is self-describing: any one chunk read in isolation knows its INCLUDE-column layout without depending on chunk_id ordering or whether chunk_id=0 is present.
- Fixes the empty-source-then-CDC edge case: when cagra_create with srcEmpty=true emits nothing, the first CagraSync.Save chunk (chunk_id=0, NextChunkIdSql) carries the header so search can decode it.

Surface changes:
- FrameCdcChunk(records, header []byte): new second arg.
- UnframeCdcChunk returns (records, header, err).
- CdcAppendEventsSql(..., colMetaJSON string): embeds the header in every emitted chunk.
- SaveSmallTailAsCdc just passes colMetaJSON through; it no longer prepends a header record.
- CagraSync.Save / IvfpqSync.Save pass s.colMetaJSON to CdcAppendEventsSql so ongoing CDC iterations also embed it.
- ReplayEventLog captures the header from each chunk's frame into ReplayState.ColMetaJSON (last-write-wins; in practice all chunks share the same value).
- PeekColMetaJSON simplifies to "unframe chunks[0], return header".
- CdcOpHeader / EncodeHeaderRecord / CdcEventRecord.Header dropped.

Tests updated: existing FrameCdcChunk / UnframeCdcChunk callers take the new signature; the old "header as first record" small-tail tests are replaced by ones that assert the header lives in every chunk's frame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
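To make the version 2 frame layout concrete, here is a rough sketch of a framer and unframer. Only the field order comes from the commit message. Everything else is an assumption made for illustration: every fixed field is taken to be a little-endian uint32, the magic values are invented, the CRC is CRC-32 (IEEE) over header plus records, and the names `frameChunk` / `unframeChunk` are hypothetical stand-ins, not the real FrameCdcChunk / UnframeCdcChunk. With these assumed widths the fixed overhead comes to 32 bytes, which agrees with the figure quoted above, but the real field sizes may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const (
	magicStart    uint32 = 0x43444353 // invented value
	magicEnd      uint32 = 0x43444345 // invented value
	frameVersion  uint32 = 2
	fixedOverhead        = 32 // eight uint32 fields (assumed widths)
)

// frameChunk lays out: magic_start | version | payload_len | header_len |
// header | records | crc | reserved | reserved | magic_end.
func frameChunk(records, header []byte) []byte {
	le := binary.LittleEndian
	out := make([]byte, 0, fixedOverhead+len(header)+len(records))
	out = le.AppendUint32(out, magicStart)
	out = le.AppendUint32(out, frameVersion)
	out = le.AppendUint32(out, uint32(len(records))) // payload_len counts records only
	out = le.AppendUint32(out, uint32(len(header)))
	out = append(out, header...)
	out = append(out, records...)
	crc := crc32.ChecksumIEEE(out[16:]) // assumed coverage: header + records
	out = le.AppendUint32(out, crc)
	out = le.AppendUint32(out, 0) // reserved
	out = le.AppendUint32(out, 0) // reserved
	out = le.AppendUint32(out, magicEnd)
	return out
}

// unframeChunk validates the frame and returns the records and header sections.
func unframeChunk(frame []byte) (records, header []byte, err error) {
	le := binary.LittleEndian
	if len(frame) < fixedOverhead {
		return nil, nil, errors.New("frame too short")
	}
	if le.Uint32(frame[0:]) != magicStart || le.Uint32(frame[len(frame)-4:]) != magicEnd {
		return nil, nil, errors.New("bad magic")
	}
	if le.Uint32(frame[4:]) != frameVersion {
		return nil, nil, errors.New("unsupported version")
	}
	payloadLen := uint64(le.Uint32(frame[8:]))
	headerLen := uint64(le.Uint32(frame[12:]))
	if payloadLen+headerLen != uint64(len(frame)-fixedOverhead) {
		return nil, nil, errors.New("length mismatch")
	}
	body := frame[16 : 16+int(headerLen+payloadLen)]
	if crc32.ChecksumIEEE(body) != le.Uint32(frame[16+len(body):]) {
		return nil, nil, errors.New("crc mismatch")
	}
	return body[headerLen:], body[:headerLen], nil
}

func main() {
	frame := frameChunk([]byte("events"), []byte(`[{"name":"c1"}]`))
	records, header, err := unframeChunk(frame)
	fmt.Println(string(records), string(header), err)
	fmt.Println(len(frameChunk(nil, nil))) // fixed overhead only
}
```

Because the header travels in the frame rather than as a record, the unframer hands back the records untouched and a reader of any single chunk can recover the header without looking at chunk_id=0.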

Both producers (cuvscdc.ResolveIncludeColumns and the table-function helper colMetaJSONFromCols) now share one entry type and one marshal function:

    cuvscdc.ColMetaEntry{Name, Type}
    cuvscdc.MarshalColMetaJSON([]ColMetaEntry) (string, error)

The shared producer uses encoding/json so column names containing `"` or `\` (or any other JSON-significant character) escape correctly; the previous strings.Builder paths would have emitted invalid JSON for such names. New TestMarshalColMetaJSON_EscapesNames pins that contract by round-tripping a name containing each special character through encoding/json.

Single producer also guarantees the iscp writer side (ResolveIncludeColumns at index-CDC-event-write time) and the table-function side (small-tail emit at build time) cannot drift: any future shape change lands in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
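The escaping argument can be illustrated with a small round-trip. The sketch below is not the cuvscdc code: the lowercase `colMetaEntry` type, the JSON keys `name` and `type`, and the use of a string for the type field are assumptions; only the {Name, Type} shape and the use of encoding/json come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// colMetaEntry mirrors the {Name, Type} shape described above.
// The JSON keys and the string-typed Type field are assumptions.
type colMetaEntry struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// marshalColMeta serialises the entries with encoding/json, which escapes
// quotes and backslashes in names instead of emitting them raw.
func marshalColMeta(entries []colMetaEntry) (string, error) {
	b, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// roundTripsName reports whether a column name survives marshal + unmarshal.
func roundTripsName(name string) bool {
	s, err := marshalColMeta([]colMetaEntry{{Name: name, Type: "int64"}})
	if err != nil {
		return false
	}
	var out []colMetaEntry
	if err := json.Unmarshal([]byte(s), &out); err != nil {
		return false
	}
	return len(out) == 1 && out[0].Name == name
}

func main() {
	s, err := marshalColMeta([]colMetaEntry{{Name: `we"ird\name`, Type: "int64"}})
	fmt.Println(s, err)
	fmt.Println(roundTripsName(`we"ird\name`))
}
```

A hand-rolled string builder that writes the name between two quote characters would produce a document that json.Unmarshal rejects for the same input, which is the failure mode the shared producer removes.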

Replace the unreliable resolver-error probe used to detect background re-entry (idxcron ALTER REINDEX, ProcessInitSQL) with an explicit proc.Base.IsFrontend flag carried via executor.Options.WithFrontend. Default is background; frontend opts in at the two session-bound proc-construction sites (mysql client query handler and back_exec). BuildIdxcronMetadata, ddl.go AlterTableInplace re-registration, and the experimental_xxx_index gates in cagra/ivfpq/hnsw now consult ctx.IsFrontend() instead of probing a resolver, so background re-entry no longer clobbers captured task metadata or trips an experimental-flag check that already passed at CREATE INDEX time. The dead probe-based FrontendProbeVar / IdxcronFrontendProbeVar fields are removed in the same pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
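The "default is background, frontend opts in" contract could look roughly like the sketch below. All names here (`execOptions`, `withFrontend`, `shouldCaptureMetadata`) are hypothetical stand-ins; the real executor.Options and the real gates are not reproduced, and the builder-style method is an assumption about the API shape.

```go
package main

import "fmt"

// execOptions is a hypothetical stand-in for executor.Options.
type execOptions struct {
	isFrontend bool // zero value means background
}

// withFrontend returns a copy marked as session-bound; the receiver is untouched.
func (o execOptions) withFrontend() execOptions {
	o.isFrontend = true
	return o
}

// shouldCaptureMetadata models a gate that only fires for frontend calls, so a
// background re-entry leaves the metadata captured at CREATE INDEX time alone.
func shouldCaptureMetadata(o execOptions) bool {
	return o.isFrontend
}

func main() {
	background := execOptions{}
	frontend := background.withFrontend()
	fmt.Println(shouldCaptureMetadata(background), shouldCaptureMetadata(frontend))
}
```

Making background the zero value means a call site that forgets to set the flag behaves like idxcron or ProcessInitSQL, which is the safer direction for the metadata-clobbering problem described above.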

Three comments in types.go and sqlexec.go still spoke of "IsBackground=true" / "WithIsBackground(false)" — relics of the prior name. Reworded to match the post-rename API (IsFrontend / WithFrontend) so the in-file docstrings line up with the code. No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add [plugin] / [isfrontend] tagged logutil.Info calls at each plugin lifecycle milestone so SQL-driven end-to-end tests can confirm via the CN log that the right algorithm's hook ran with the expected context.

Covered points:
- compile.handleCreate / HandleCreateIndex (cagra, ivfpq, ivfflat, hnsw): logs isFrontend / forceSync / def-count at entry, which proves the per-algo gate and forceSync decision.
- compile.HandleDropIndex (all four): logs entry on DROP INDEX.
- compile.IdxcronMetadata (cagra, ivfpq, ivfflat): per-algo entry log pairs with the existing shared BuildIdxcronMetadata capture/skip [isfrontend] lines.
- idxcron.Updatable (all four): logs every cron-tick decision.
- iscp.NewIndexSqlWriter: single central log fires once per CDC consumer construction across all algos.
- cuvs Sync.AppendRecords / Sync.Save (cagra, ivfpq): logs records IN from the CDC stream and OUT to the storage table, so flush cadence and chunk count are visible in the log.

Smoke test files added for ivfflat and hnsw plugin/compile/ so the new log lines stay covered (ivfflat went 0% → 7.1%, hnsw 0% → 13.9%; cagra/ivfpq held at 79.6%). All other touched packages held or improved coverage. Build + vet clean on both default and gpu tag sets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts:
#	pkg/sql/plan/function/function_id.go
#	pkg/sql/plan/function/function_id_test.go

Follow-up to the drop-index cache-eviction fix: HNSW's HandleDropIndex was still a no-op, so with the new dispatch its cached search index lingered until the 5-min VectorIndexCacheTTL (same leak as ivfpq/cagra/ivfflat). Evict via cache.Cache.Remove(storageDef.IndexTableName), mirroring the create-side. All four vector plugins now release on drop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…xes CLONE) CREATE TABLE ... CLONE of a table with a CAGRA/IVF-PQ vector index failed with "VECTOR column 'v' cannot be in index": indexColumnCheckKind mapped only IVFFLAT/HNSW (CAGRA/IVFPQ fell to "secondary"), and checkIndexColumnSupportability hardcoded the vector allowlist to ivfflat/hnsw and only matched f32/f64 (narrow f16/bf16/int8/uint8 fell through unvalidated). Delegate the vector-column check to the per-plugin catalog hook (catalog.SupportsVectorType / SupportedVectorTypes) so each algorithm's real supported element types are enforced: ivfflat = f32/f64/f16/bf16/int8/uint8, cagra/ivfpq = f32/f16, hnsw = f32/f64; non-vector index kinds reject vector columns. indexColumnCheckKind now maps cagra/ivfpq so Get() resolves the plugin. Verified: gpu_cases/vector BVT 100% (vector_clone_idxcron now 21/21) + unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
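The per-algorithm allowlist listed in this commit message can be written down as a lookup table. The sketch below is illustrative only: `supportedVectorTypes` and `supportsVectorType` are local stand-ins, not the catalog.SupportsVectorType / SupportedVectorTypes hooks, and the short type names are the ones used in the message rather than real type identifiers.

```go
package main

import "fmt"

// supportedVectorTypes records, per index algorithm, the vector element types
// quoted in the commit message. Index kinds that are absent from the map
// (for example "secondary") accept no vector columns.
var supportedVectorTypes = map[string][]string{
	"ivfflat": {"f32", "f64", "f16", "bf16", "int8", "uint8"},
	"cagra":   {"f32", "f16"},
	"ivfpq":   {"f32", "f16"},
	"hnsw":    {"f32", "f64"},
}

// supportsVectorType reports whether algo may index a vector column whose
// element type is elem.
func supportsVectorType(algo, elem string) bool {
	for _, t := range supportedVectorTypes[algo] {
		if t == elem {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supportsVectorType("ivfflat", "bf16"))  // listed for ivfflat
	fmt.Println(supportsVectorType("cagra", "f64"))     // not listed for cagra
	fmt.Println(supportsVectorType("secondary", "f32")) // non-vector index kind
}
```

Keeping the table next to each algorithm, rather than in one hardcoded check, is what lets CLONE validate a CAGRA or IVF-PQ column the same way CREATE INDEX does.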

XuPeng-SH

I re-checked the current head and still see two substantive correctness issues in the cuVS quantizer path.

1. The 1-byte quantizer fallback still learns from already-quantized bytes.
   In cgo/cuvs/index_base.hpp, train_quantizer_if_needed() still auto-trains from flattened_host_dataset after explicitly warning that this buffer may already hold int8/uint8 storage values when data came in through the public storage-typed constructors / add-chunk path. In that case the quantizer learns the compressed range, not the original float range, so later base-typed search/extend paths can silently quantize against the wrong min/max.

   Suggestion: do not auto-train from storage-typed data. Require either an explicit quantizer/range or original base-typed training data before enabling base-typed search/extend on pre-quantized indexes.

2. The new “strided sample” still ignores the tail for 501–999 row builds.
   With n_train = min(500, count) and stride = count / n_train, any 501 <= count < 1000 still collapses to stride == 1, so the sampling loop only visits rows 0..499. That means extrema in the tail are still missed, even though the comment now claims the sampler covers all rows.

   Suggestion: choose indices proportionally across the full range (for example r = j * (count - 1) / (n_train - 1)) or switch to a true uniform/reservoir sampler.
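The stride collapse in the second point is easy to reproduce outside the C++ code. The sketch below is in Go and only models the index arithmetic quoted in the review; the function names are hypothetical, and the loop shape (row j * stride for j in 0..n_train-1) is an assumption about how the sampler walks the rows.

```go
package main

import "fmt"

// trainCount returns min(maxTrain, count).
func trainCount(count, maxTrain int) int {
	if count < maxTrain {
		return count
	}
	return maxTrain
}

// stridedIndices models the sampler as described in the review:
// stride = count / n_train with integer division, row index = j * stride.
func stridedIndices(count, maxTrain int) []int {
	n := trainCount(count, maxTrain)
	if n == 0 {
		return nil
	}
	stride := count / n
	idx := make([]int, n)
	for j := range idx {
		idx[j] = j * stride
	}
	return idx
}

// proportionalIndices follows the reviewer's suggestion:
// r = j * (count - 1) / (n_train - 1), which always reaches the last row.
func proportionalIndices(count, maxTrain int) []int {
	n := trainCount(count, maxTrain)
	if n == 0 {
		return nil
	}
	if n == 1 {
		return []int{0}
	}
	idx := make([]int, n)
	for j := range idx {
		idx[j] = j * (count - 1) / (n - 1)
	}
	return idx
}

func main() {
	s := stridedIndices(750, 500)
	p := proportionalIndices(750, 500)
	fmt.Println(s[len(s)-1]) // 499: rows 500..749 are never sampled
	fmt.Println(p[len(p)-1]) // 749: the last row is included
}
```

For 750 rows the integer division 750 / 500 yields a stride of 1, so the strided variant stops at row 499, while the proportional variant spreads the same 500 samples across rows 0 through 749.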

I would keep this at request changes until those two are addressed, because both can directly bias quantization and search quality without any obvious runtime failure.

A WHERE predicate on a column not in the index INCLUDE list cannot be pushed into the GPU bitset; the planner runs the ANN search for a candidate window then JOINs+filters at the DB (post-filter). This path had no BVT coverage — all existing filter cases only filter on INCLUDE'd columns. Add vector_{cagra,ivfpq}_postfilter.sql: establish the unfiltered ranked result, then verify the post-filtered result equals exactly the unfiltered rows that satisfy the predicate (exact when LIMIT >= row count so the candidate window covers all rows), plus the mixed pre(INCLUDE)+post(non-INCLUDE) case and the small-LIMIT approximate-window case (far match falls outside the window). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
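The two post-filter behaviours these BVT cases target (exact when the window covers every row, approximate when it does not) can be sketched in a few lines. This is a toy model, not planner code: an exact sort by distance stands in for the ANN search, and the `row` type, the `tag` column and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type row struct {
	id   int
	dist float64 // distance to the query vector
	tag  string  // stands in for a column that is NOT in the INCLUDE list
}

// postFilterSearch takes the `limit` nearest rows as the candidate window and
// only then applies the predicate, as a database-side post-filter would.
func postFilterSearch(rows []row, limit int, pred func(row) bool) []int {
	sorted := append([]row(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].dist < sorted[j].dist })
	if limit < 0 {
		limit = 0
	}
	if limit > len(sorted) {
		limit = len(sorted)
	}
	var ids []int
	for _, r := range sorted[:limit] {
		if pred(r) {
			ids = append(ids, r.id)
		}
	}
	return ids
}

// demoPostFilter runs the same predicate with different window sizes.
func demoPostFilter(limit int) []int {
	rows := []row{{1, 0.1, "a"}, {2, 0.2, "b"}, {3, 0.3, "a"}, {4, 0.9, "b"}}
	return postFilterSearch(rows, limit, func(r row) bool { return r.tag == "b" })
}

func main() {
	fmt.Println(demoPostFilter(4)) // window covers all rows: every match is returned
	fmt.Println(demoPostFilter(2)) // small window: the far match (id 4) is lost
}
```

The second call shows why the test only asserts exact equality when LIMIT is at least the row count: a matching row that ranks outside the candidate window never reaches the filter.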

aunjgr

LGTM for the quantization support. Well structured across the cuVS C++/CUDA layer, Go bindings, and SQL compilation.

For a 1-byte storage type, the flattened_host_dataset buffer only ever holds STORAGE bytes (raw T from a pre-quantized add_chunk(T*), or post-flush quantized output), never original floats. Training the scalar quantizer on it learns the COMPRESSED range (e.g. int8 [-128,127]) instead of the true float range, so later base-typed search/extend silently quantizes against the wrong min/max.

Quantizer training now happens solely in flush_pending_float_chunks_internal() on the ORIGINAL floats buffered by add_chunk_float()/add_chunk_quantize(). A pre-quantized index leaves the quantizer untrained; base-typed search (quantize_query) and extend (upload_float_matrix_as_T) already throw "quantizer not trained", so the op fails loudly instead of mis-quantizing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
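The effect of training on storage bytes can be shown with a toy quantizer. The sketch below is written in Go for illustration and assumes a simplified min/max scalar quantizer; it is not the cuVS quantizer and none of its names come from the C++ code.

```go
package main

import (
	"fmt"
	"math"
)

// scalarQuantizer is a simplified min/max quantizer (an assumption made for
// illustration; it is not the cuVS implementation).
type scalarQuantizer struct{ min, max float32 }

// train learns the value range from the training data, which must be non-empty.
func train(data []float32) scalarQuantizer {
	q := scalarQuantizer{min: data[0], max: data[0]}
	for _, v := range data[1:] {
		if v < q.min {
			q.min = v
		}
		if v > q.max {
			q.max = v
		}
	}
	return q
}

// quantize maps x from [min, max] onto the int8 range [-128, 127].
func (q scalarQuantizer) quantize(x float32) int8 {
	if q.max == q.min {
		return 0
	}
	t := float64(x-q.min) / float64(q.max-q.min)
	t = math.Max(0, math.Min(1, t))
	return int8(math.Round(t*255) - 128)
}

// demo quantizes the same query value with a quantizer trained on the original
// floats and with one trained on the already-quantized storage values.
func demo() (good, bad int8) {
	floats := []float32{-1, 0, 0.5, 1}
	fromFloats := train(floats)

	stored := make([]float32, len(floats))
	for i, v := range floats {
		stored[i] = float32(fromFloats.quantize(v)) // storage values, not floats
	}
	fromStorage := train(stored) // learns the compressed range instead

	return fromFloats.quantize(1), fromStorage.quantize(1)
}

func main() {
	good, bad := demo()
	fmt.Println(good, bad)
}
```

The query value 1.0 is the maximum of the original data, so the correctly trained quantizer maps it to the top of the int8 range, while the one trained on storage values places it near the middle. Nothing fails at runtime, which is why refusing to train on storage-typed data and throwing "quantizer not trained" is the safer behaviour.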

cpegeric and others added 30 commits May 19, 2026 09:43

code review fix

6b13486

code review fix

8a01ef3

Merge branch 'gpu_async_search' into gpu_cdc

67f3352

merge fix

0f5b3ad

merge fix

0f7c26f

add cdc unframe test

417c0e4

Merge branch 'gpu_cdc' into gpu_plugin_all

3164281

better plugin integration

8c20654

AlterTableCloneBehavior

064e545

iscp plugin

52fea2d

ivfpq and cagra plugin iscp integration

90771d8

cuvs sync

618cec0

add tests

07a30c8

force sync with ivfpq/cagra and add hook for AlterReIndex

c95e057

idxcron integration

aba3b13

idxcron hook

9d7ed3a

Merge branch 'iscp_resolve_variable' into gpu_plugin_cuvs

ecb1845

cpegeric temporarily deployed to ci June 24, 2026 12:30 — with GitHub Actions Inactive

cpegeric temporarily deployed to ci June 24, 2026 12:31 — with GitHub Actions Inactive

Merge branch 'main' into cuvs_quantize

a1f5cf1

# Conflicts:
#	pkg/sql/plan/function/function_id.go
#	pkg/sql/plan/function/function_id_test.go

cpegeric had a problem deploying to ci June 24, 2026 13:48 — with GitHub Actions Error

cpegeric temporarily deployed to ci June 24, 2026 13:48 — with GitHub Actions Inactive

cpegeric had a problem deploying to ci June 24, 2026 13:48 — with GitHub Actions Error

cpegeric temporarily deployed to ci June 24, 2026 13:49 — with GitHub Actions Inactive

cpegeric had a problem deploying to ci June 24, 2026 13:49 — with GitHub Actions Error

cpegeric temporarily deployed to ci June 24, 2026 14:06 — with GitHub Actions Inactive

XuPeng-SH requested changes Jun 24, 2026


heni02 approved these changes Jun 25, 2026


aunjgr approved these changes Jun 25, 2026


cpegeric wants to merge 874 commits into matrixorigin:main from cpegeric:cuvs_quantize


6 participants
