IVFFLAT/IVFPQ/CAGRA support bf16, float16, int8 and uint8 quantization #25095
cpegeric wants to merge 874 commits into
The CDC chunk framing, event-record codec, and replay helpers shipped in pkg/vectorindex/cuvs_cdc.go are cuvs-specific (CAGRA + IVF-PQ); no general vectorindex code uses them. Lift the file into pkg/vectorindex/cuvs/ so future cuvs-shared helpers have a natural home and so the parent vectorindex surface narrows. Pure code-motion: cagra/ivfpq sync/search/cdc-load now reach the symbols via the cuvscdc alias (renaming to dodge the existing pkg/cuvs GPU-bindings package).
Resolves rename collisions: gpu_cdc moves cuvs_cdc.go into the new pkg/vectorindex/cuvs sub-package (cuvscdc alias), while gpu_plugin_cuvs added CagraSync/IvfpqSync.AppendRecords and the iscp CuvsCdcWriter + cuvs/idxcron.CuvsUpdatable, all of which referenced the old vectorindex.X symbols. Updated those call sites to the new alias.
The plugin refactor lifted getCagraParams / getIvfpqParams /
buildCagraCreate / buildCagraSearch / buildIvfpqCreate /
buildIvfpqSearch out of (*QueryBuilder) onto package-level functions
in pkg/vectorindex/{cagra,ivfpq}/plugin/plan/tablefunc.go, but
pkg/sql/plan/cagra_ivfpq_test.go was left calling the old methods —
breaking GPU vet on pkg/sql/plan/...
Port the suite to both plugin sub-packages using a minimal
planplugin.PlanBuilder stub. The build* paths only consult
GetContext / GenNewBindTag / AppendNode, so the per-algo
ApplyForSort / CanApply redirects panic in the stub. Each test file
also wires planplugin.DeepCopyColDefList as a shallow pass-through
(production wires it from pkg/sql/plan, which can't be imported
here without a cycle).
Widen the Updatable hook contract to take an UpdatableInput struct
(sqlproc, tableDef, indexName, metadata, createdAt, lastUpdateAt,
interval) so per-algo hooks can own their full rebuild gate. Move
IVF-FLAT's lists/nsample heuristic + kmeans_train_percent mutation
out of (*IndexUpdateTaskInfo).checkIndexUpdatable into the plugin
hook; the executor's universal pre-checks (auto_update on, hour
matches, createdAt + interval elapsed) stay in place but every
algorithm-specific decision now lives with the algorithm.
CuvsUpdatable picks up the lastUpdateAt + interval cadence the
executor's listsAware=false branch used to enforce, so CAGRA /
IVF-PQ behaviour is unchanged. The trivial HNSW / fulltext hooks
just rename their parameter.
Tests for the IVF-FLAT body move into the new home
(pkg/vectorindex/ivfflat/plugin/idxcron/idxcron_test.go) and stub
RunGetCountSql there instead of the executor-side runGetCountSql.
The executor-side integration tests (TestIvfflatReindex,
TestExecutorRunFakeTasks) now construct mockReindexAlgoPlugin with
ivfflatidxcron.Hooks{} as the real idxcron implementation so the
end-to-end cron flow is still exercised.
ProcessInitSQL runs the per-job InitSQL on a *process.Process that has no frontend session, so its ResolveVariableFunc lands as nil. Table functions consumed by InitSQL (e.g. ivfpq_create_gpu.go:236 reading kmeans_train_percent) silently skip their session-variable reads and build with degenerate config. Inline ProcessInitSQL's executor invocation so it can attach WithResolveVariableFunc(iscp.DefaultResolveVariable) — the hook is populated by pkg/frontend's init() with a closure that reads gSysVarsDefs[name].Default. Defaults-only by design: per-index admin-tuned values still flow through the captured-vars Metadata the idxcron task carries. Nil-safe: tests that don't blank-import pkg/frontend see the hook as nil and ProcessInitSQL keeps today's nil-resolver behaviour. ExecWithResult is left unchanged so the other ISCP call sites (executor.go, consumer_entry.go, data_retriever.go) are not affected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cuvs CAGRA needs at least intermediate_graph_degree rows per
sub-index (default 128); IVF-PQ k-means needs at least `lists`
rows. When the source has a partial trailing chunk
(`total % IndexCapacity`) below the cuvs minimum — or the whole
dataset is too small — the build would error.
Pre-count source rows up front and compute cdcCutoff, with
lastChunkSize = total % IndexCapacity:
cdcCutoff = total - lastChunkSize   when lastChunkSize < threshold
cdcCutoff = total                   otherwise
Rows < cdcCutoff still feed the cuvs builder as today; the
trailing rows buffer into a per-(table, index) PendingRecord slice
and end() emits them as tag=1 CDC records under
vectorindex.CdcTailId via the new cuvs.SaveSmallTailAsCdc helper.
Search-side brute-force replay already serves tag=1 records when
no tag=0 model exists for that slice, so queries keep working
until a future rebuild lifts the tail back above threshold.
Empty source is now a clean no-op (was: "source table is empty;
cannot determine index capacity" error) — the auto-detect /
cutoff branch sets srcEmpty=true and per-row / end() short-circuit.
The CDC bytes layout reuses the existing cuvscdc.EncodeEventRecord
+ FrameCdcChunk + CdcAppendEventsSql primitives so replay decodes
identically. INCLUDE-column bytes are produced by a new
encodeIncludeRowFromArgVecs sibling next to appendFilterRow,
matching the cuvscdc.EncodeIncludeRow on-wire layout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
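
The cutoff rule from the commit above can be sketched as follows. The function and parameter names are illustrative only; `threshold` stands for the per-algorithm minimum (intermediate_graph_degree for CAGRA, `lists` for IVF-PQ), and the real implementation may handle edge cases differently.

```go
package main

import "fmt"

// cdcCutoff returns how many leading rows feed the index builder.
// Rows at positions >= the cutoff would be buffered and emitted as
// CDC records instead of going into a sub-index.
func cdcCutoff(total, indexCapacity, threshold int64) int64 {
	if total <= 0 || indexCapacity <= 0 {
		return 0 // empty source: nothing to build
	}
	lastChunkSize := total % indexCapacity
	if lastChunkSize < threshold {
		// Trailing chunk too small for the builder. When the remainder
		// is 0 this leaves the cutoff at total, so full chunks are kept.
		return total - lastChunkSize
	}
	return total
}

func main() {
	fmt.Println(cdcCutoff(1000, 400, 128)) // tail of 200 rows is large enough
	fmt.Println(cdcCutoff(850, 400, 128))  // tail of 50 rows goes to CDC
	fmt.Println(cdcCutoff(100, 400, 128))  // whole dataset is too small
}
```

With a capacity of 400 and a threshold of 128, the three calls yield cutoffs of 1000, 800 and 0 respectively.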
The CAGRA / IVF-PQ idxcron Hooks are thin wrappers that delegate to cuvsidxcron.CuvsUpdatable with a per-algo CuvsUpdatableSpec. Add focused tests that drive each wrapper through the IndexDef-missing and threshold-missing paths of the shared body — the error message and skip reason name the storage-table-type and threshold-param the spec asked about, so a regression to the wrong constant surfaces immediately. HNSW and fulltext don't participate in scheduled rebuilds; cover their trivial-true contract too so any future wiring keeps the "don't surprise-skip" guarantee. IVF-FLAT's full nsample/lists body suite already exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the small-tail fallback writes all source rows to CDC tag=1
records without producing a tag=0 sub-index, the search-side
loadCdcTail used to short-circuit ("cdc_tail data is moot without a
main index") and ignore those records. Filtered queries against
small-data-only indexes therefore returned empty results.
Persist the INCLUDE-column layout in a self-describing record at
the start of chunk_id=0:
CdcOpHeader (1) | payload_len (uint32 LE) | colMetaJSON
SaveSmallTailAsCdc prepends this header when colMetaJSON is
non-empty (computed via the new colMetaJSONFromCols helper from
the table-function's resolved []cuvsfilter.ColumnMeta). The
header's self-describing length lets DecodeEventRecord skip past
it without knowing includeBytesPerRow, and PeekColMetaJSON
recovers the JSON without committing to dim/ibpr.
CagraSearch.loadCdcTail and IvfpqSearch.loadCdcTail no longer
return early when no sub-index has loaded. They peek the header,
derive includeBytesPerRow via cuvscdc.CdcIncludeBytesPerRow,
replay the tag=1 events into a synthetic model, and stash the
colMetaJSON on a new OverflowColMetaJSON field. buildOverflow
falls back to that field when no main-index has a
GetFilterColMetaJSON() to offer — so the brute-force FilterStore
gets wired with INCLUDE-column metadata and filtered prefilter
still works on small-data-only indexes.
ReplayEventLog also captures the header into ReplayState.ColMetaJSON
for callers that prefer the unified result struct over the peek
helper.
Empty-result invariant preserved: a header-only chunk with no
event records produces no overflow → buildOverflow leaves
s.Overflow nil → buildMultiIndex returns nil → Search returns
[]int64{}, []float64{}, nil. Both buildMultiIndex docstrings call
out that this is the load-bearing path for "no main index + no
brute-force → empty result" and that TestCagraSearchEmpty /
TestIvfpqSearchEmpty pin it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the CdcOpHeader record introduced in f28f73d with a dedicated header section in every chunk's frame. Frame format bumped to version 2:
magic_start | version | payload_len | header_len | header | records | crc | reserved | reserved | magic_end
The header section carries colMetaJSON when the index has INCLUDE columns; payload_len covers only the event records (Delete/Insert, unchanged shape). header_len = 0 collapses the new section to nothing, matching the original 32-byte overhead.
Why the shape change:
- Records stay pure event payloads: no CdcOpHeader op, no special case in DecodeEventRecord / ReplayEventLog. Decoders treat headers as frame metadata, not as records to skip.
- Every chunk is self-describing: any one chunk read in isolation knows its INCLUDE-column layout without depending on chunk_id ordering or whether chunk_id=0 is present.
- Fixes the empty-source-then-CDC edge case: when cagra_create with srcEmpty=true emits nothing, the first CagraSync.Save chunk (chunk_id=0, NextChunkIdSql) carries the header so search can decode it.
Surface changes:
- FrameCdcChunk(records, header []byte): new second arg.
- UnframeCdcChunk returns (records, header, err).
- CdcAppendEventsSql(..., colMetaJSON string): embeds the header in every emitted chunk.
- SaveSmallTailAsCdc just passes colMetaJSON through; it no longer prepends a header record.
- CagraSync.Save / IvfpqSync.Save pass s.colMetaJSON to CdcAppendEventsSql so ongoing CDC iterations also embed it.
- ReplayEventLog captures the header from each chunk's frame into ReplayState.ColMetaJSON (last-write-wins; in practice all chunks share the same value).
- PeekColMetaJSON simplifies to "unframe chunks[0], return header".
- CdcOpHeader / EncodeHeaderRecord / CdcEventRecord.Header dropped.
Tests updated: existing FrameCdcChunk / UnframeCdcChunk callers take the new signature; the old "header as first record" small-tail tests are replaced by ones that assert the header lives in every chunk's frame.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
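
A rough sketch of the version-2 frame layout may help when reading the surface changes above. Everything concrete here is an assumption made for illustration: the commit does not state field widths, byte order, magic values, or the CRC variant. The sketch uses eight little-endian uint32 fields (which happens to give 32 bytes of fixed overhead), made-up magic numbers, and CRC-32 (IEEE) over header plus records. The helper names `frameChunk` and `unframeChunk` are hypothetical stand-ins, not the real FrameCdcChunk / UnframeCdcChunk.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const (
	magicStart uint32 = 0x43444331 // made-up value for this sketch
	magicEnd   uint32 = 0x43444332 // made-up value for this sketch
	frameV2    uint32 = 2
	overhead          = 32 // 8 fixed uint32 fields (assumed widths)
)

// frameChunk lays out:
// magic_start | version | payload_len | header_len | header | records |
// crc | reserved | reserved | magic_end
func frameChunk(records, header []byte) []byte {
	le := binary.LittleEndian
	buf := make([]byte, 0, overhead+len(header)+len(records))
	buf = le.AppendUint32(buf, magicStart)
	buf = le.AppendUint32(buf, frameV2)
	buf = le.AppendUint32(buf, uint32(len(records))) // payload_len: records only
	buf = le.AppendUint32(buf, uint32(len(header)))
	buf = append(buf, header...)
	buf = append(buf, records...)
	crc := crc32.ChecksumIEEE(buf[16:]) // covers header + records
	buf = le.AppendUint32(buf, crc)
	buf = le.AppendUint32(buf, 0) // reserved
	buf = le.AppendUint32(buf, 0) // reserved
	return le.AppendUint32(buf, magicEnd)
}

// unframeChunk validates the frame and returns (records, header).
func unframeChunk(b []byte) (records, header []byte, err error) {
	le := binary.LittleEndian
	if len(b) < overhead {
		return nil, nil, errors.New("chunk too short")
	}
	if le.Uint32(b[0:4]) != magicStart || le.Uint32(b[len(b)-4:]) != magicEnd {
		return nil, nil, errors.New("bad magic")
	}
	if le.Uint32(b[4:8]) != frameV2 {
		return nil, nil, errors.New("unsupported frame version")
	}
	payloadLen := uint64(le.Uint32(b[8:12]))
	headerLen := uint64(le.Uint32(b[12:16]))
	if uint64(overhead)+payloadLen+headerLen != uint64(len(b)) {
		return nil, nil, errors.New("length mismatch")
	}
	end := 16 + int(headerLen+payloadLen)
	body := b[16:end]
	if crc32.ChecksumIEEE(body) != le.Uint32(b[end:end+4]) {
		return nil, nil, errors.New("crc mismatch")
	}
	return body[headerLen:], body[:headerLen], nil
}

func main() {
	hdr := []byte(`[{"name":"c1","type":"int64"}]`)
	f := frameChunk([]byte("events"), hdr)
	recs, gotHdr, err := unframeChunk(f)
	fmt.Println(string(recs), bytes.Equal(gotHdr, hdr), err)
	fmt.Println(len(frameChunk(nil, nil))) // fixed overhead when both are empty
}
```

Because the header travels in the frame rather than in the record stream, a decoder in this shape never has to special-case a header op while iterating records.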
Both producers (cuvscdc.ResolveIncludeColumns and the table-function
helper colMetaJSONFromCols) now share one entry type and one marshal
function:
cuvscdc.ColMetaEntry{Name, Type}
cuvscdc.MarshalColMetaJSON([]ColMetaEntry) (string, error)
The shared producer uses encoding/json so column names containing
`"` or `\` (or any other JSON-significant character) escape
correctly — the previous strings.Builder paths would have emitted
invalid JSON for such names. New TestMarshalColMetaJSON_EscapesNames
pins that contract by round-tripping a name containing each special
character through encoding/json.
Single producer also guarantees the iscp writer side
(ResolveIncludeColumns at index-CDC-event-write time) and the table-
function side (small-tail emit at build time) cannot drift: any
future shape change lands in one place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
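
The escaping guarantee described above follows directly from using encoding/json. A minimal sketch, assuming string-typed fields and lowercase JSON keys (neither is confirmed by the commit message), with `marshalColMeta` as a hypothetical stand-in for cuvscdc.MarshalColMetaJSON:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ColMetaEntry uses the field names from the commit message. The field
// types and JSON key names are assumptions for this sketch.
type ColMetaEntry struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// marshalColMeta serialises the entries with encoding/json, so quotes
// and backslashes in column names are escaped instead of corrupting
// the document.
func marshalColMeta(cols []ColMetaEntry) (string, error) {
	b, err := json.Marshal(cols)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	in := []ColMetaEntry{{Name: `we"ird\col`, Type: "int64"}}
	s, err := marshalColMeta(in)
	if err != nil {
		panic(err)
	}
	var out []ColMetaEntry
	if err := json.Unmarshal([]byte(s), &out); err != nil {
		panic(err)
	}
	fmt.Println(s)
	fmt.Println(out[0].Name == in[0].Name) // name survives the round trip
}
```

A hand-rolled strings.Builder producer would have written the quote and backslash verbatim, which is exactly the invalid-JSON case the new test guards against.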
Replace the unreliable resolver-error probe used to detect background re-entry (idxcron ALTER REINDEX, ProcessInitSQL) with an explicit proc.Base.IsFrontend flag carried via executor.Options.WithFrontend. Default is background; frontend opts in at the two session-bound proc-construction sites (mysql client query handler and back_exec). BuildIdxcronMetadata, ddl.go AlterTableInplace re-registration, and the experimental_xxx_index gates in cagra/ivfpq/hnsw now consult ctx.IsFrontend() instead of probing a resolver — so background re- entry no longer clobbers captured task metadata or trips an experimental-flag check that already passed at CREATE INDEX time. The dead probe-based FrontendProbeVar / IdxcronFrontendProbeVar fields are removed in the same pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three comments in types.go and sqlexec.go still spoke of "IsBackground=true" / "WithIsBackground(false)" — relics of the prior name. Reworded to match the post-rename API (IsFrontend / WithFrontend) so the in-file docstrings line up with the code. No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add [plugin] / [isfrontend] tagged logutil.Info calls at each plugin lifecycle milestone so SQL-driven end-to-end tests can confirm via the CN log that the right algorithm's hook ran with the expected context.
Covered points:
- compile.handleCreate / HandleCreateIndex (cagra, ivfpq, ivfflat, hnsw): logs isFrontend / forceSync / def-count at entry, which proves the per-algo gate and forceSync decision.
- compile.HandleDropIndex (all four): logs entry on DROP INDEX.
- compile.IdxcronMetadata (cagra, ivfpq, ivfflat): per-algo entry log pairs with the existing shared BuildIdxcronMetadata capture/skip [isfrontend] lines.
- idxcron.Updatable (all four): logs every cron-tick decision.
- iscp.NewIndexSqlWriter: single central log fires once per CDC consumer construction across all algos.
- cuvs Sync.AppendRecords / Sync.Save (cagra, ivfpq): logs records IN from the CDC stream and OUT to the storage table, so flush cadence and chunk count are visible in the log.
Smoke test files added for ivfflat and hnsw plugin/compile/ so the new log lines stay covered (ivfflat went 0% → 7.1%, hnsw 0% → 13.9%; cagra/ivfpq held at 79.6%). All other touched packages held or improved coverage. Build + vet clean on both default and gpu tag sets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	pkg/sql/plan/function/function_id.go
#	pkg/sql/plan/function/function_id_test.go
Follow-up to the drop-index cache-eviction fix: HNSW's HandleDropIndex was still a no-op, so with the new dispatch its cached search index lingered until the 5-min VectorIndexCacheTTL (same leak as ivfpq/cagra/ivfflat). Evict via cache.Cache.Remove(storageDef.IndexTableName), mirroring the create-side. All four vector plugins now release on drop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xes CLONE) CREATE TABLE ... CLONE of a table with a CAGRA/IVF-PQ vector index failed with "VECTOR column 'v' cannot be in index": indexColumnCheckKind mapped only IVFFLAT/HNSW (CAGRA/IVFPQ fell to "secondary"), and checkIndexColumnSupportability hardcoded the vector allowlist to ivfflat/hnsw and only matched f32/f64 (narrow f16/bf16/int8/uint8 fell through unvalidated). Delegate the vector-column check to the per-plugin catalog hook (catalog.SupportsVectorType / SupportedVectorTypes) so each algorithm's real supported element types are enforced: ivfflat = f32/f64/f16/bf16/int8/uint8, cagra/ivfpq = f32/f16, hnsw = f32/f64; non-vector index kinds reject vector columns. indexColumnCheckKind now maps cagra/ivfpq so Get() resolves the plugin. Verified: gpu_cases/vector BVT 100% (vector_clone_idxcron now 21/21) + unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
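
The per-algorithm element-type matrix stated in the commit above can be written as a small lookup. This is only an illustration of the rule, not the catalog hook itself: the map, the string type names, and `supportsVectorType` are hypothetical, and the real check goes through catalog.SupportsVectorType / SupportedVectorTypes.

```go
package main

import "fmt"

// supportedVectorTypes restates the matrix from the commit message.
// Algorithm and element-type spellings are chosen for this sketch.
var supportedVectorTypes = map[string][]string{
	"ivfflat": {"f32", "f64", "f16", "bf16", "int8", "uint8"},
	"cagra":   {"f32", "f16"},
	"ivfpq":   {"f32", "f16"},
	"hnsw":    {"f32", "f64"},
}

// supportsVectorType reports whether an index kind accepts a vector
// column with the given element type. Kinds missing from the map
// (for example a plain secondary index) reject every vector column.
func supportsVectorType(kind, elem string) bool {
	for _, t := range supportedVectorTypes[kind] {
		if t == elem {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supportsVectorType("cagra", "f16"))     // accepted
	fmt.Println(supportsVectorType("cagra", "int8"))    // rejected for cagra
	fmt.Println(supportsVectorType("ivfflat", "int8"))  // accepted for ivfflat
	fmt.Println(supportsVectorType("secondary", "f32")) // non-vector kind
}
```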
XuPeng-SH
left a comment
I re-checked the current head and still see two substantive correctness issues in the cuVS quantizer path.
- The 1-byte quantizer fallback still learns from already-quantized bytes.
In `cgo/cuvs/index_base.hpp`, `train_quantizer_if_needed()` still auto-trains from `flattened_host_dataset` after explicitly warning that this buffer may already hold `int8`/`uint8` storage values when data came in through the public storage-typed constructors / add-chunk path. In that case the quantizer learns the compressed range, not the original float range, so later base-typed search/extend paths can silently quantize against the wrong min/max.
Suggestion: do not auto-train from storage-typed data. Require either an explicit quantizer/range or original base-typed training data before enabling base-typed search/extend on pre-quantized indexes.
- The new “strided sample” still ignores the tail for 501–999 row builds.
With `n_train = min(500, count)` and `stride = count / n_train`, any `501 <= count < 1000` still collapses to `stride == 1`, so the sampling loop only visits rows `0..499`. That means extrema in the tail are still missed, even though the comment now claims the sampler covers all rows.
Suggestion: choose indices proportionally across the full range (for example `r = j * (count - 1) / (n_train - 1)`) or switch to a true uniform/reservoir sampler.
I would keep this at request changes until those two are addressed, because both can directly bias quantization and search quality without any obvious runtime failure.
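
The sampling issue in the second point is easy to reproduce in isolation. The sketch below is written in Go for illustration (the code under review is C++), and both helper names are hypothetical. It compares the integer-stride sampler described in the review with the proportional index formula from the suggestion.

```go
package main

import "fmt"

// strideIndices reproduces the sampler described in the review:
// n_train = min(maxTrain, count), stride = count / n_train.
func strideIndices(count, maxTrain int) []int {
	nTrain := maxTrain
	if count < nTrain {
		nTrain = count
	}
	if nTrain <= 0 {
		return nil
	}
	stride := count / nTrain // integer division: 1 for 501..999 rows
	idx := make([]int, nTrain)
	for j := range idx {
		idx[j] = j * stride
	}
	return idx
}

// proportionalIndices spreads the samples over the full row range using
// r = j * (count - 1) / (n_train - 1), as suggested in the review.
func proportionalIndices(count, maxTrain int) []int {
	nTrain := maxTrain
	if count < nTrain {
		nTrain = count
	}
	if nTrain <= 0 {
		return nil
	}
	if nTrain == 1 {
		return []int{0} // avoid dividing by zero
	}
	idx := make([]int, nTrain)
	for j := range idx {
		idx[j] = j * (count - 1) / (nTrain - 1)
	}
	return idx
}

func main() {
	s := strideIndices(750, 500)
	p := proportionalIndices(750, 500)
	fmt.Println(s[len(s)-1]) // last sampled row with the stride sampler
	fmt.Println(p[len(p)-1]) // last sampled row with proportional spacing
}
```

For 750 rows the stride sampler stops at row 499, while the proportional version reaches row 749, so extrema in the last third of the data are no longer invisible to training.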
A WHERE predicate on a column not in the index INCLUDE list cannot be pushed
into the GPU bitset; the planner runs the ANN search for a candidate window then
JOINs+filters at the DB (post-filter). This path had no BVT coverage — all
existing filter cases only filter on INCLUDE'd columns.
Add vector_{cagra,ivfpq}_postfilter.sql: establish the unfiltered ranked result,
then verify the post-filtered result equals exactly the unfiltered rows that
satisfy the predicate (exact when LIMIT >= row count so the candidate window
covers all rows), plus the mixed pre(INCLUDE)+post(non-INCLUDE) case and the
small-LIMIT approximate-window case (far match falls outside the window).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
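
The post-filter behaviour that these BVT cases pin down can be modelled in a few lines. This is a toy model under stated assumptions: an exact sort stands in for the ANN search, the candidate window is taken to equal the LIMIT, and the `row` type and `postFilter` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a toy record: an id, its distance to the query vector, and a
// column that is NOT in the index INCLUDE list.
type row struct {
	id   int
	dist float64
	tag  string
}

// postFilter ranks rows by distance, keeps the first `window` candidates
// (standing in for the ANN search), then applies the predicate afterwards.
func postFilter(rows []row, window int, pred func(row) bool) []int {
	ranked := append([]row(nil), rows...)
	sort.Slice(ranked, func(i, j int) bool { return ranked[i].dist < ranked[j].dist })
	if window < 0 {
		window = 0
	}
	if window < len(ranked) {
		ranked = ranked[:window]
	}
	var ids []int
	for _, r := range ranked {
		if pred(r) {
			ids = append(ids, r.id)
		}
	}
	return ids
}

func main() {
	rows := []row{{1, 0.1, "a"}, {2, 0.2, "b"}, {3, 0.3, "a"}, {4, 0.9, "b"}}
	isB := func(r row) bool { return r.tag == "b" }
	fmt.Println(postFilter(rows, 4, isB)) // window covers every row
	fmt.Println(postFilter(rows, 2, isB)) // far match (id 4) is outside the window
}
```

With a window of 4 both matching rows come back; with a window of 2 the distant match is dropped, which is the approximate-window case the tests describe.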
aunjgr
left a comment
LGTM for the quantization support. Well-structured across cuvs C++/CUDA layer, Go bindings, and SQL compilation.
For a 1-byte storage type that buffer only ever holds STORAGE bytes (raw T from a pre-quantized add_chunk(T*), or post-flush quantized output), never original floats. Training the scalar quantizer on it learns the COMPRESSED range (e.g. int8 [-128,127]) instead of the true float range, so later base-typed search/extend silently quantizes against the wrong min/max. Quantizer training now happens solely in flush_pending_float_chunks_internal() on the ORIGINAL floats buffered by add_chunk_float()/add_chunk_quantize(). A pre-quantized index leaves the quantizer untrained; base-typed search (quantize_query) and extend (upload_float_matrix_as_T) already throw "quantizer not trained", so the op fails loudly instead of mis-quantizing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
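
The failure mode described in this commit can be shown with a deliberately simplified min/max scalar quantizer. The sketch is in Go and is not cuVS's quantizer: the `valueRange` type, the linear mapping to int8, and the rounding rule are assumptions chosen only to make the effect visible.

```go
package main

import (
	"fmt"
	"math"
)

// valueRange is the [Min, Max] interval a min/max scalar quantizer learns.
type valueRange struct{ Min, Max float64 }

// train learns the range from whatever data it is handed.
func train(data []float64) valueRange {
	if len(data) == 0 {
		return valueRange{}
	}
	r := valueRange{Min: data[0], Max: data[0]}
	for _, v := range data[1:] {
		r.Min = math.Min(r.Min, v)
		r.Max = math.Max(r.Max, v)
	}
	return r
}

// quantize maps [Min, Max] linearly onto the int8 range [-128, 127].
func (r valueRange) quantize(x float64) int8 {
	if r.Max <= r.Min {
		return 0
	}
	t := (x - r.Min) / (r.Max - r.Min)
	t = math.Max(0, math.Min(1, t))
	return int8(math.Round(t*255) - 128)
}

func main() {
	original := []float64{-1, -0.25, 0.5, 1}
	good := train(original) // trained on the original floats

	// Simulate training on the storage buffer: the same values after
	// they were already quantized to int8.
	stored := make([]float64, len(original))
	for i, v := range original {
		stored[i] = float64(good.quantize(v))
	}
	bad := train(stored) // learns [-128, 127], the compressed range

	// A float query at the top of the original range:
	fmt.Println(good.quantize(1.0), bad.quantize(1.0))
}
```

Under the mis-trained range, every query in the original [-1, 1] interval lands within a code or two of zero, so distances against the stored bytes become meaningless without any error being raised. Leaving the quantizer untrained and failing loudly, as the commit does, avoids that.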
What type of PR is this?
Which issue(s) this PR fixes:
issue #25026
What this PR does / why we need it: