Skip to content

IVFFLAT/IVFPQ/CAGRA support bf16, float16, int8 and uint8 quantization #25095

Open
cpegeric wants to merge 874 commits into matrixorigin:main from cpegeric:cuvs_quantize

cpegeric commented Jun 23, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #25026

What this PR does / why we need it:

  1. IVFFLAT supports bf16, float16, int8, and uint8 quantization.
  2. Fixes a bug in concurrent GPU k-means clustering.
  3. IVFPQ/CAGRA support float32 and float16 as base types, with int8 and uint8 quantization.
  4. Fixes slow int8 quantization on the GPU by moving the computation to the CPU.
  5. Adds basic array functions for bf16, float16, int8, and uint8.

cpegeric and others added 30 commits May 19, 2026 09:43
The CDC chunk framing, event-record codec, and replay helpers
shipped in pkg/vectorindex/cuvs_cdc.go are cuvs-specific (CAGRA +
IVF-PQ); no general vectorindex code uses them. Lift the file into
pkg/vectorindex/cuvs/ so future cuvs-shared helpers have a natural
home and so the parent vectorindex surface narrows.

Pure code-motion: cagra/ivfpq sync/search/cdc-load now reach the
symbols via the cuvscdc alias (renaming to dodge the existing
pkg/cuvs GPU-bindings package).

Resolves rename collisions: gpu_cdc moves cuvs_cdc.go into the new
pkg/vectorindex/cuvs sub-package (cuvscdc alias), while gpu_plugin_cuvs
added CagraSync/IvfpqSync.AppendRecords and the iscp CuvsCdcWriter +
cuvs/idxcron.CuvsUpdatable, all of which referenced the old
vectorindex.X symbols. Updated those call sites to the new alias.

The plugin refactor lifted getCagraParams / getIvfpqParams /
buildCagraCreate / buildCagraSearch / buildIvfpqCreate /
buildIvfpqSearch out of (*QueryBuilder) onto package-level functions
in pkg/vectorindex/{cagra,ivfpq}/plugin/plan/tablefunc.go, but
pkg/sql/plan/cagra_ivfpq_test.go was left calling the old methods —
breaking GPU vet on pkg/sql/plan/...

Port the suite to both plugin sub-packages using a minimal
planplugin.PlanBuilder stub. The build* paths only consult
GetContext / GenNewBindTag / AppendNode, so the per-algo
ApplyForSort / CanApply redirects panic in the stub. Each test file
also wires planplugin.DeepCopyColDefList as a shallow pass-through
(production wires it from pkg/sql/plan, which can't be imported
here without a cycle).

Widen the Updatable hook contract to take an UpdatableInput struct
(sqlproc, tableDef, indexName, metadata, createdAt, lastUpdateAt,
interval) so per-algo hooks can own their full rebuild gate. Move
IVF-FLAT's lists/nsample heuristic + kmeans_train_percent mutation
out of (*IndexUpdateTaskInfo).checkIndexUpdatable into the plugin
hook; the executor's universal pre-checks (auto_update on, hour
matches, createdAt + interval elapsed) stay in place but every
algorithm-specific decision now lives with the algorithm.

CuvsUpdatable picks up the lastUpdateAt + interval cadence the
executor's listsAware=false branch used to enforce, so CAGRA /
IVF-PQ behaviour is unchanged. The trivial HNSW / fulltext hooks
just rename their parameter.

Tests for the IVF-FLAT body move into the new home
(pkg/vectorindex/ivfflat/plugin/idxcron/idxcron_test.go) and stub
RunGetCountSql there instead of the executor-side runGetCountSql.
The executor-side integration tests (TestIvfflatReindex,
TestExecutorRunFakeTasks) now construct mockReindexAlgoPlugin with
ivfflatidxcron.Hooks{} as the real idxcron implementation so the
end-to-end cron flow is still exercised.

ProcessInitSQL runs the per-job InitSQL on a *process.Process that
has no frontend session, so its ResolveVariableFunc lands as nil.
Table functions consumed by InitSQL (e.g. ivfpq_create_gpu.go:236
reading kmeans_train_percent) silently skip their session-variable
reads and build with degenerate config.

Inline ProcessInitSQL's executor invocation so it can attach
WithResolveVariableFunc(iscp.DefaultResolveVariable) — the hook is
populated by pkg/frontend's init() with a closure that reads
gSysVarsDefs[name].Default. Defaults-only by design: per-index
admin-tuned values still flow through the captured-vars Metadata
the idxcron task carries.

Nil-safe: tests that don't blank-import pkg/frontend see the hook
as nil and ProcessInitSQL keeps today's nil-resolver behaviour.
ExecWithResult is left unchanged so the other ISCP call sites
(executor.go, consumer_entry.go, data_retriever.go) are not
affected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cuvs CAGRA needs at least intermediate_graph_degree rows per
sub-index (default 128); IVF-PQ k-means needs at least `lists`
rows. When the source has a partial trailing chunk
(`total % IndexCapacity`) below the cuvs minimum — or the whole
dataset is too small — the build would error.

Pre-count source rows up front, compute cdcCutoff via the formula
  cdcCutoff = total - lastChunkSize    when lastChunkSize < threshold
            = total                    otherwise
Rows < cdcCutoff still feed the cuvs builder as today; the
trailing rows buffer into a per-(table, index) PendingRecord slice
and end() emits them as tag=1 CDC records under
vectorindex.CdcTailId via the new cuvs.SaveSmallTailAsCdc helper.
Search-side brute-force replay already serves tag=1 records when
no tag=0 model exists for that slice, so queries keep working
until a future rebuild lifts the tail back above threshold.
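
The cutoff rule above can be sketched in Go as follows. This is an illustration only: the function name and signature are hypothetical, and it assumes that a source which divides evenly into `IndexCapacity`-sized chunks has no partial tail to divert.

```go
package main

import "fmt"

// cdcCutoff returns how many leading source rows feed the cuvs builder.
// Rows at or beyond the cutoff would be buffered and emitted as CDC records.
// Hypothetical helper; the real code may compute lastChunkSize differently.
func cdcCutoff(total, indexCapacity, threshold int64) int64 {
	if total <= 0 || indexCapacity <= 0 {
		return 0 // empty source: nothing to build, nothing to buffer
	}
	lastChunkSize := total % indexCapacity
	if lastChunkSize == 0 {
		return total // assumption: every chunk is full, so there is no tail
	}
	if lastChunkSize < threshold {
		return total - lastChunkSize // tail is too small for its own sub-index
	}
	return total
}

func main() {
	fmt.Println(cdcCutoff(850, 400, 128))  // 800: the 50-row tail goes to CDC
	fmt.Println(cdcCutoff(1000, 400, 128)) // 1000: the 200-row tail is large enough
	fmt.Println(cdcCutoff(100, 400, 128))  // 0: the whole dataset is below the minimum
}
```

With a cutoff of 0, every row lands in the CDC tail, which is the "whole dataset is too small" case described above.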

Empty source is now a clean no-op (was: "source table is empty;
cannot determine index capacity" error) — the auto-detect /
cutoff branch sets srcEmpty=true and per-row / end() short-circuit.

The CDC bytes layout reuses the existing cuvscdc.EncodeEventRecord
+ FrameCdcChunk + CdcAppendEventsSql primitives so replay decodes
identically. INCLUDE-column bytes are produced by a new
encodeIncludeRowFromArgVecs sibling next to appendFilterRow,
matching the cuvscdc.EncodeIncludeRow on-wire layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CAGRA / IVF-PQ idxcron Hooks are thin wrappers that delegate
to cuvsidxcron.CuvsUpdatable with a per-algo CuvsUpdatableSpec.
Add focused tests that drive each wrapper through the
IndexDef-missing and threshold-missing paths of the shared body
— the error message and skip reason name the storage-table-type
and threshold-param the spec asked about, so a regression to the
wrong constant surfaces immediately.

HNSW and fulltext don't participate in scheduled rebuilds; cover
their trivial-true contract too so any future wiring keeps the
"don't surprise-skip" guarantee.

IVF-FLAT's full nsample/lists body suite already exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the small-tail fallback writes all source rows to CDC tag=1
records without producing a tag=0 sub-index, the search-side
loadCdcTail used to short-circuit ("cdc_tail data is moot without a
main index") and ignore those records. Filtered queries against
small-data-only indexes therefore returned empty results.

Persist the INCLUDE-column layout in a self-describing record at
the start of chunk_id=0:

  CdcOpHeader (1) | payload_len (uint32 LE) | colMetaJSON

SaveSmallTailAsCdc prepends this header when colMetaJSON is
non-empty (computed via the new colMetaJSONFromCols helper from
the table-function's resolved []cuvsfilter.ColumnMeta). The
header's self-describing length lets DecodeEventRecord skip past
it without knowing includeBytesPerRow, and PeekColMetaJSON
recovers the JSON without committing to dim/ibpr.

CagraSearch.loadCdcTail and IvfpqSearch.loadCdcTail no longer
return early when no sub-index has loaded. They peek the header,
derive includeBytesPerRow via cuvscdc.CdcIncludeBytesPerRow,
replay the tag=1 events into a synthetic model, and stash the
colMetaJSON on a new OverflowColMetaJSON field. buildOverflow
falls back to that field when no main-index has a
GetFilterColMetaJSON() to offer — so the brute-force FilterStore
gets wired with INCLUDE-column metadata and filtered prefilter
still works on small-data-only indexes.

ReplayEventLog also captures the header into ReplayState.ColMetaJSON
for callers that prefer the unified result struct over the peek
helper.

Empty-result invariant preserved: a header-only chunk with no
event records produces no overflow → buildOverflow leaves
s.Overflow nil → buildMultiIndex returns nil → Search returns
[]int64{}, []float64{}, nil. Both buildMultiIndex docstrings call
out that this is the load-bearing path for "no main index + no
brute-force → empty result" and that TestCagraSearchEmpty /
TestIvfpqSearchEmpty pin it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the CdcOpHeader record introduced in f28f73d with a
dedicated header section in every chunk's frame. Frame format
bumped to version 2:

  magic_start | version | payload_len | header_len | header | records | crc | reserved | reserved | magic_end

The header section carries colMetaJSON when the index has INCLUDE
columns; payload_len covers only the event records (Delete/Insert,
unchanged shape). header_len = 0 collapses the new section to nothing,
matching the original 32-byte overhead.

Why the shape change:
- Records stay pure event payloads — no CdcOpHeader op, no special-
  case in DecodeEventRecord / ReplayEventLog. Decoders treat headers
  as frame metadata, not as records to skip.
- Every chunk is self-describing: any one chunk read in isolation
  knows its INCLUDE-column layout without depending on chunk_id
  ordering or whether chunk_id=0 is present.
- Fixes the empty-source-then-CDC edge case: when cagra_create
  with srcEmpty=true emits nothing, the first CagraSync.Save
  chunk (chunk_id=0, NextChunkIdSql) carries the header so search
  can decode it.

Surface changes:
- FrameCdcChunk(records, header []byte) — new second arg.
- UnframeCdcChunk returns (records, header, err).
- CdcAppendEventsSql(..., colMetaJSON string) — embeds the header
  in every emitted chunk.
- SaveSmallTailAsCdc just passes colMetaJSON through; no longer
  prepends a header record.
- CagraSync.Save / IvfpqSync.Save pass s.colMetaJSON to
  CdcAppendEventsSql so ongoing CDC iterations also embed it.
- ReplayEventLog captures the header from each chunk's frame into
  ReplayState.ColMetaJSON (last-write-wins; in practice all chunks
  share the same value).
- PeekColMetaJSON simplifies to "unframe chunks[0], return header".
- CdcOpHeader / EncodeHeaderRecord / CdcEventRecord.Header dropped.
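
A rough Go sketch of a version-2 frame with a separate header section is shown below. The field order follows the layout in this commit message, but everything else is an assumption made for illustration: each fixed field is taken to be a little-endian uint32 (which gives 32 bytes of overhead when header_len is 0), the magic values are invented, and the CRC is assumed to be CRC-32 (IEEE) over header plus records. `frameChunk` and `unframeChunk` are hypothetical stand-ins, not the real FrameCdcChunk / UnframeCdcChunk.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const (
	magicStart    uint32 = 0x43444353 // hypothetical value
	magicEnd      uint32 = 0x43444345 // hypothetical value
	frameVersion  uint32 = 2
	fixedOverhead        = 32 // eight uint32 fields (assumed widths)
)

// frameChunk wraps event records and an optional header (e.g. colMetaJSON).
func frameChunk(records, header []byte) []byte {
	le := binary.LittleEndian
	out := make([]byte, 0, fixedOverhead+len(header)+len(records))
	out = le.AppendUint32(out, magicStart)
	out = le.AppendUint32(out, frameVersion)
	out = le.AppendUint32(out, uint32(len(records))) // payload_len: records only
	out = le.AppendUint32(out, uint32(len(header)))  // header_len: 0 means no header
	out = append(out, header...)
	out = append(out, records...)
	out = le.AppendUint32(out, crc32.ChecksumIEEE(out[16:])) // crc over header+records
	out = le.AppendUint32(out, 0)                            // reserved
	out = le.AppendUint32(out, 0)                            // reserved
	return le.AppendUint32(out, magicEnd)
}

// unframeChunk validates the frame and returns (records, header, err).
func unframeChunk(frame []byte) (records, header []byte, err error) {
	le := binary.LittleEndian
	if len(frame) < fixedOverhead {
		return nil, nil, errors.New("frame too short")
	}
	if le.Uint32(frame) != magicStart || le.Uint32(frame[len(frame)-4:]) != magicEnd {
		return nil, nil, errors.New("bad magic")
	}
	if v := le.Uint32(frame[4:]); v != frameVersion {
		return nil, nil, fmt.Errorf("unsupported frame version %d", v)
	}
	payloadLen, headerLen := le.Uint32(frame[8:]), le.Uint32(frame[12:])
	if uint64(payloadLen)+uint64(headerLen) != uint64(len(frame)-fixedOverhead) {
		return nil, nil, errors.New("length mismatch")
	}
	end := 16 + int(headerLen) + int(payloadLen)
	body := frame[16:end]
	if crc32.ChecksumIEEE(body) != le.Uint32(frame[end:]) {
		return nil, nil, errors.New("crc mismatch")
	}
	return body[headerLen:], body[:headerLen], nil
}

func main() {
	frame := frameChunk([]byte("events"), []byte(`[{"name":"c1"}]`))
	records, header, err := unframeChunk(frame)
	fmt.Println(string(records), string(header), err)
	fmt.Println(len(frameChunk(nil, nil))) // 32 under the assumed field widths
}
```

Because the header travels in the frame rather than in the record stream, a decoder in this sketch never has to special-case a non-event record.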

Tests updated: existing FrameCdcChunk / UnframeCdcChunk callers
take the new signature; the old "header as first record" small-tail
tests are replaced by ones that assert the header lives in every
chunk's frame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both producers (cuvscdc.ResolveIncludeColumns and the table-function
helper colMetaJSONFromCols) now share one entry type and one marshal
function:

  cuvscdc.ColMetaEntry{Name, Type}
  cuvscdc.MarshalColMetaJSON([]ColMetaEntry) (string, error)

The shared producer uses encoding/json so column names containing
`"` or `\` (or any other JSON-significant character) escape
correctly — the previous strings.Builder paths would have emitted
invalid JSON for such names. New TestMarshalColMetaJSON_EscapesNames
pins that contract by round-tripping a name containing each special
character through encoding/json.
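
To illustrate why routing both producers through encoding/json matters, here is a small Go sketch. The lowercase type and function names, the JSON field names, and the use of a string for `Type` are assumptions for illustration; only the `{Name, Type}` shape comes from this commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// colMetaEntry mirrors the {Name, Type} shape; field tags are assumed.
type colMetaEntry struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// marshalColMetaJSON lets encoding/json handle all escaping.
func marshalColMetaJSON(entries []colMetaEntry) (string, error) {
	b, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// parseColMetaJSON is the inverse, used here to check the round trip.
func parseColMetaJSON(s string) ([]colMetaEntry, error) {
	var entries []colMetaEntry
	err := json.Unmarshal([]byte(s), &entries)
	return entries, err
}

// naiveColMetaJSON concatenates strings without escaping, which is the
// kind of hand-rolled builder the commit message describes replacing.
func naiveColMetaJSON(entries []colMetaEntry) string {
	var sb strings.Builder
	sb.WriteByte('[')
	for i, e := range entries {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(`{"name":"` + e.Name + `","type":"` + e.Type + `"}`)
	}
	sb.WriteByte(']')
	return sb.String()
}

func main() {
	entries := []colMetaEntry{{Name: `a"b\c`, Type: "int32"}}
	good, _ := marshalColMetaJSON(entries)
	fmt.Println(json.Valid([]byte(good)))                      // true
	fmt.Println(json.Valid([]byte(naiveColMetaJSON(entries)))) // false
}
```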

Single producer also guarantees the iscp writer side
(ResolveIncludeColumns at index-CDC-event-write time) and the table-
function side (small-tail emit at build time) cannot drift: any
future shape change lands in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the unreliable resolver-error probe used to detect background
re-entry (idxcron ALTER REINDEX, ProcessInitSQL) with an explicit
proc.Base.IsFrontend flag carried via executor.Options.WithFrontend.
Default is background; frontend opts in at the two session-bound
proc-construction sites (mysql client query handler and back_exec).
BuildIdxcronMetadata, ddl.go AlterTableInplace re-registration, and
the experimental_xxx_index gates in cagra/ivfpq/hnsw now consult
ctx.IsFrontend() instead of probing a resolver — so background re-
entry no longer clobbers captured task metadata or trips an
experimental-flag check that already passed at CREATE INDEX time.
The dead probe-based FrontendProbeVar / IdxcronFrontendProbeVar
fields are removed in the same pass.
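
The explicit-flag approach can be pictured with the Go sketch below. All types here are simplified, hypothetical stand-ins for executor.Options and proc.Base; the sketch only shows the "background by default, frontend opts in" shape and a gate that consults the flag instead of probing a resolver.

```go
package main

import (
	"errors"
	"fmt"
)

// options is a hypothetical stand-in for executor.Options.
type options struct{ isFrontend bool }

// WithFrontend returns a copy marked as session-bound. The zero value
// stays background, so callers must opt in explicitly.
func (o options) WithFrontend() options {
	o.isFrontend = true
	return o
}

// procBase is a hypothetical stand-in for proc.Base.
type procBase struct{ IsFrontend bool }

func newProc(o options) procBase { return procBase{IsFrontend: o.isFrontend} }

// checkExperimentalGate enforces the experimental flag only for frontend
// sessions. Background re-entry skips it, on the reasoning that the gate
// already passed when the index was created.
func checkExperimentalGate(p procBase, flagEnabled bool) error {
	if !p.IsFrontend {
		return nil
	}
	if !flagEnabled {
		return errors.New("experimental index is not enabled")
	}
	return nil
}

func main() {
	background := newProc(options{})
	frontend := newProc(options{}.WithFrontend())
	fmt.Println(checkExperimentalGate(background, false)) // <nil>
	fmt.Println(checkExperimentalGate(frontend, false))   // experimental index is not enabled
}
```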

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three comments in types.go and sqlexec.go still spoke of
"IsBackground=true" / "WithIsBackground(false)" — relics of the
prior name. Reworded to match the post-rename API
(IsFrontend / WithFrontend) so the in-file docstrings line up
with the code. No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add [plugin] / [isfrontend] tagged logutil.Info calls at each plugin
lifecycle milestone so SQL-driven end-to-end tests can confirm via
the CN log that the right algorithm's hook ran with the expected
context. Covered points:

- compile.handleCreate / HandleCreateIndex (cagra, ivfpq, ivfflat,
  hnsw): logs isFrontend / forceSync / def-count at entry — proves
  the per-algo gate and forceSync decision.
- compile.HandleDropIndex (all four): logs entry on DROP INDEX.
- compile.IdxcronMetadata (cagra, ivfpq, ivfflat): per-algo entry log
  pairs with the existing shared BuildIdxcronMetadata
  capture/skip [isfrontend] lines.
- idxcron.Updatable (all four): logs every cron-tick decision.
- iscp.NewIndexSqlWriter: single central log fires once per CDC
  consumer construction across all algos.
- cuvs Sync.AppendRecords / Sync.Save (cagra, ivfpq): logs records
  IN from the CDC stream and OUT to the storage table, so flush
  cadence and chunk count are visible in the log.

Smoke test files added for ivfflat and hnsw plugin/compile/ so the
new log lines stay covered (ivfflat went 0% → 7.1%, hnsw 0% →
13.9%; cagra/ivfpq held at 79.6%). All other touched packages held
or improved coverage. Build + vet clean on both default and gpu
tag sets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	pkg/sql/plan/function/function_id.go
#	pkg/sql/plan/function/function_id_test.go
Follow-up to the drop-index cache-eviction fix: HNSW's HandleDropIndex was
still a no-op, so with the new dispatch its cached search index lingered
until the 5-min VectorIndexCacheTTL (same leak as ivfpq/cagra/ivfflat).
Evict via cache.Cache.Remove(storageDef.IndexTableName), mirroring the
create-side. All four vector plugins now release on drop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xes CLONE)

CREATE TABLE ... CLONE of a table with a CAGRA/IVF-PQ vector index failed with
"VECTOR column 'v' cannot be in index": indexColumnCheckKind mapped only
IVFFLAT/HNSW (CAGRA/IVFPQ fell to "secondary"), and checkIndexColumnSupportability
hardcoded the vector allowlist to ivfflat/hnsw and only matched f32/f64 (narrow
f16/bf16/int8/uint8 fell through unvalidated).

Delegate the vector-column check to the per-plugin catalog hook
(catalog.SupportsVectorType / SupportedVectorTypes) so each algorithm's real
supported element types are enforced: ivfflat = f32/f64/f16/bf16/int8/uint8,
cagra/ivfpq = f32/f16, hnsw = f32/f64; non-vector index kinds reject vector
columns. indexColumnCheckKind now maps cagra/ivfpq so Get() resolves the plugin.
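
The per-algorithm allowlist described above amounts to a lookup like the following Go sketch. The map and function are hypothetical simplifications of the catalog hook; only the algorithm-to-type lists come from this commit message.

```go
package main

import "fmt"

// supportedVectorTypes lists the element types each vector index accepts,
// as stated in the commit message. Index kinds absent from the map
// (for example a plain secondary index) accept no vector columns.
var supportedVectorTypes = map[string][]string{
	"ivfflat": {"f32", "f64", "f16", "bf16", "int8", "uint8"},
	"cagra":   {"f32", "f16"},
	"ivfpq":   {"f32", "f16"},
	"hnsw":    {"f32", "f64"},
}

// supportsVectorType reports whether algo can index a vector column
// whose element type is elem.
func supportsVectorType(algo, elem string) bool {
	for _, t := range supportedVectorTypes[algo] {
		if t == elem {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supportsVectorType("cagra", "f16"))     // true
	fmt.Println(supportsVectorType("cagra", "int8"))    // false
	fmt.Println(supportsVectorType("secondary", "f32")) // false
}
```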

Verified: gpu_cases/vector BVT 100% (vector_clone_idxcron now 21/21) + unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@XuPeng-SH XuPeng-SH left a comment

I re-checked the current head and still see two substantive correctness issues in the cuVS quantizer path.

  1. The 1-byte quantizer fallback still learns from already-quantized bytes.
    In cgo/cuvs/index_base.hpp, train_quantizer_if_needed() still auto-trains from flattened_host_dataset after explicitly warning that this buffer may already hold int8/uint8 storage values when data came in through the public storage-typed constructors / add-chunk path. In that case the quantizer learns the compressed range, not the original float range, so later base-typed search/extend paths can silently quantize against the wrong min/max.

    Suggestion: do not auto-train from storage-typed data. Require either an explicit quantizer/range or original base-typed training data before enabling base-typed search/extend on pre-quantized indexes.

  2. The new “strided sample” still ignores the tail for 501–999 row builds.
    With n_train = min(500, count) and stride = count / n_train, any 501 <= count < 1000 still collapses to stride == 1, so the sampling loop only visits rows 0..499. That means extrema in the tail are still missed, even though the comment now claims the sampler covers all rows.

    Suggestion: choose indices proportionally across the full range (for example r = j * (count - 1) / (n_train - 1)) or switch to a true uniform/reservoir sampler.
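
The second point can be reproduced with a short Go sketch. It is not the C++ code under review: the function names are made up, and the sketch only compares the index sets produced by the stride formula and by the proportional formula suggested above.

```go
package main

import "fmt"

const maxTrain = 500

func trainSize(count int) int {
	if count < maxTrain {
		return count
	}
	return maxTrain
}

// stridedIndices uses stride = count / n_train. For 501 <= count < 1000
// the integer division yields 1, so only rows 0..499 are visited.
func stridedIndices(count int) []int {
	n := trainSize(count)
	if n == 0 {
		return nil
	}
	stride := count / n
	idx := make([]int, n)
	for j := range idx {
		idx[j] = j * stride
	}
	return idx
}

// proportionalIndices spreads n_train picks across the full range with
// r = j * (count - 1) / (n_train - 1), so the last pick is the last row.
func proportionalIndices(count int) []int {
	n := trainSize(count)
	if n == 0 {
		return nil
	}
	if n == 1 {
		return []int{0}
	}
	idx := make([]int, n)
	for j := range idx {
		idx[j] = j * (count - 1) / (n - 1)
	}
	return idx
}

func main() {
	s, p := stridedIndices(750), proportionalIndices(750)
	fmt.Println(s[len(s)-1]) // 499: rows 500..749 are never sampled
	fmt.Println(p[len(p)-1]) // 749: the tail is covered
}
```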

I would keep this at request changes until those two are addressed, because both can directly bias quantization and search quality without any obvious runtime failure.

A WHERE predicate on a column not in the index INCLUDE list cannot be pushed
into the GPU bitset; the planner runs the ANN search for a candidate window then
JOINs+filters at the DB (post-filter). This path had no BVT coverage — all
existing filter cases only filter on INCLUDE'd columns.

Add vector_{cagra,ivfpq}_postfilter.sql: establish the unfiltered ranked result,
then verify the post-filtered result equals exactly the unfiltered rows that
satisfy the predicate (exact when LIMIT >= row count so the candidate window
covers all rows), plus the mixed pre(INCLUDE)+post(non-INCLUDE) case and the
small-LIMIT approximate-window case (far match falls outside the window).
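
The post-filter behaviour these cases pin down can be sketched in Go as below. A brute-force nearest-neighbour scan over 1-D positions stands in for the ANN search, and `row`, `nearest`, and `postFilter` are hypothetical names; the point is only that filtering happens after the candidate window is cut.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type row struct {
	id  int64
	pos float64 // 1-D stand-in for the vector column
	tag string  // a column that is not in the INCLUDE list
}

// nearest returns the window closest rows to query, nearest first.
func nearest(rows []row, query float64, window int) []row {
	out := append([]row(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool {
		return math.Abs(out[i].pos-query) < math.Abs(out[j].pos-query)
	})
	if window < 0 {
		window = 0
	}
	if window < len(out) {
		out = out[:window]
	}
	return out
}

// postFilter applies the predicate to the candidate window only.
func postFilter(cands []row, keep func(row) bool) []int64 {
	ids := []int64{}
	for _, r := range cands {
		if keep(r) {
			ids = append(ids, r.id)
		}
	}
	return ids
}

func main() {
	rows := []row{{1, 0.1, "a"}, {2, 0.2, "b"}, {3, 0.3, "a"}, {4, 9.0, "b"}}
	isB := func(r row) bool { return r.tag == "b" }
	fmt.Println(postFilter(nearest(rows, 0, 4), isB)) // [2 4]: window covers all rows, exact
	fmt.Println(postFilter(nearest(rows, 0, 2), isB)) // [2]: the far match falls outside the window
}
```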

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@aunjgr aunjgr left a comment

LGTM for the quantization support. Well-structured across cuvs C++/CUDA layer, Go bindings, and SQL compilation.

For a 1-byte storage type that buffer only ever holds STORAGE bytes
(raw T from a pre-quantized add_chunk(T*), or post-flush quantized
output), never original floats. Training the scalar quantizer on it
learns the COMPRESSED range (e.g. int8 [-128,127]) instead of the true
float range, so later base-typed search/extend silently quantizes
against the wrong min/max.

Quantizer training now happens solely in flush_pending_float_chunks_internal()
on the ORIGINAL floats buffered by add_chunk_float()/add_chunk_quantize().
A pre-quantized index leaves the quantizer untrained; base-typed search
(quantize_query) and extend (upload_float_matrix_as_T) already throw
"quantizer not trained", so the op fails loudly instead of mis-quantizing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
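
The failure mode fixed in the commit above can be shown with a toy min/max scalar quantizer in Go. This is a deliberately simplified, hypothetical model (the real cuVS quantizer lives in C++ and may pick its range differently); it only demonstrates what happens when the range is learned from already-quantized bytes, and how an untrained quantizer can fail loudly instead.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// scalarQuantizer maps floats in [min, max] linearly onto int8.
type scalarQuantizer struct {
	min, max float32
	trained  bool
}

// train learns the range from the given values.
func (q *scalarQuantizer) train(data []float32) {
	if len(data) == 0 {
		return
	}
	q.min, q.max = data[0], data[0]
	for _, v := range data[1:] {
		if v < q.min {
			q.min = v
		}
		if v > q.max {
			q.max = v
		}
	}
	q.trained = true
}

// quantize refuses to run untrained rather than guess a range.
func (q *scalarQuantizer) quantize(v float32) (int8, error) {
	if !q.trained {
		return 0, errors.New("quantizer not trained")
	}
	if q.max == q.min {
		return 0, nil
	}
	t := float64((v - q.min) / (q.max - q.min))
	t = math.Max(0, math.Min(1, t)) // clamp out-of-range inputs
	return int8(math.Round(t*255 - 128)), nil
}

func main() {
	var good, bad, untrained scalarQuantizer
	good.train([]float32{0, 0.25, 0.5, 1})  // original floats
	bad.train([]float32{-128, -64, 0, 127}) // stored int8 bytes read back as values

	g, _ := good.quantize(1.0)
	b, _ := bad.quantize(1.0)
	_, err := untrained.quantize(1.0)
	fmt.Println(g)   // 127: the top of the true range
	fmt.Println(b)   // 1: the query collapses near the middle of [-128, 127]
	fmt.Println(err) // quantizer not trained
}
```

In this toy model every query in [0, 1] quantizes to 0 or 1 under the mis-trained range, which is why the error is silent at runtime but damaging to search quality.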

Labels

kind/feature size/XXL Denotes a PR that changes 2000+ lines

6 participants