fix(kvflash): pooled prefix snapshot at chunk boundary (≥128K restore) by dusterbloom · Pull Request #445 · Luce-Org/lucebox-hub

dusterbloom · 2026-06-24T15:22:26Z

Wave-1 KVFlash deep-context. New commit in this PR: feee0a1d — save the pooled prefix snapshot at a chunk-aligned boundary so ≥128K restore works (was a crash).

Stacking / merge order: depends on #428 (pool reservation+sizing), #429 (pager serde), and #430 (pooled chunked prefill core). Those appear as lower commits in this branch's diff; review/merge them first — this branch rebases to drop them once they land, leaving only the snapshot-boundary fix.

Validation: rebased clean onto current main; the composed deep-context stack passed 64K NIAH (needle exact, restore engaged) on the integration branch. No standalone agentic gate yet.

cubic-dev-ai

2 issues found across 16 files

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

Add serialize()/deserialize() to KvFlashPager (snapshot the full resident+paged KV in logical chunk order; header-validated against layout) and a factored for_each_segment() helper. serde uses synchronous get/set and adapts to the pinned void* host_data of the async-DMA path (Luce-Org#408). Add critical-chunk pinning (pin_range/is_pinned/unpin_all + a best-effort deadlock floor) OR-ed into the ensure_free_block + reselect protections; empty by default (byte-identical non-pin path). CPU unit test (no GPU) covers serde round-trip, header-guard reject, pinning, deadlock guard, reset.

…r KVFlash Drive the MoE cold-expert hybrid path through KVFlash's resident pool: prompts larger than the pool prefill via a chunk loop over hybrid_forward_batch (eviction automatic in alloc_span); the restore residual delta routes through the same chunked path. Pooled snapshot save/restore serializes the pager into the prefix snapshot (PrefixSnapshot += is_pooled + blob; snapshot_target_cache/restore gain skip_kv; the blob rides the disk prefix-cache via a named tensor so cross-turn 128K restore composes). Drafter-scorer residency + DFLASH_KVFLASH_PIN_SPANS critical-chunk pinning wired in. Composes with the landed KVFlash (Luce-Org#373/Luce-Org#408/Luce-Org#385) and MoE restore (Luce-Org#362); serde adapts to the async pinned host_data. GPU gate (RTX 3090): pooled prefill preserves sink context + stable across pool sizes; cross-turn disk restore round-trips losslessly.

…gment Three complexity cuts, no behavior change (GPU sink-recall gate + serde/ placement unit tests green): - merge restore residual's identical snap_pooled/else chunk loops into one (the else ct ternary already subsumes the pooled case) - extract chunked_prefill() shared by generate_impl kvf_paged + restore residual - inline single-caller for_each_segment template into serialize net -25 lines (54 ins / 79 del).

…restore works At ≥128K with the KVFlash pool active, turn 1 never saved a prefix snapshot — the pooled-prefill branch was stubbed to a diagnostic ("boundary snapshot skipped: pooled prefill relocates chunks") and returned without saving. So turn 2 found nothing to restore (prefix_len=0), fell back to a full cold re-prefill (0.8s→77.6s), decode regressed 80→20 tok/s, and turn 3 crashed. The all-hot 35B-A3B runs the dense Qwen35Backend path (moe_hybrid==nullptr), so this was the live bug for the user's deep-context (>128K = 39% of real prompts) workload. - add KvFlashPager::serialize(max_chunks) to capture only chunks [0, max_chunks) — the chunk-aligned turn boundary, not the whole prompt. - add Qwen35Backend::snapshot_save_pooled_at(slot, boundary): floor the requested snap_pos to a chunk multiple, set cur_pos to that boundary, serialize the partial pager blob, and save it (the restore/deserialize path already existed and was correct — only the save was missing). - replace the pooled-prefill skip stub at the chunk-aligned boundary with the real save; mirror the same save on the qwen35moe hybrid path. - unit tests: floor_to_chunk + serialize(max_chunks) partial round-trip (bit-identical first k chunks). 131K 3-turn smoke: turn-2 restore=true prefix_len=34077 (97.5% hit), turn-3 restore=true, no crash, tool_call_valid=1.0, decode recovered 20→56-59 tok/s. Known follow-up: warm-prefill at 131K is still ~44s (deserialize re-pages the whole pool) — correctness/crash/decode are fixed; restoring only resident chunks is the next optimization.

…uard removal core 71371d8 removed the !layout_known_ short-circuit; cold_prefix_boundary now returns the last eligible boundary. Updates the stale ==0 expectation. CI: test_server_unit.cpp.

… contention)

…OB, UAF, atoi-UB, re-prefill fallback, pin_range, gguf type guards)

This was referenced Jun 24, 2026

feat(kvflash): snapshot ledger + QK-pool restore-seed + consume-restored-KV #446

Open

feat(dflash): entropy + self-calibrating spec-decode gate (composes with #436) dusterbloom/lucebox-hub#3

Open

cubic-dev-ai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread server/src/common/moe_hybrid_ffn_eval.cpp Outdated

Comment thread server/src/common/kvflash_pager.h Outdated

dusterbloom closed this Jun 24, 2026

dusterbloom reopened this Jun 24, 2026

dusterbloom added 7 commits June 24, 2026 22:10

test(server): cold_prefix_boundary returns 4000 after layout_known_ g…

5e9ca84

…uard removal core 71371d8 removed the !layout_known_ short-circuit; cold_prefix_boundary now returns the last eligible boundary. Updates the stale ==0 expectation. CI: test_server_unit.cpp.

ci: re-trigger (prior RTX GPU job cancelled by self-hosted-runner GPU…

e324870

… contention)

fix(cubic): address production P1/P2 review findings (bounded loop, O…

ef3b4cc

…OB, UAF, atoi-UB, re-prefill fallback, pin_range, gguf type guards)

dusterbloom force-pushed the pr/kvflash-cache-persistence branch from 750e090 to ef3b4cc Compare June 24, 2026 20:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(kvflash): pooled prefix snapshot at chunk boundary (≥128K restore)#445

fix(kvflash): pooled prefix snapshot at chunk boundary (≥128K restore)#445
dusterbloom wants to merge 7 commits into
Luce-Org:mainfrom
dusterbloom:pr/kvflash-cache-persistence

dusterbloom commented Jun 24, 2026 •

edited by cubic-dev-ai Bot

Loading

Uh oh!

cubic-dev-ai Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

dusterbloom commented Jun 24, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dusterbloom commented Jun 24, 2026 •

edited by cubic-dev-ai Bot

Loading

cubic-dev-ai Bot left a comment •

edited

Loading