fix(fts): use async send in FTS index builder to prevent thread-pool … by a-agmon · Pull Request #7423 · lance-format/lance

a-agmon · 2026-06-23T16:25:45Z

Fixes lancedb/lancedb#3568 (the issue arises in lancedb indexing)

Building a full-text-search index hangs permanently at 0% CPU on hosts
whose Lance CPU pool has a single thread.
The CPU compute pool is sized max(1, num_cpus - LANCE_IO_CORE_RESERVATION) (default reservation 2), so any machine with <= 3 visible CPUs (1-vCPU VMs, CI runners, CPU-limited Kubernetes pods) collapses to a 1-thread pool and deadlocks.
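The sizing rule above can be sketched as a small function. This is only an illustration of the formula quoted in the description: the name `cpu_pool_size` and its parameters are hypothetical and are not taken from the Lance source.

```rust
/// Hypothetical helper mirroring `max(1, num_cpus - LANCE_IO_CORE_RESERVATION)`.
/// `saturating_sub` avoids underflow when the reservation exceeds the CPU count.
fn cpu_pool_size(num_cpus: usize, io_core_reservation: usize) -> usize {
    num_cpus.saturating_sub(io_core_reservation).max(1)
}

fn main() {
    // With the default reservation of 2, hosts with 1 to 3 visible CPUs
    // all end up with a single-thread pool.
    for cpus in 1..=4 {
        println!("{cpus} CPUs -> pool of {}", cpu_pool_size(cpus, 2));
    }
}
```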

Root cause is in write_posting_lists. The posting-list producer runs on the CPU pool via spawn_cpu and pushes batches into a capacity-1 async_channel using the synchronous tx.send_blocking(). When the channel is full, send_blocking parks the OS thread it is running on. On a single-thread pool that is the only thread, and the async consumer's column encoder (write_record_batch -> spawn_cpu) needs that same pool to drain the channel. The parked producer and the starved consumer wait on each other forever: no timeout, no error, just a silent hang at 0% CPU.
The hang only triggers once the posting lists span a second output batch (so the producer reaches a second, blocking send), which is why it appears as a data-size "cliff".
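To make the "cliff" concrete, here is a minimal standard-library model of the failure. It is an analogy under stated assumptions, not Lance code: a single worker thread stands in for the one-thread CPU pool, `std::sync::mpsc::sync_channel(1)` stands in for the capacity-1 async_channel, and the blocking `send` stands in for `send_blocking`. The function name `completes_within` is made up for this sketch.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Runs a blocking producer and its consumer as two jobs on a one-thread
/// "pool" and reports whether the consumer finished before `timeout`.
fn completes_within(batches: u32, timeout: Duration) -> bool {
    // The pool: one worker thread running queued jobs in order.
    let (job_tx, job_rx) = channel::<Job>();
    thread::spawn(move || {
        for job in job_rx {
            job();
        }
    });

    let (tx, rx) = sync_channel::<u32>(1); // capacity-1 channel
    let (done_tx, done_rx) = channel::<u32>();

    // Producer job: the second send blocks the only pool thread,
    // because the first batch is still sitting in the channel.
    job_tx
        .send(Box::new(move || {
            for batch in 0..batches {
                tx.send(batch).unwrap();
            }
        }))
        .unwrap();

    // Consumer job: queued behind the producer on the same thread.
    job_tx
        .send(Box::new(move || {
            let drained = rx.iter().count() as u32;
            let _ = done_tx.send(drained);
        }))
        .unwrap();

    // In the hanging case the worker thread stays parked and is leaked;
    // that is acceptable for a demonstration only.
    done_rx.recv_timeout(timeout).is_ok()
}

fn main() {
    println!("1 batch:   {}", completes_within(1, Duration::from_secs(2)));
    println!("2 batches: {}", completes_within(2, Duration::from_millis(300)));
}
```

In this model a single batch fits in the channel and the run completes, while a second batch parks the only worker before the consumer ever starts.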

The PR restructures the producer as an async task that builds each batch on the CPU pool via spawn_cpu and dispatches it with tx.send(batch).await. When the channel is full, send().await yields the task back to the runtime instead of parking a pool thread, so the consumer can always be scheduled to drain it. Between batches the producer holds no pool thread while waiting, making the pool size irrelevant. The builder and the remaining posting-list iterator are handed back out of each spawn_cpu call so the cross-batch cache-group accumulator is preserved.
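The yield-instead-of-park idea can be illustrated without an async runtime. The sketch below is a standard-library simulation, not the PR's implementation: a `VecDeque` of jobs models a one-thread pool, `try_send` plus re-queueing stands in for `tx.send(batch).await` yielding to the runtime, and the names `Job` and `run_cooperative` are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, TryRecvError, TrySendError};

/// Units of work scheduled on the simulated one-thread pool.
enum Job {
    Produce(u32), // next batch number to send
    Consume,
}

/// Returns the batches received and how often the producer yielded.
fn run_cooperative(total_batches: u32) -> (Vec<u32>, u32) {
    let (tx, rx) = sync_channel::<u32>(1); // capacity-1 channel
    let mut tx = Some(tx);
    let mut queue = VecDeque::from([Job::Produce(0), Job::Consume]);
    let mut received = Vec::new();
    let mut yields = 0;

    while let Some(job) = queue.pop_front() {
        match job {
            // All batches sent: drop the sender so the consumer sees the end.
            Job::Produce(next) if next >= total_batches => tx = None,
            Job::Produce(next) => {
                let sender = tx.as_ref().expect("sender is alive while batches remain");
                match sender.try_send(next) {
                    // Sent: keep going with the next batch.
                    Ok(()) => queue.push_front(Job::Produce(next + 1)),
                    // Channel full: give the thread back instead of parking it.
                    Err(TrySendError::Full(_)) => {
                        yields += 1;
                        queue.push_back(Job::Produce(next));
                    }
                    Err(TrySendError::Disconnected(_)) => break,
                }
            }
            Job::Consume => match rx.try_recv() {
                Ok(batch) => {
                    received.push(batch);
                    queue.push_back(Job::Consume);
                }
                Err(TryRecvError::Empty) => queue.push_back(Job::Consume),
                Err(TryRecvError::Disconnected) => {} // producer finished
            },
        }
    }
    (received, yields)
}

fn main() {
    let (received, yields) = run_cooperative(3);
    println!("received {received:?} after {yields} producer yields");
}
```

Because the producer hands the thread back whenever the channel is full, the consumer always gets a turn, so the run finishes even though only one job executes at a time.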

In addition, it adds a regression test that writes a partition whose posting lists span many output batches (exercising channel back-pressure) under a timeout and verifies every batch is searchable.

(verbose comments added in the code intentionally for review purposes - can be removed if inappropriate. I just thought it might be helpful as the issue is somewhat confusing)

a-agmon · 2026-06-24T03:22:41Z

Hi @westonpace - would be happy for your review.
This issue causes a nasty bug on Kubernetes pods with one core, and it took my team quite some time to pin down, especially as it occurs in native Rust code. I'm submitting this PR to resolve it.
Thanks!

wjones127

Thanks for doing this. It took me a bit to reason about this. I think we need to add better docs to the spawn_cpu() function and maybe check other usages. Basically, it seems like we should never use spawn_cpu() for any work that can put the thread to sleep: no channels, no IO, and no locks. It should just consume CPU and finish. If you need a blocking thread that does any of those things, we should instead use spawn_blocking against an unbounded thread pool. Does that sound right to you, @westonpace?
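A rough standard-library sketch of that guideline, under the assumption that a dedicated `std::thread` can stand in for a spawn_blocking-style unbounded pool: the blocking producer runs on its own thread, and the one-thread "CPU pool" only runs the consumer. The function name is hypothetical and none of this is Lance or Tokio code.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Returns the number of batches drained, or None on timeout.
fn run_with_dedicated_blocking_thread(batches: u32, timeout: Duration) -> Option<u32> {
    // One-thread pool reserved for work that never sleeps on the producer.
    let (job_tx, job_rx) = channel::<Job>();
    let pool = thread::spawn(move || {
        for job in job_rx {
            job();
        }
    });

    let (tx, rx) = sync_channel::<u32>(1); // capacity-1 channel
    let (done_tx, done_rx) = channel::<u32>();

    // The blocking producer gets its own thread, so parking on a full
    // channel never occupies the pool thread.
    let producer = thread::spawn(move || {
        for batch in 0..batches {
            tx.send(batch).unwrap();
        }
    });

    // The consumer runs on the pool and can always drain the channel.
    job_tx
        .send(Box::new(move || {
            let _ = done_tx.send(rx.iter().count() as u32);
        }))
        .unwrap();
    drop(job_tx); // lets the pool thread exit once the queue is empty

    let drained = done_rx.recv_timeout(timeout).ok();
    if drained.is_some() {
        // Join only on success so a hang cannot block the caller.
        producer.join().ok();
        pool.join().ok();
    }
    drained
}

fn main() {
    let drained = run_with_dedicated_blocking_thread(5, Duration::from_secs(2));
    println!("drained: {drained:?}");
}
```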


codecov · 2026-07-04T16:42:12Z

Codecov Report

❌ Patch coverage is 95.83333% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/inverted/builder.rs	85.71%	1 Missing and 3 partials ⚠️


github-actions Bot added the bug (Something isn't working) and A-index (Vector index, linalg, tokenizer) labels and removed the bug (Something isn't working) label Jun 23, 2026

github-actions Bot added the bug (Something isn't working) label Jun 24, 2026

wjones127 self-requested a review July 2, 2026 21:36

wjones127 requested changes Jul 2, 2026


Comment thread rust/lance-index/src/scalar/inverted/index.rs Outdated

a-agmon force-pushed the fix/fts-async-send branch from 710a9fa to db619f2 Compare July 3, 2026 05:59

a-agmon requested a review from wjones127 July 3, 2026 06:25

fix(fts): use async send in FTS index builder to prevent thread-pool …

d8dcf5d

…deadlock

a-agmon force-pushed the fix/fts-async-send branch from db619f2 to d8dcf5d Compare July 3, 2026 07:25

a-agmon wants to merge 1 commit into lance-format:main from a-agmon:fix/fts-async-send

2 participants
