fix(fts): use async send in FTS index builder to prevent thread-pool …#7423
fix(fts): use async send in FTS index builder to prevent thread-pool …#7423a-agmon wants to merge 1 commit into
Conversation
|
Hi @westonpace - would be happy for your review. |
wjones127
left a comment
There was a problem hiding this comment.
Thanks for doing this. It took me a bit to reason about this. I think we need to add better docs to the spawn_cpu() function and maybe check other usage. Basically it seems like we should never use spawn_cpu() for any work that will cause the thread to sleep. No channels, no IO, and no locks. It should just consume CPU and finish. If you need a blocking thread that does any of those things, we should instead just use a spawn_blocking against a unbounded threadpool. Does that sound right to you @westonpace ?
710a9fa to
db619f2
Compare
db619f2 to
d8dcf5d
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Fixes lancedb/lancedb#3568 (the issue arises in lancedb indexing)
Building a full-text-search index hangs permanently at 0% CPU on hosts
whose Lance CPU pool has a single thread.
The CPU compute pool is sized
max(1, num_cpus - LANCE_IO_CORE_RESERVATION)(default reservation 2), so any machine with <= 3 visible CPUs (1-vCPU VMs, CI runners, CPU-limited Kubernetes pods) collapses to a 1-thread pool and deadlocks.Root cause is in
write_posting_lists. The posting-list producer runs on the CPU pool viaspawn_cpuand pushes batches into a capacity-1async_channelusing the synchronoustx.send_blocking(). When the channel is full,send_blockingparks the OS thread it is running on. On a single-thread pool that is the only thread, and the async consumer's column encoder (write_record_batch->spawn_cpu) needs that same pool to drain the channel. The parked producer and the starved consumer wait on each other forever: no timeout, no error, just a silent hang at 0% CPU.The hang only triggers once the posting lists span a second output batch (so the producer reaches a second, blocking send), which is why it appears as a data-size "cliff".
The PR restructures the producer as an async task that builds each batch on the CPU pool via
spawn_cpuand dispatches it withtx.send(batch).await. When the channel is full,send().awaityields the task back to the runtime instead of parking a pool thread, so the consumer can always be scheduled to drain it. Between batches the producer holds no pool thread while waiting, making the pool size irrelevant. The builder and the remaining posting-list iterator are handed back out of eachspawn_cpucall so the cross-batch cache-group accumulator is preserved.In addition, it adds a regression test that writes a partition whose posting lists span many output batches (exercising channel back-pressure) under a timeout and verifies every batch is searchable.
(verbose comments added in the code intentionally for review purposes - can be removed if inappropriate. I just thought it might be helpful as the issue is somewhat confusing)