
feat: add a primitive form of continuous batching #2167

Open
AlpinDale wants to merge 1 commit into LostRuins:concedo from AlpinDale:feat/batching


@AlpinDale

This PR adds a continuous batching path that supports only modern GGUFs and a select set of samplers. It is mostly a proof of concept, so many features, including context shift and smart context, will not work. To test:

python koboldcpp.py --model /path/to/model.gguf --multiuser 32 --continuous-batching 8 --noshift

--noshift is required, and --continuous-batching has to be tuned by hand; set it too high and it will not work. Suffice it to say, a lot of work remains. At least it's really fast.
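For readers new to the idea, here is a toy sketch of slot-based continuous batching in general. It is not the code in this PR: the function names and the "one decode step yields one token per active request" model are assumptions made purely for illustration, and a batched step is treated as costing the same as a single-request step, which real hardware only approximates.

```python
from collections import deque


def legacy_queue_steps(lengths):
    """One request at a time: total decode steps is the sum of all lengths."""
    return sum(lengths)


def continuous_batching_steps(lengths, slots):
    """Count decode steps when up to `slots` requests share each step.

    Assumes every length is a positive token count. A finished request
    frees its slot immediately, and the next waiting request takes it on
    the following step instead of waiting for the whole batch to drain.
    """
    waiting = deque(lengths)
    active = []  # remaining tokens for each occupied slot
    steps = 0
    while waiting or active:
        # Fill any free slots from the queue before decoding.
        while waiting and len(active) < slots:
            active.append(waiting.popleft())
        # One batched step emits one token per active request;
        # requests that just produced their last token are dropped.
        active = [remaining - 1 for remaining in active if remaining > 1]
        steps += 1
    return steps


if __name__ == "__main__":
    lengths = [4, 2, 3, 1]  # hypothetical output lengths in tokens
    print(legacy_queue_steps(lengths))
    print(continuous_batching_steps(lengths, slots=2))
```

In this model the slot count plays the role that the value passed to --continuous-batching appears to play in the command above; the sketch says nothing about memory limits, which is presumably where a too-high value breaks down.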

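| Mode | Requests | Concurrency | Max Length | Batch Slots | Total Wall Time | Throughput | p50 Latency | p95 Latency |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Legacy queue | 16 | 16 | 512 | 1 | 22.607s | 0.708 req/s | 11.970s | 22.813s |
| Continuous batching | 16 | 16 | 512 | 4 | 6.241s | 2.564 req/s | 3.824s | 6.236s |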
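The throughput column is consistent with requests divided by total wall time, and the two wall times give a speedup of roughly 3.6x. The short check below reproduces that arithmetic from the reported numbers. The `percentile` helper uses the nearest-rank method, which is an assumption; the PR does not say how p50 and p95 were computed.

```python
import math


def throughput(requests, wall_seconds):
    """Completed requests per second over the whole run."""
    return requests / wall_seconds


def percentile(latencies, p):
    """Nearest-rank percentile (assumed method, not confirmed by the PR)."""
    ordered = sorted(latencies)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Numbers taken from the benchmark results above.
legacy_rps = throughput(16, 22.607)
batched_rps = throughput(16, 6.241)
speedup = 22.607 / 6.241

if __name__ == "__main__":
    print(f"legacy:  {legacy_rps:.3f} req/s")
    print(f"batched: {batched_rps:.3f} req/s")
    print(f"speedup: {speedup:.2f}x")
```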

@LostRuins added the enhancement and needs review labels on Apr 27, 2026
