feat: add configurable thinking output format support for vLLM #8901
AyRickk wants to merge 12 commits into continuedev:main from
Conversation
I have read the CLA Document and I hereby sign the CLA

1 out of 2 committers have signed the CLA.

I have read the CLA Document and I hereby sign the CLA

recheck
- Add new section in vLLM provider docs explaining thinking output format options
- Document thinkingOpenTag and thinkingCloseTag properties in YAML reference
- Document thinkingOpenTag and thinkingCloseTag properties in JSON reference
- Include configuration examples for both YAML and JSON formats

Co-authored-by: nate <nate@continue.dev>
Generated with [Continue](https://continue.dev)
Co-Authored-By: Continue <noreply@continue.dev>
Added documentation for the new thinking output format configuration feature.

The documentation maintains the existing level of detail and focuses on practical usage of the new feature.
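For reference, a model entry using the documented properties might look roughly like the following (sketched as a TypeScript object rather than the actual YAML/JSON from the docs; `thinkingOpenTag` and `thinkingCloseTag` are the properties this PR describes, while the surrounding fields and exact schema are illustrative assumptions):

```typescript
// Hypothetical model entry for a reasoning model served by vLLM.
// Only thinkingOpenTag / thinkingCloseTag come from this PR; the other
// fields and the overall schema shape are illustrative assumptions.
const vllmReasoningModel = {
  title: "DeepSeek-R1 (vLLM)",
  provider: "vllm",
  model: "deepseek-ai/DeepSeek-R1",
  apiBase: "http://localhost:8000/v1",
  // Text streamed between these tags is surfaced as "thinking" chunks
  // instead of being shown as regular assistant output.
  thinkingOpenTag: "<think>",
  thinkingCloseTag: "</think>",
};
```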
…eaming

Add comprehensive integration tests to verify the ThinkingTagExtractor works correctly when integrated with BaseLLM's streamChat method.

Tests cover:
- Single and multiple chunk scenarios
- Partial tag handling at chunk boundaries
- Flush behavior at stream end
- Multiple thinking blocks
- Custom tag formats
- Interaction with native thinking role chunks

Co-authored-by: nate <nate@continue.dev>
Added integration tests covering the scenarios listed above.

These integration tests complement the existing unit tests for ThinkingTagExtractor by verifying the end-to-end behavior when the extractor is integrated with the actual streaming pipeline.
1 issue found across 4 files
Prompt for AI agents (1 issue)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="core/llm/index.ts">
<violation number="1" location="core/llm/index.ts:1412">
Custom provider streaming now suppresses assistant tool-call messages with empty `content`, so downstream tool executions are lost.</violation>
</file>
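To make the flagged issue concrete, here is a minimal sketch of the failure mode cubic describes above, with a possible fix; the chunk type and function names are simplified stand-ins, not the actual code in core/llm/index.ts:

```typescript
// Simplified chunk shape; the real ChatMessage type in core/ is richer.
interface AssistantChunk {
  role: "assistant";
  content: string;
  toolCalls?: Array<{ name: string; arguments: string }>;
}

// Problematic guard: a streaming tool-call delta typically arrives with
// content === "", so filtering on non-empty content drops it and the tool
// is never executed downstream.
function* filterChunksBuggy(chunks: Iterable<AssistantChunk>) {
  for (const chunk of chunks) {
    if (chunk.content !== "") {
      yield chunk;
    }
  }
}

// Possible fix: keep a chunk if it carries either text content or tool calls.
function* filterChunksFixed(chunks: Iterable<AssistantChunk>) {
  for (const chunk of chunks) {
    if (chunk.content !== "" || (chunk.toolCalls?.length ?? 0) > 0) {
      yield chunk;
    }
  }
}
```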
1 issue found across 8 files
Prompt for AI agents (1 issue)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="core/llm/index.ts">
<violation number="1" location="core/llm/index.ts:1413">
Tool-call/usage-only assistant chunks are dropped because they are now yielded only when `content` is non-empty, preventing tools from ever executing.</violation>
</file>
1 issue found across 7 files
Prompt for AI agents (1 issue)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="core/llm/index.ts">
<violation number="1" location="core/llm/index.ts:1412">
Tool/function-call chunks are dropped because new guard filters out assistant messages whose content is an empty string, so streaming tool calls no longer reach the caller.</violation>
</file>
RomneyDa left a comment
@AyRickk see comments, I think we could merge if:
- functionality is siloed to the VLLM class
- use a single thinkingTagName not duplicate tag configs (unless it would be normal for them to be different in VLLM, I'm not sure)
- Update config yaml schema as in comment, and in config-yaml package (or it will not work for yaml configs)
- On first glance I wonder if the tag parsing logic could be significantly more concise, maybe worth another look/approach? (see the sketch after this list)
- Follow up on/clean up test/docs commits. This is a new feature and we are tweaking when it runs (i.e. we might not run it for community PRs), but I think in this case it made sense
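On the conciseness point, a buffered extractor can stay fairly small; the sketch below is one possible approach and is not the PR's ThinkingTagExtractor; the class and helper names here are made up for illustration:

```typescript
type Extracted = { thinking: string; content: string };

// Minimal buffered tag extractor: feed it streamed deltas and it splits
// text into "thinking" (inside the tags) and "content" (outside), holding
// back only a suffix that might be the start of a tag split across chunks.
class SimpleThinkingExtractor {
  private buffer = "";
  private insideThinking = false;

  constructor(
    private openTag = "<think>",
    private closeTag = "</think>",
  ) {}

  process(delta: string): Extracted {
    this.buffer += delta;
    const out: Extracted = { thinking: "", content: "" };

    while (true) {
      const tag = this.insideThinking ? this.closeTag : this.openTag;
      const idx = this.buffer.indexOf(tag);
      if (idx !== -1) {
        const before = this.buffer.slice(0, idx);
        if (this.insideThinking) out.thinking += before;
        else out.content += before;
        this.buffer = this.buffer.slice(idx + tag.length);
        this.insideThinking = !this.insideThinking;
        continue;
      }
      // No complete tag yet: emit everything except a trailing fragment
      // that could still turn into the tag we are waiting for.
      const hold = partialSuffixLength(this.buffer, tag);
      const emit = this.buffer.slice(0, this.buffer.length - hold);
      if (this.insideThinking) out.thinking += emit;
      else out.content += emit;
      this.buffer = this.buffer.slice(this.buffer.length - hold);
      return out;
    }
  }

  // Call once the stream ends to release anything still held back.
  flush(): Extracted {
    const rest = this.buffer;
    this.buffer = "";
    return this.insideThinking
      ? { thinking: rest, content: "" }
      : { thinking: "", content: rest };
  }
}

// Length of the longest suffix of `text` that is a proper prefix of `tag`.
function partialSuffixLength(text: string, tag: string): number {
  for (let len = Math.min(text.length, tag.length - 1); len > 0; len--) {
    if (tag.startsWith(text.slice(text.length - len))) return len;
  }
  return 0;
}
```

Feeding it the deltas `"<th"`, `"ink>step 1</think>"`, and `"final answer"` yields `step 1` as thinking and `final answer` as content, with `flush()` releasing whatever is still buffered when the stream ends.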
…utput-format-support-for-vLLM
Per reviewer feedback on PR continuedev#8901:
- Remove ThinkingTagExtractor class from core/llm/index.ts (keep in separate file)
- Remove thinkingOpenTag/thinkingCloseTag from BaseLLM class
- Remove thinking extractor logic from processChatChunk and streamChat in BaseLLM
- Remove thinkingOpenTag/thinkingCloseTag from LLMOptions in core/index.d.ts
- Remove thinkingTagIntegration.vitest.ts (BaseLLM integration test)

The feature is now vLLM-specific only, handled by the Vllm class.

Co-authored-by: AyRickk <94455465+AyRickk@users.noreply.github.com>
1 issue found across 2 files (reviewed changes from recent commits).
Prompt for AI agents (1 issue)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="core/llm/thinkingTagIntegration.vitest.ts">
<violation number="1" location="core/llm/thinkingTagIntegration.vitest.ts:30">
Tests duplicate the production `_streamChat` logic inside `MockVllm`, so none of these cases exercise the real vLLM streaming implementation—regressions in `Vllm._streamChat` would still pass.</violation>
</file>
@RomneyDa is it ok?

@AyRickk what are your thoughts on this?
@AyRickk status from previous review
✅ functionality is siloed to the VLLM class
✅ Follow up on/clean up test/docs commits. This is a new feature and we are tweaking when it runs (i.e. we might not run for community PRs) but I think in this case it made sense
🟡 use a single thinkingTagName not duplicate tag configs (unless it would be normal for them to be different in VLLM, I'm not sure) - I saw your comment, but I still think the think tag name would always be the same, so it might make more sense to say thinkTagName and alwaysIncludesReasoning (boolean) or similar, which could clean up much of this logic (see the sketch after this comment)
🟡 On first glance I wonder if the tag parsing logic could be significantly more concise, maybe worth another look/approach?
❌ Update config yaml schema as in comment, ❌ and in config-yaml package (or it will not work for yaml configs)
I'm concerned about the maintainability of this, I think we could merge if we kept it to a couple hundred lines of simple think tag parsing etc.
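A rough sketch of the simplified shape suggested above, and how it could be expanded into the open/close pair the parser needs; `thinkTagName` and `alwaysIncludesReasoning` are the names from the comment above, and everything else here is an assumption:

```typescript
// Simplified option shape: one tag name plus a flag, instead of separate
// open/close tag strings. The exact semantics of alwaysIncludesReasoning
// would be up to the implementation; it is sketched here only as a field.
interface VllmThinkingOptions {
  thinkTagName?: string; // e.g. "think" or "reasoning"
  alwaysIncludesReasoning?: boolean;
}

// Derive the concrete tags the stream parser looks for.
function toTagPair(opts: VllmThinkingOptions) {
  const name = opts.thinkTagName ?? "think";
  return { openTag: `<${name}>`, closeTag: `</${name}>` };
}
```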
@AyRickk Thanks for the contribution here. This has gone a bit stale, closing for now. Definitely still open for contributions on this. See above feedback.
Related to #5992
Summary by cubic
Adds configurable thinking output format support for vLLM by parsing custom tags in streamed responses, while preserving vLLM’s native reasoning_content during streaming. Models using tags like … or … now emit proper “thinking” and “assistant” chunks.
Written for commit d99b93c. Summary will update automatically on new commits.
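The point about preserving vLLM's native reasoning_content can be pictured as a simple precedence rule applied to each streamed delta. The delta shape below follows the OpenAI-compatible streaming format that vLLM serves; the chunk type is a simplified stand-in for Continue's message type, and `splitOnTags` is a hypothetical placeholder for the PR's tag parsing:

```typescript
// Simplified OpenAI-compatible delta as streamed by a vLLM server.
interface StreamDelta {
  content?: string;
  reasoning_content?: string; // set when vLLM's native reasoning parsing is enabled
}

type Chunk = { role: "thinking" | "assistant"; content: string };

// Native reasoning_content passes through untouched as "thinking"; only the
// plain content portion is run through the configurable tag parsing.
function toChunks(
  delta: StreamDelta,
  splitOnTags: (text: string) => Chunk[],
): Chunk[] {
  const chunks: Chunk[] = [];
  if (delta.reasoning_content) {
    chunks.push({ role: "thinking", content: delta.reasoning_content });
  }
  if (delta.content) {
    chunks.push(...splitOnTags(delta.content));
  }
  return chunks;
}
```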