fix: catch vLLM InternalServerError for overlong prompts #1088
Draft
mikasenghaas wants to merge 1 commit into main from
Conversation
vLLM returns overlong-prompt errors as HTTP 500 InternalServerError instead of 400 BadRequestError. Extend the handle_openai_overlong_prompt decorator to also catch InternalServerError and check for context-length phrases in the error text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
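A minimal sketch of the change described above. All names here (`handle_openai_overlong_prompt`, `OverlongPromptError`, the stand-in error classes, and the phrase list) are assumptions based on this PR description, not the repository's actual definitions:

```python
import functools


class BadRequestError(Exception):
    """Stand-in for openai.BadRequestError (HTTP 400)."""


class InternalServerError(Exception):
    """Stand-in for openai.InternalServerError (HTTP 500)."""


class OverlongPromptError(Exception):
    """Raised when the prompt exceeds the model's context window."""


# Phrases that commonly appear in context-length error messages (assumed list).
_CONTEXT_LENGTH_PHRASES = ("maximum context length", "context length")


def _is_context_length_error(message: str) -> bool:
    lowered = message.lower()
    return any(p in lowered for p in _CONTEXT_LENGTH_PHRASES)


def handle_openai_overlong_prompt(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        # OpenAI reports overlong prompts as HTTP 400, but vLLM reports
        # them as HTTP 500, so the decorator now catches both.
        except (BadRequestError, InternalServerError) as e:
            if _is_context_length_error(str(e)):
                raise OverlongPromptError(str(e)) from e
            raise  # unrelated errors (e.g. CUDA OOM) propagate unchanged
    return wrapper
```

The phrase check is what keeps genuine 500s (such as out-of-memory errors) flowing through untouched.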
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| """A 500 that is NOT about context length should propagate as InternalServerError.""" | ||
| client = OpenAIChatCompletionsClient(_OverlongVLLMChatClient("CUDA out of memory")) | ||
|
|
||
| with pytest.raises(OpenAIInternalServerError): |
Test expects wrong exception type for non-matching 500
High Severity
The test test_vllm_non_overlong_internal_server_error_not_converted expects OpenAIInternalServerError to propagate, but the base get_response method in client.py wraps all non-auth, non-Error exceptions in ModelError. When the decorator re-raises InternalServerError, it's caught by except Exception as e: raise ModelError from e. The existing analogous test test_anthropic_non_overlong_bad_request_not_converted correctly expects ModelError instead.
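The wrapping behavior Bugbot describes can be sketched as follows. `ModelError` and the shape of `get_response` are assumptions modeled on the reviewer's description of client.py, not the actual code:

```python
class ModelError(Exception):
    """Stand-in for the client's generic model-error wrapper."""


class InternalServerError(Exception):
    """Stand-in for openai.InternalServerError."""


class AuthenticationError(Exception):
    """Stand-in for an auth error that must propagate as-is."""


def get_response(call):
    try:
        return call()
    except AuthenticationError:
        raise  # auth errors are re-raised unchanged
    except Exception as e:
        # Everything else -- including an InternalServerError re-raised by
        # the decorator -- is wrapped, so callers observe ModelError,
        # not the original exception type.
        raise ModelError from e


def failing_call():
    # Simulates a non-context-length 500 that the decorator re-raises.
    raise InternalServerError("CUDA out of memory")
```

Under this structure, a test that exercises the full client path should expect `ModelError` (with the original error as `__cause__`), mirroring `test_anthropic_non_overlong_bad_request_not_converted`.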


Summary
- vLLM returns overlong-prompt errors as HTTP 500 `InternalServerError` instead of 400 `BadRequestError`
- Extended the `handle_openai_overlong_prompt` decorator to also catch `InternalServerError` and check for context-length phrases in the error text
- Added tests covering both matching (converted to `OverlongPromptError`) and non-matching (passes through) 500 errors

Test plan

- New unit tests assert that a context-length 500 is converted to `OverlongPromptError`, that an `InternalServerError` without a matching message passes through, and that existing `BadRequestError` handling is unchanged (still re-raises auth errors)

🤖 Generated with Claude Code
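The test plan above could be sketched as two minimal pytest cases. Everything here is an illustrative stand-in (a simplified decorator and fake completion call), not the repository's actual tests:

```python
import pytest


class InternalServerError(Exception):
    """Stand-in for openai.InternalServerError."""


class OverlongPromptError(Exception):
    """Stand-in for the project's overlong-prompt error."""


def handle_overlong(fn):
    # Simplified version of the decorator under test.
    def wrapper(message):
        try:
            return fn(message)
        except InternalServerError as e:
            if "context length" in str(e).lower():
                raise OverlongPromptError(str(e)) from e
            raise
    return wrapper


@handle_overlong
def fake_completion(message):
    # Simulates a vLLM server that fails with an HTTP 500.
    raise InternalServerError(message)


def test_vllm_overlong_500_converted():
    with pytest.raises(OverlongPromptError):
        fake_completion("This model's maximum context length is 4096 tokens")


def test_unrelated_500_passes_through():
    with pytest.raises(InternalServerError):
        fake_completion("CUDA out of memory")
```

The two cases correspond to the matching and non-matching branches listed in the summary.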
Note
Medium Risk
Expands exception handling to reinterpret some `InternalServerError` responses as `OverlongPromptError`, which could mask genuine 500s if the message matches the context-length phrases; tests reduce this risk by asserting non-matching 500s still propagate.

Overview
Extends OpenAI chat-completions overlong-prompt detection to also handle vLLM-style HTTP 500s by catching `InternalServerError` in `handle_openai_overlong_prompt` and mapping context-length messages to `OverlongPromptError`. Adds regression tests covering both conversion of vLLM 500 context-length errors and pass-through behavior for unrelated 500s (e.g., "CUDA out of memory").
Written by Cursor Bugbot for commit f93ca5f. This will update automatically on new commits.