
fix: catch vLLM InternalServerError for overlong prompts #1088

Draft

mikasenghaas wants to merge 1 commit into main from fix/vllm-overlong-prompt-error

Conversation


@mikasenghaas mikasenghaas commented Apr 1, 2026

Summary

  • vLLM returns overlong-prompt errors as HTTP 500 InternalServerError instead of 400 BadRequestError
  • Extended handle_openai_overlong_prompt decorator to also catch InternalServerError and check for context-length phrases in the error text
  • Added tests for both matching (converted to OverlongPromptError) and non-matching (passes through) 500 errors
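The change described above can be sketched as follows. This is a minimal, hypothetical reconstruction based only on the summary: the names handle_openai_overlong_prompt, OverlongPromptError, BadRequestError, and InternalServerError come from the PR text, while the phrase list, the exception class bodies, and the decorator internals are illustrative assumptions, not the actual implementation.

```python
import functools

# Assumed phrases that identify a context-length failure in a 500 body.
# The real list used by the PR may differ.
_CONTEXT_LENGTH_PHRASES = (
    "maximum context length",
    "context length",
)


class OverlongPromptError(Exception):
    """Raised when a prompt exceeds the model's context window."""


class BadRequestError(Exception):
    """Stand-in for the OpenAI SDK's 400 error type."""


class InternalServerError(Exception):
    """Stand-in for the OpenAI SDK's 500 error type."""


def handle_openai_overlong_prompt(func):
    """Convert overlong-prompt errors into OverlongPromptError."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except BadRequestError as e:
            # OpenAI-compatible servers normally report overlong
            # prompts as HTTP 400.
            raise OverlongPromptError(str(e)) from e
        except InternalServerError as e:
            # vLLM reports them as HTTP 500, so inspect the message.
            msg = str(e).lower()
            if any(phrase in msg for phrase in _CONTEXT_LENGTH_PHRASES):
                raise OverlongPromptError(str(e)) from e
            raise  # unrelated 500s still propagate unchanged

    return wrapper
```

With this shape, a vLLM 500 whose body mentions the context length is converted, while a 500 such as "CUDA out of memory" re-raises as-is, matching the two test cases the PR adds.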

Test plan

  • Verified fix converts vLLM-style 500 with context length message to OverlongPromptError
  • Verified non-context-length 500 errors still propagate as InternalServerError
  • All existing error handling behavior unchanged (decorator still catches BadRequestError, still re-raises auth errors)

🤖 Generated with Claude Code


Note

Medium Risk
Expands exception handling to reinterpret some InternalServerError responses as OverlongPromptError, which could mask genuine 500s if the message matches the context-length phrases; tests reduce this risk by asserting non-matching 500s still propagate.

Overview
Extends OpenAI chat-completions overlong-prompt detection to also handle vLLM-style HTTP 500s by catching InternalServerError in handle_openai_overlong_prompt and mapping context-length messages to OverlongPromptError.

Adds regression tests covering both conversion of vLLM 500 context-length errors and pass-through behavior for unrelated 500s (e.g., "CUDA out of memory").

Written by Cursor Bugbot for commit f93ca5f. This will update automatically on new commits. Configure here.

vLLM returns overlong-prompt errors as HTTP 500 InternalServerError
instead of 400 BadRequestError. Extend the handle_openai_overlong_prompt
decorator to also catch InternalServerError and check for context-length
phrases in the error text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


"""A 500 that is NOT about context length should propagate as InternalServerError."""
client = OpenAIChatCompletionsClient(_OverlongVLLMChatClient("CUDA out of memory"))

with pytest.raises(OpenAIInternalServerError):

Test expects wrong exception type for non-matching 500

High Severity

The test test_vllm_non_overlong_internal_server_error_not_converted expects OpenAIInternalServerError to propagate, but the base get_response method in client.py wraps all non-auth, non-Error exceptions in ModelError. When the decorator re-raises InternalServerError, it's caught by except Exception as e: raise ModelError from e. The existing analogous test test_anthropic_non_overlong_bad_request_not_converted correctly expects ModelError instead.
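The wrapping behavior Bugbot describes can be illustrated with a small sketch. The names ModelError and get_response come from the review comment; the function body here is a hypothetical simplification of the base client, written only to show why the raised InternalServerError surfaces to the test as ModelError rather than as itself.

```python
class ModelError(Exception):
    """Stand-in for the client library's generic model error."""


class InternalServerError(Exception):
    """Stand-in for the OpenAI SDK's 500 error type."""


def get_response(call):
    """Assumed shape of the base client's request path: any non-auth,
    non-Error exception that escapes the decorator is wrapped."""
    try:
        return call()
    except Exception as e:
        raise ModelError(str(e)) from e
```

Because the re-raised InternalServerError passes through this except block, the test should assert pytest.raises(ModelError), as the analogous Anthropic test already does; the original 500 remains reachable via the exception's __cause__.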


@mikasenghaas mikasenghaas marked this pull request as draft April 2, 2026 12:09