
fix: catch vLLM InternalServerError for overlong prompts #1088

Draft

mikasenghaas wants to merge 1 commit into main from fix/vllm-overlong-prompt-error

Conversation


@mikasenghaas mikasenghaas commented Apr 1, 2026

Summary

  • vLLM returns overlong-prompt errors as HTTP 500 InternalServerError instead of 400 BadRequestError
  • Extended handle_openai_overlong_prompt decorator to also catch InternalServerError and check for context-length phrases in the error text
  • Added tests for both matching (converted to OverlongPromptError) and non-matching (passes through) 500 errors
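The change described above can be sketched as follows. This is a minimal, hypothetical reconstruction based only on the summary: the names handle_openai_overlong_prompt, OverlongPromptError, BadRequestError, and InternalServerError come from the PR text, while the phrase list, the exception class bodies, and the decorator internals are illustrative assumptions, not the actual implementation.

```python
import functools

# Assumed phrases that identify a context-length failure in a 500 body.
# The real list used by the PR may differ.
_CONTEXT_LENGTH_PHRASES = (
    "maximum context length",
    "context length",
)


class OverlongPromptError(Exception):
    """Raised when a prompt exceeds the model's context window."""


class BadRequestError(Exception):
    """Stand-in for the OpenAI SDK's 400 error type."""


class InternalServerError(Exception):
    """Stand-in for the OpenAI SDK's 500 error type."""


def handle_openai_overlong_prompt(func):
    """Convert overlong-prompt errors into OverlongPromptError."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except BadRequestError as e:
            # OpenAI-compatible servers normally report overlong
            # prompts as HTTP 400.
            raise OverlongPromptError(str(e)) from e
        except InternalServerError as e:
            # vLLM reports them as HTTP 500, so inspect the message.
            msg = str(e).lower()
            if any(phrase in msg for phrase in _CONTEXT_LENGTH_PHRASES):
                raise OverlongPromptError(str(e)) from e
            raise  # unrelated 500s still propagate unchanged

    return wrapper
```

With this shape, a vLLM 500 whose body mentions the context length is converted, while a 500 such as "CUDA out of memory" re-raises as-is, matching the two test cases the PR adds.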

Test plan

  • Verified fix converts vLLM-style 500 with context length message to OverlongPromptError
  • Verified non-context-length 500 errors still propagate as InternalServerError
  • All existing error handling behavior unchanged (decorator still catches BadRequestError, still re-raises auth errors)

🤖 Generated with Claude Code


Note

Medium Risk
Expands exception handling to reinterpret some InternalServerError responses as OverlongPromptError, which could mask genuine 500s if the message matches the context-length phrases; tests reduce this risk by asserting non-matching 500s still propagate.

Overview
Extends OpenAI chat-completions overlong-prompt detection to also handle vLLM-style HTTP 500s by catching InternalServerError in handle_openai_overlong_prompt and mapping context-length messages to OverlongPromptError.

Adds regression tests covering both conversion of vLLM 500 context-length errors and pass-through behavior for unrelated 500s (e.g., "CUDA out of memory").

Written by Cursor Bugbot for commit f93ca5f. This will update automatically on new commits. Configure here.

vLLM returns overlong-prompt errors as HTTP 500 InternalServerError
instead of 400 BadRequestError. Extend the handle_openai_overlong_prompt
decorator to also catch InternalServerError and check for context-length
phrases in the error text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


"""A 500 that is NOT about context length should propagate as InternalServerError."""
client = OpenAIChatCompletionsClient(_OverlongVLLMChatClient("CUDA out of memory"))

with pytest.raises(OpenAIInternalServerError):

Test expects wrong exception type for non-matching 500

High Severity

The test test_vllm_non_overlong_internal_server_error_not_converted expects OpenAIInternalServerError to propagate, but the base get_response method in client.py wraps all non-auth, non-Error exceptions in ModelError. When the decorator re-raises InternalServerError, it's caught by except Exception as e: raise ModelError from e. The existing analogous test test_anthropic_non_overlong_bad_request_not_converted correctly expects ModelError instead.
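The wrapping behavior Bugbot describes can be illustrated with a small sketch. The names ModelError and get_response come from the review comment; the function body here is a hypothetical simplification of the base client, written only to show why the raised InternalServerError surfaces to the test as ModelError rather than as itself.

```python
class ModelError(Exception):
    """Stand-in for the client library's generic model error."""


class InternalServerError(Exception):
    """Stand-in for the OpenAI SDK's 500 error type."""


def get_response(call):
    """Assumed shape of the base client's request path: any non-auth,
    non-Error exception that escapes the decorator is wrapped."""
    try:
        return call()
    except Exception as e:
        raise ModelError(str(e)) from e
```

Because the re-raised InternalServerError passes through this except block, the test should assert pytest.raises(ModelError), as the analogous Anthropic test already does; the original 500 remains reachable via the exception's __cause__.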


@mikasenghaas mikasenghaas marked this pull request as draft April 2, 2026 12:09