fix(litellm): detect GPT-5 family when model has openai/ or azure/ prefix#2401
fix(litellm): detect GPT-5 family when model has openai/ or azure/ prefix#2401sizickp wants to merge 4 commits into
Conversation
GPT-5 codex models only accept `temperature=1`. The handler converted
`temperature` into `reasoning_effort` only when the model string started
with `gpt-5`, but users frequently set the full provider-qualified name
in their config (e.g. `openai/gpt-5.1-codex-max`). That prefix made the
`startswith('gpt-5')` check fail, so `temperature=0.2` was sent and
litellm rejected the request with `UnsupportedParamsError`. The primary
model then always fell back to the secondary one — silently downgrading
quality and doubling per-call latency/cost.
Normalize the model name by stripping the `openai/` / `azure/` prefix
before the GPT-5 check, and rebuild the routed name with the original
provider so Azure routing is preserved.
Adds regression tests for `openai/gpt-5*` (including `codex-max`) and
for the `_thinking` suffix together with a prefix.
Review Summary by QodoFix GPT-5 detection with provider-prefixed model names
WalkthroughsDescription• Strip provider prefix before GPT-5 detection to handle prefixed model names • Preserve original provider prefix in routed model name for Azure compatibility • Add regression tests for openai/gpt-5* variants and _thinking suffix handling Diagramflowchart LR
A["Model name with prefix<br/>openai/gpt-5.1-codex-max"] -->|removeprefix| B["Normalized base<br/>gpt-5.1-codex-max"]
B -->|startswith check| C["GPT-5 path triggered"]
C -->|set reasoning_effort| D["Drop temperature"]
D -->|rebuild with prefix| E["Route to openai/gpt-5.1-codex-max"]
File Changes1. pr_agent/algo/ai_handlers/litellm_ai_handler.py
|
Code Review by Qodo
1.
|
| model_base = model.removeprefix('openai/').removeprefix('azure/') | ||
| if model_base.startswith('gpt-5'): | ||
| # Use configured reasoning_effort or default to MEDIUM |
There was a problem hiding this comment.
1. Azure+prefix skips gpt-5 🐞 Bug ≡ Correctness
In Azure mode, chat_completion() prepends azure/ before computing model_base, and the new removeprefix() chain only removes a single leading provider prefix. If the configured model is already provider-qualified (e.g. openai/gpt-5* or azure/gpt-5*), model_base can still start with openai/ or azure/, so the GPT-5 branch is skipped and temperature is sent (triggering the same LiteLLM UnsupportedParamsError this PR aims to fix).
Agent Prompt
### Issue description
In `LiteLLMAIHandler.chat_completion()`, GPT-5 detection relies on `model_base` computed via `removeprefix('openai/').removeprefix('azure/')`. When `self.azure` is enabled, the function prepends `azure/` to the model first; if the incoming model already has a provider prefix, the resulting string can contain multiple provider segments (e.g. `azure/openai/gpt-5...` or `azure/azure/gpt-5...`). The current normalization removes at most one leading prefix, so `model_base` may still begin with a provider segment and fail `startswith('gpt-5')`, causing `temperature` to be passed to GPT-5.
### Issue Context
Models come directly from user configuration/fallback lists and can be provider-qualified strings.
### Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[409-413]
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[433-461]
- pr_agent/algo/pr_processing.py[341-354]
### Suggested fix
Refactor provider handling so you do not build composite provider prefixes:
1) Determine/normalize the provider prefix separately from the model name (don’t mutate `model` with `'azure/' + model` before parsing).
2) Strip provider prefixes in a loop (or parse once) until the remaining model name is the real base (`gpt-5...`).
3) Then rebuild the final routed model string once (provider + base model), ensuring exactly one provider prefix.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
b7fe4eb to
0d9006b
Compare
|
Persistent review updated to latest commit 0d9006b |
0d9006b to
8c4377f
Compare
|
Persistent review updated to latest commit 8c4377f |
…efixes Address two routing issues raised by the Qodo review on the initial fix: 1. Stacked prefixes (Azure mode + provider-qualified config). With self.azure prepending "azure/" to a model that already had an "openai/" or "azure/" prefix in config, the single removeprefix() call left a residual provider segment, GPT-5 detection failed, and temperature was still sent (the original bug this PR fixes). 2. Explicit "azure/" rewritten to "openai/" when self.azure was false. The previous rebuild used "azure/ if self.azure else openai/", which ignored an explicit "azure/" the user had written in config and silently rerouted the request to OpenAI. Strip provider prefixes in a loop so any number of stacked segments is removed, and choose the rebuilt prefix by priority: Azure mode > explicit prefix in user config > "openai/" default. Capture the user's original model string before the azure auto-prepend so the explicit prefix is still visible at rebuild time. Tests added for: - explicit "azure/" preserved when self.azure is false - Azure mode with bare / openai/- / azure/- prefixed configs all producing exactly one "azure/" prefix
Four `with patch(...)` statements in the GPT-5 provider-prefix tests exceeded the repo's 120-character line limit (ruff line-length=120 in pyproject.toml). Wrap them onto multiple lines so the test file does not regress lint compliance for the newly added cases.
8c4377f to
66026ff
Compare
|
Persistent review updated to latest commit 66026ff |
Address Qodo review feedback: the added Group 8 tests instantiate LiteLLMAIHandler() directly, and its __init__ branches on environment variables such as AWS_USE_IMDS, OPENAI_API_KEY, and the AWS_* set. With those vars potentially leaking from the runner environment, the tests could behave differently across machines or CI runners (e.g. attempt boto3 import when AWS_USE_IMDS is set). Explicitly delete the relevant env vars via monkeypatch.delenv() at the start of each new test, so the constructor takes a deterministic path regardless of where the suite runs.
|
Persistent review updated to latest commit 79513ba |
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
|
The fix is correct, but it's local to the GPT-5 branch — the same prefix-detection bug it fixes still bites non–GPT-5 models through the capability gates right below, and Azure mode makes it worse. Same bug, other gates
if model in self.user_message_only_models ... # ~L457
if model not in self.no_support_temperature_models ... # ~L473 ← adds temperature
if model in self.support_reasoning_models ... # ~L483
if model in self.claude_extended_thinking_models ... # ~L500But
The tell that the matching is prefix-fragile: Azure amplifies itIn Azure mode the handler unconditionally prepends (Separately: Azure also uses arbitrary deployment names rather than model IDs, so SuggestionCompute the normalized base name once (the |
Summary
GPT-5 codex models only accept
temperature=1. The LiteLLM handler swapstemperatureforreasoning_effortonly when the model string starts withgpt-5, but users frequently put the full provider-qualified name in[config].model(e.g.openai/gpt-5.1-codex-max). That prefix makesmodel.startswith('gpt-5')returnFalse, so the handler still sendstemperature=0.2and litellm rejects the call:The primary model always errors, and PR-Agent silently falls back to the secondary model on every invocation — degrading answer quality vs. the configured model, doubling per-call latency, and burning input tokens on a request that can never succeed.
Fix
In
pr_agent/algo/ai_handlers/litellm_ai_handler.py::chat_completion, normalize the model name by stripping theopenai//azure/prefix before the GPT-5 detection, then rebuild the routed name with the original provider prefix (so Azure routing is preserved whenself.azureis set).This is a minimal, surgical change — no behavior changes for unprefixed names, no widening of the heuristic; just the prefix-aware detection.
Repro
configuration.toml:Before: every call logs the warning above and falls back. After: the GPT-5 branch runs,
reasoning_effortis set,temperatureis dropped, and the primary model is used.Test plan
test_gpt5_with_openai_prefix_triggers_reasoning_effortcoveringopenai/gpt-5,openai/gpt-5.1-codex,openai/gpt-5.1-codex-max,openai/gpt-5.4-mini. Assertsreasoning_effortis set,temperatureis not in kwargs, and the model name is not double-prefixed (openai/openai/...).test_gpt5_with_openai_prefix_strips_thinking_suffixforopenai/gpt-5_thinking— verifies suffix is removed and provider prefix kept exactly once.test_litellm_*suites still pass (68 tests).