fix(fallback): treat 404/405/409/410 as retryable so a model-unavailable error falls through#173
fix(fallback): treat 404/405/409/410 as retryable so a model-unavailable error falls through#173agpituk wants to merge 1 commit into
Conversation
…e falls through A multi-attempt routing policy stopped at the first attempt when the upstream returned 404/405/409/410, returning 502 without trying the remaining entries. These codes mean the requested model or endpoint is unavailable at THIS provider (deprecated, renamed, retired, or never offered) — recovering from a retired model is one of the main reasons users configure fallback, so they should fall through to the next entry instead of failing the whole request. 400/422 stay terminal: a malformed request is rejected by every provider, so falling through would only waste attempts. Adds a unit test pinning the classifier contract (which codes fall through vs stop, plus timeout/network/no-status cases) and an integration test covering 404-on-primary fall-through to a healthy secondary. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
This PR appears to be missing required sections from the PR template. Please edit your PR description to include the PR Type, Checklist, and AI Usage sections. The template helps maintainers review your contribution. This PR will be automatically closed in 24 hours if the template is not restored. If you're using an AI coding tool, please ensure it preserves the PR template. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
Walkthrough
ChangesFallback retry policy expansion and test coverage
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Problem
On a multi-attempt routing policy, the gateway stopped at the first attempt and returned
502when the upstream responded with404(and405/409/410), never trying the remaining entries. These codes typically mean the requested model or endpoint is unavailable at that specific provider — deprecated, renamed, retired, or never offered. Recovering from a retired/renamed model is one of the main reasons users configure fallback, so this defeated a core use case.Root cause
_classify_upstream_error(src/gateway/api/routes/_platform.py) classified only{400, 422}as terminal and{401, 403, 408, 429, 5xx}as retryable; every other status fell through to a defaultreturn False(non-retryable). So404/405/409/410short-circuited the attempts loop. Note the prior inconsistency:403(no access to a model) fell through, but404(model not found) did not.Fix
Add
404, 405, 409, 410to_FALLBACK_RETRYABLE_STATUS_CODES. These mean "model/endpoint unavailable at this provider" → fall through to the next attempt.400/422stay terminal: a malformed request is rejected by every provider, so falling through would only waste attempts.Tests
tests/unit/test_classify_upstream_error.py(new) — pins the classifier contract: which status codes fall through vs stop, plus timeout / network-error / no-status cases.tests/integration/test_platform_mode_messages.py::test_platform_mode_falls_through_on_404_model_unavailable— primary returns404, runner falls through, healthy secondary serves the response.Targeted suites green (
tests/unit/test_classify_upstream_error.py,tests/integration/test_platform_mode_messages.py,tests/integration/test_platform_mode_chat.py,tests/unit/test_run_platform_attempts.py).🤖 Generated with Claude Code
Summary
This PR fixes a critical issue where the gateway would prematurely fail requests with a 502 error instead of trying alternative providers when dealing with unavailable models or endpoints.
What Changed
Code Changes:
Specific Updates:
_FALLBACK_RETRYABLE_STATUS_CODESconstant to include the four additional HTTP status codesBenefits
Users who configure multiple providers for redundancy will now experience:
Technical Notes
The fix distinguishes between error categories: