feat(providers): add OpenAI Codex provider with Responses API support #128
b3nw wants to merge 9 commits into Mirrowel:dev from
Conversation
@mirrobot-agent gimme a review ya lazy bot
@b3nw, I'm on it! I'll dive into the Codex provider implementation and give you a thorough review. 🕵️♂️
Overall Assessment
The OpenAI Codex provider implementation is exceptionally thorough and well-integrated into the existing rotator architecture. The move to the Responses API allows for advanced features like reasoning output and streaming tool calls, which are critical for agentic workflows. The OAuth implementation with PKCE is robust and handles headless environments elegantly.
Architectural Feedback
- Separation of Concerns: The use of mixins (`OpenAIOAuthBase`, `CodexQuotaTracker`) keeps the main `CodexProvider` class focused on request/response translation.
- Resilience: The token refresh logic and re-auth coordinator handle credential expiration gracefully.
- Quota Tracking: Using virtual models (`_5h_window`, `_weekly_window`) is a clever way to expose Codex's unique rate limit windows to the TUI.
Key Suggestions
- Error Handling: Refine the generic exceptions in the streaming loop to include more context (status codes, etc.).
- Fragile Parsing: Move away from regex-based retry-after extraction if the API provides structured alternatives.
- Backend API Usage: Double-check the stability/longevity of the `chatgpt.com/backend-api` endpoints, as they appear to be internal.
Questions for the Author
- GPT-5 Models: Are the `gpt-5` model names intended as placeholders, or are they targeting a specific internal preview?
- System Message Conversion: Is the conversion of system messages to user messages a strict requirement of the Responses API, or a strategy to bypass the fixed `instructions` field?
This review was generated by an AI assistant.
```python
error_message = error.get("message", "Response failed")

if error_message:
    raise Exception(f"Codex response failed: {error_message}")
```
Consider using a more specific exception type here, or including the HTTP status code if available, to help with debugging and error handling downstream.
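A minimal sketch of what such an exception could look like; the `CodexResponseError` name and `status_code` field are illustrative, not existing names in this codebase:

```python
class CodexResponseError(Exception):
    """Illustrative: a Codex-specific error that carries the HTTP status code."""

    def __init__(self, message: str, status_code: int | None = None):
        super().__init__(message)
        self.status_code = status_code


# At the raise site, something along these lines (the status source depends on
# what the surrounding handler actually has available):
# raise CodexResponseError(f"Codex response failed: {error_message}", status_code=status)
```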
```python
# Try to extract from message
import re
match = re.search(r"try again in (\d+)s", message)
```
Parsing rate limit retry times from strings can be fragile. If the API provides a Retry-After header or a structured field in the JSON error response, those would be safer to use.
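A rough sketch of the suggested order of preference, assuming the handler has access to the response headers (whether Codex actually sends `Retry-After` here is unverified, and `extract_retry_after` is a hypothetical helper):

```python
import re
from typing import Mapping, Optional


def extract_retry_after(headers: Mapping[str, str], message: str) -> Optional[float]:
    """Prefer a structured Retry-After header; fall back to parsing the error message."""
    raw = headers.get("retry-after") or headers.get("Retry-After")
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # e.g. an HTTP-date value; ignore and fall back to the message text
    match = re.search(r"try again in (\d+)s", message)
    return float(match.group(1)) if match else None
```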
```python
# - If base contains /backend-api: use /wham/usage (ChatGptApi style)
# - Otherwise: use /api/codex/usage (CodexApi style)
# Since we use chatgpt.com/backend-api, we need /wham/usage
CODEX_USAGE_URL = "https://chatgpt.com/backend-api/wham/usage"
```
This looks like an internal backend API endpoint. Ensure that the proxy is intended to hit this directly and that there are no public-facing alternatives that might be more stable.
```python
if item.get("type") == "function_call":
    call_id = item.get("call_id") or item.get("id", "")
    name = item.get("name", "")
    arguments = item.get("arguments", "")
```
User-provided system messages are converted to user messages here. While this ensures compatibility with the Responses API's strict instruction field, it might affect models that rely heavily on the system role for personality/constraints. It's worth confirming if this is the intended behavior for all Codex-style models.
@b3nw, I've finished the review! Overall, it's a very solid implementation. I've left some detailed feedback and a few questions in the formal review. Let me know if you want me to look into anything else! 🚀
I'm starting my review of the OpenAI Codex provider integration. I see some new retry logic for handling garbled tool calls in the Responses API - checking the implementation details now!
Follow-up Review: Codex Provider & Responses API
The implementation of the OpenAI Codex provider is very thorough and the integration with the Responses API is well-handled. The latest additions introducing retry logic for garbled tool calls are a welcome step toward production-level reliability.
Assessment of New Changes:
- Retry Logic: The introduction of `_stream_with_retry` and `_non_stream_with_retry` correctly addresses the intermittent "ChatGPT internal format" reversion issue.
- Marker Detection: The use of the `GARBLED_TOOL_CALL_MARKER` is a practical heuristic for detecting when the model has entered an invalid state.
Key Suggestions:
- Streaming Duplication (High Priority): As noted in the inline comment, retrying a stream after yielding chunks will cause duplication of the initial content (like reasoning tokens) in the client's view. This should be addressed by buffering the start of the stream or implementing a "reset" signal for the client if possible.
- Fragmented Marker Detection: The current check might miss markers split across two network packets/chunks.
- Cleanup: There is a minor unreachable code path in the non-streaming retry wrapper.
Note on Previous Feedback:
The previous suggestions regarding specific exception types (line 1186), fragile regex-based retry extraction (line 1242), and the stability of internal backend API endpoints still apply and should be considered for future hardening.
This review was generated by an AI assistant.
```python
for attempt in range(GARBLED_TOOL_CALL_MAX_RETRIES):
    garbled_detected = False

    try:
        async for chunk in self._stream_response(
            client, headers, payload, model, reasoning_compat, credential_path
        ):
            # Inspect content deltas for garbled tool call marker
            if hasattr(chunk, "choices") and chunk.choices:
                choice = chunk.choices[0]
                delta = getattr(choice, "delta", None)
                if delta:
                    content = getattr(delta, "content", None) or ""
                    if GARBLED_TOOL_CALL_MARKER in content:
                        garbled_detected = True
                        lib_logger.warning(
                            f"[Codex] Garbled tool call detected in stream for {model}, "
                            f"attempt {attempt + 1}/{GARBLED_TOOL_CALL_MAX_RETRIES}. "
                            f"Content snippet: {content[:100]!r}"
                        )
                        break  # Stop consuming this stream

            yield chunk

        if not garbled_detected:
            return  # Stream completed successfully

    except Exception:
        if garbled_detected:
            # Exception during stream teardown after garble detected - continue to retry
            pass
        else:
            raise  # Non-garble exception - propagate

    # Garbled stream detected - retry if we have attempts left
    if attempt < GARBLED_TOOL_CALL_MAX_RETRIES - 1:
        lib_logger.info(
            f"[Codex] Retrying request for {model} after garbled tool call "
            f"(attempt {attempt + 2}/{GARBLED_TOOL_CALL_MAX_RETRIES})"
        )
        await asyncio.sleep(GARBLED_TOOL_CALL_RETRY_DELAY)
    else:
        lib_logger.error(
            f"[Codex] Garbled tool call persisted after {GARBLED_TOOL_CALL_MAX_RETRIES} "
            f"attempts for {model}. Giving up."
        )
        # Final attempt already yielded whatever chunks it had
        return
```
There is a significant duplication risk in the streaming retry logic. Since `yield chunk` occurs inside the same loop that detects the garbled marker, any chunks yielded before the marker is found (e.g., reasoning tokens or early valid text) will be sent to the client. When the request is retried, the client will receive those same chunks again from the new stream, leading to duplicated content in the UI.
Consider buffering the early chunks or only retrying if no content has been yielded yet. If retrying mid-stream is unavoidable, the client should be notified to clear the previous partial response.
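A rough fragment sketch of the buffering idea, reusing names from the snippet above (`_looks_garbled` is a placeholder for whatever marker check is used; this is not the PR's final implementation):

```python
# Sketch: hold chunks back until the stream is known to be clean, so a retry
# never duplicates content the client has already seen.
buffered = []
garbled_detected = False

async for chunk in self._stream_response(
    client, headers, payload, model, reasoning_compat, credential_path
):
    if _looks_garbled(chunk):  # placeholder for the per-chunk marker check
        garbled_detected = True
        break
    buffered.append(chunk)

if not garbled_detected:
    for chunk in buffered:
        yield chunk
    return  # clean stream delivered; otherwise fall through to the retry logic
```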
```python
delta = getattr(choice, "delta", None)
if delta:
    content = getattr(delta, "content", None) or ""
    if GARBLED_TOOL_CALL_MARKER in content:
```
Checking for `GARBLED_TOOL_CALL_MARKER` in individual chunks may fail if the marker (`+#+#`) is split across two consecutive chunks (e.g., `+#` followed by `+#`).
Consider maintaining a small rolling buffer of the last few characters to ensure the marker is detected even when fragmented.
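A rough sketch of such a rolling window, assuming the per-chunk text deltas are available as an iterable (`content_deltas` and the loop shape are illustrative, not names from the codebase):

```python
marker = GARBLED_TOOL_CALL_MARKER  # e.g. "+#+#"
tail = ""  # last few characters carried over from the previous chunk

for content in content_deltas:  # illustrative: the text delta from each chunk
    window = tail + content
    if marker in window:
        garbled_detected = True
        break
    # Keep just enough trailing characters to catch a marker split across chunks.
    tail = window[-(len(marker) - 1):]
```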
```python
            return response

        # Should not reach here, but return the last response as fallback
        return response
```
This line is unreachable because the return response at line 805 is inside the loop and covers the final iteration fallback.
Force-pushed from 41085fe to f3739c2
Follow-up Review: Anthropic & Cursor Integration
The PR continues to expand with robust implementations for Anthropic (OAuth + API key) and Cursor providers. The new Log Viewer TUI is also a great addition for observability.
Assessment of New Changes:
- Anthropic Provider: The dual routing (OAuth/API key) is well-handled. The interleaved thinking support is correctly implemented.
- Cursor Provider: Solid integration with the sidecar and clever usage of a virtual model for quota tracking.
- Log Viewer: A very functional addition. The lazy loading of metadata and prompts is a good choice for performance.
- Request Transforms: Centralized transforms for Kimi and GLM-5/4 help with model-specific quirks.
Note on Previous Feedback:
The critical issues raised in the previous review for src/rotator_library/providers/codex_provider.py (specifically the Streaming Duplication risk on line 760 and the Fragmented Marker detection on line 726) still apply as the file has not been modified in the latest commits. These should be addressed to ensure reliability in production scenarios.
This review was generated by an AI assistant.
```python
ANTHROPIC_MESSAGES_ENDPOINT = f"{ANTHROPIC_API_BASE}/v1/messages"

# Required headers for OAuth requests
ANTHROPIC_BETA_HEADER = "oauth-2025-04-20,interleaved-thinking-2025-05-14"
```
The date 2025-05-14 in the ANTHROPIC_BETA_HEADER seems very far in the future. Is this a placeholder or a requirement for a specific private beta?
```python
ANTHROPIC_USER_AGENT = "claude-cli/2.1.2 (external, cli)"

# Tool name prefix for OAuth path
TOOL_PREFIX = "mcp_"
```
Why is the TOOL_PREFIX set to mcp_? If this refers to "Model Context Protocol", it might be worth adding a comment explaining the choice, as it might affect client expectations for tool naming.
src/proxy_app/log_viewer.py (Outdated)
```python
from rich.syntax import Syntax


def _get_logs_dir() -> Path:
```
This function re-implements logic already available in `src/rotator_library/utils/paths.py`. It is better to import `get_logs_dir` from there to ensure consistency and avoid duplication.
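For illustration, the reuse could be as simple as the following; the exact module path is inferred from the file path mentioned above and should be verified against the package layout:

```python
# Hypothetical import path based on the review comment; verify before adopting.
from rotator_library.utils.paths import get_logs_dir

logs_dir = get_logs_dir()
```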
Force-pushed from f3739c2 to 8b094e3
Follow-up Review: Codex Garble Handling Refinements
The latest updates to the Codex provider significantly harden the garbled tool call detection and retry logic. The move to a buffer-then-flush strategy for streaming is a particularly robust fix for the duplication risks previously identified.
Assessment of New Changes:
- Robust Detection: Expanding `GARBLED_TOOL_CALL_MARKERS` and implementing `accumulated_text` tracking ensures that garbled output is caught even when markers are split across chunks.
- Streaming Reliability: The buffer-then-flush approach in `_stream_with_retry` is a great solution. While it adds latency to the initial chunk, it guarantees that no "corrupted" or duplicated content reaches the client.
- Diagnostics: The new logging for `response.completed` will be very helpful for tracking down why the model sometimes reverts to ChatML-style tool calls.
Note on Previous Feedback:
- Resolved: The high-priority issue regarding Streaming Duplication and the concern about Fragmented Markers have been successfully addressed.
- Still Applies: The minor unreachable code path in `_non_stream_with_retry` (line 863) still exists. Additionally, the previous suggestions regarding specific exception types (line 1255 of `codex_provider.py` in the full file) and regex-based retry extraction in the quota tracker remain relevant for future hardening.
This review was generated by an AI assistant.
```python
            return response

        # Should not reach here, but return the last response as fallback
        return response
```
This line remains unreachable because the return response at line 860 covers all exit paths from the loop. While minor, removing it would clean up the implementation.
```python
    Uses a buffer-then-flush approach: all chunks are collected first,
    then checked for the garbled marker. Only if the stream is clean
    are chunks yielded to the caller. This allows true retry since
    no chunks have been sent to the HTTP client yet.

    Detection is done both per-chunk (for early abort) AND on the
    accumulated text after stream completion (to catch markers that
    are split across multiple SSE chunks).
    """
    for attempt in range(GARBLED_TOOL_CALL_MAX_RETRIES):
        garbled_detected = False
        buffered_chunks: list = []
        accumulated_text = ""  # Track all text content across chunks

        try:
            async for chunk in self._stream_response(
```
The buffer-then-flush strategy effectively addresses the streaming duplication risk identified in the previous review. Note that this does mean real-time streaming is disabled for the client as they must wait for the entire response to be generated before any chunks are yielded. Given the intermittent nature of the Responses API garbling, this seems like the correct trade-off for reliability.
```python
                buffered_chunks.append(chunk)

            # Post-stream check: inspect accumulated text for markers split across chunks
            if not garbled_detected and _is_garbled_tool_call(accumulated_text):
```
Using accumulated_text for a post-stream check is a great addition to catch garble markers that might be split across network packets. This addresses the fragmented marker concern from the previous review.
Force-pushed from 8b094e3 to 64c0a37
Follow-up Review
This update focuses on cleaning up diagnostic logging within the Codex provider's streaming logic.
Assessment of New Changes:
- Log Cleanup: The removal of the verbose `lib_logger.info` block in the `response.completed` handler is a sensible refinement now that the Responses API integration is stable.
Overall Status:
The implementation remains robust. The core logic for handling garbled tool calls and the buffer-then-flush streaming strategy are well-executed.
Note on Previous Feedback:
The following items from previous reviews remain unaddressed:
- `codex_provider.py`: The unreachable code path at line 863 and the use of generic exceptions at lines 1123 and 1242.
- `codex_quota_tracker.py`: The recommendation to favor structured retry-after data over regex parsing where the API allows.
- `anthropic_provider.py`: The questions regarding the future-dated beta header (2025-05-14) and the choice of the `mcp_` tool prefix.
- `log_viewer.py`: The suggestion to use the centralized `get_logs_dir` utility for path consistency.
Great work on the cleanup!
This review was generated by an AI assistant.
… support (#33)

* feat(rotator_library): ✨ add OpenAI Codex provider with Responses API support

  Adds a new provider for OpenAI Codex models (GPT-5, GPT-5.1, GPT-5.2, Codex, Codex Mini) via the ChatGPT Responses API with OAuth PKCE authentication.

  Key features:
  - OAuth base class for OpenAI authentication with PKCE flow and token refresh
  - Responses API streaming with SSE event handling
  - Reasoning/thinking output with configurable effort levels (minimal to xhigh)
  - Tool calling support translated from OpenAI format
  - System prompt validation using official opencode prompt
  - Usage tracking with proper litellm.Usage objects

  Files added:
  - codex_provider.py: Main provider implementation
  - openai_oauth_base.py: OAuth base class with PKCE support
  - codex_prompt.txt: Required system prompt for API validation

* fix(codex): 🐛 prefix model names with provider identifier

  Models must be returned with the 'codex/' prefix (e.g., 'codex/gpt-5.2') to match the convention used by other providers like antigravity. This ensures proper provider routing in the RotatingClient.

* feat(codex): ✨ add reasoning effort model variants

  Exposes reasoning effort levels as model variants:
  - codex/gpt-5.2:low, :medium, :high, :xhigh
  - codex/gpt-5.1:low, :medium, :high
  - And similar for all codex models

  This allows clients to control reasoning effort by model name, similar to how gemini models use :thinking suffix. Total models: 9 base + 30 reasoning variants = 39 models

* feat(codex): ✨ add identity override for user system prompt precedence

  Injects an identity override as the first user message that tells the model to prioritize user-provided system prompts over the required opencode instructions. This mirrors the pattern used by Antigravity provider.

  Message order:
  1. Identity override (<system_override> tag)
  2. User's system message (converted to user message)
  3. Rest of conversation

  Controlled by CODEX_INJECT_IDENTITY_OVERRIDE env var (default: true)

* docs: 📝 add Codex provider configuration to .env.example

  Documents all Codex environment variables:
  - CODEX_REASONING_EFFORT: low/medium/high/xhigh
  - CODEX_REASONING_SUMMARY: auto/concise/detailed/none
  - CODEX_REASONING_COMPAT: think-tags/raw/none
  - CODEX_INJECT_IDENTITY_OVERRIDE: true/false
  - CODEX_INJECT_INSTRUCTION: true/false
  - CODEX_EMPTY_RESPONSE_ATTEMPTS: retry count
  - CODEX_EMPTY_RESPONSE_RETRY_DELAY: seconds
  - CODEX_OAUTH_PORT: callback port

* fix(codex): 🐛 register Codex in provider factory for OAuth TUI

  Adds Codex to PROVIDER_MAP so it appears in the credential tool's "Add OAuth Credential" menu alongside other OAuth providers.

* feat(codex): ✨ add quota tracking with rate limit header parsing

  Integrate CodexQuotaTracker mixin into CodexProvider to track API usage via response headers and periodic quota API calls. Enables smart credential selection based on remaining quota and automatic cooldowns when limits are approached.

  Signed-off-by: Moeeze Hassan <fammas.maz@gmail.com>

* feat(codex): add gpt-5.3-codex to model list and reasoning config

* feat(codex): wire quota header updates to UsageManager for TUI display

  - Add set_usage_manager() to CodexQuotaTracker mixin
  - Push parsed rate limit headers to UsageManager via update_quota_baseline
  - Inject UsageManager reference during RotatingClient initialization
  - Enables quota status display in TUI after each Codex API response

* feat(codex): default to sequential credential selection for token caching

  Sequential mode reuses the same credential across requests, enabling upstream token caching on the Codex API.
* fix(codex): standardize ChatGPT-Account-Id header casing Match the exact capitalization used by the Codex CLI, consistent with codex_quota_tracker.py. Addresses PR review feedback.
…, and reasoning config

Squash of feature/codex-openai-auth branch improvements:
- Fix model list: remove non-existent codex-mini, add gpt-5.3-codex-spark (Pro-only)
- Add quota tracking with rate limit header parsing
- Wire quota header updates to UsageManager for TUI display
- Standardize ChatGPT-Account-Id header casing
- Add identity override for user system prompt precedence
- Add reasoning effort model variants with gpt-5.3 support
- Default to sequential credential selection for token caching
- Update .env.example documentation
…e_manager Align method signature with the calling convention in rotating_client.py. Also update update_quota_baseline calls to use the current keyword arg API.
Squash of fix/oauth-docker-url-paste branch. Allows OAuth completion via URL paste when redirect callbacks are unavailable (e.g., Docker containers without exposed callback ports).
Combines work from:
- feature/codex-openai-auth (12 commits)
- fix/think-tag-spacing (2 commits)
- fix/codex-cached-tokens-and-rotation (1 commit)

Changes include:
- OpenAI Codex provider with Responses API support
- Reasoning effort model variants
- Identity override for user system prompt precedence
- Provider factory registration for OAuth TUI
- Quota tracking with rate limit header parsing
- UsageManager wire-up for TUI display
- Sequential credential selection for token caching
- ChatGPT-Account-Id header standardization
- Correct model list with gpt-5.3-codex-spark
- Stream reasoning as reasoning_content instead of think-tags
- Map cached tokens from Responses API
- codex-mini alias and shared sequential rotation
- Refactored openai_oauth_base.py for cleaner auth flow
…-codex gpt-5.3-codex-spark doesn't exist as a distinct model. Aliases now redirect spark requests to gpt-5.3-codex instead.
…iew items

Root cause: Missing text.verbosity parameter in Responses API payload. Without it, the model defaulted to a verbose text mode that caused:
1. Tool calls emitted as garbled text content (+#+#, ♀♀♀♀, etc.)
2. Model announcing actions in text then stopping (finish_reason=stop)

Changes:
- Add text.verbosity: 'medium' to Codex Responses API payload
- Fix garbled tool call detection: delta is a dict, not object
- Add multi-marker detection: +#+#, to=functions., ♀♀♀♀
- Add accumulated text check for markers split across SSE chunks
- Buffer-then-flush retry: buffer all chunks before yielding to allow retry
- Replace generic Exception with StreamedAPIError for response.failed
- Remove unreachable return statement in _non_stream_with_retry
…ntial rotation

When Codex returns 'exceeds the context window' via response.failed SSE event, the error was classified as 'unknown' and rotated through all 5 credentials pointlessly. Now StreamedAPIError is classified as context_window_exceeded or invalid_request — both non-rotatable — so it fails immediately.

Also applies PR review fixes to codex_provider.py:
- Replace generic Exception with StreamedAPIError for response.failed
- Remove unreachable return in _non_stream_with_retry
- Import StreamedAPIError from core.errors
…ls on startup

- Codex Responses API requires 'instructions' field to be non-optional. Always include it in payload; fall back to CODEX_SYSTEM_INSTRUCTION if no caller system prompt is provided. Fixes 'Instructions are required' 400 errors.
- On first background job run, fetch quota baselines for ALL credentials and apply cooldowns for any that are exhausted. This prevents wasting a request attempt on a depleted credential after container restart.
Force-pushed from 64c0a37 to 73b27bf
@mirrobot-agent major cleanup and fixed a stall issue around instructions.
Follow-up Review: Codex Implementation Refined
The latest changes significantly harden the OpenAI Codex provider. The refactoring of instruction handling and the introduction of a robust streaming retry strategy address the most critical stability concerns raised in previous reviews.
Assessment of New Changes:
- Retry Logic: The buffer-then-flush approach in `_stream_with_retry` successfully eliminates the risk of yielding duplicated or garbled content to the client.
- Instruction Handling: Utilizing the `instructions` field of the Responses API directly (with proper fallbacks) is a much cleaner approach than converting system messages to user messages and appears to have resolved the stall issues.
- Error Handling: Replacing generic `Exception` calls with `StreamedAPIError` allows the rotator library to correctly classify and handle model-generated errors.
- Cleanup: Unreachable code paths have been removed, and the overall provider logic is more concise.
Key Suggestions:
- Regex Parsing (Minor): The `parse_quota_error` method still relies on regex for retry-after extraction. While functional, favoring headers (if the Responses API eventually provides them) would be more robust.
- Scope Clarification: I noticed that the Anthropic and Cursor providers, along with the Log Viewer, were deleted in this update. I assume this is a deliberate move to narrow the PR scope to Codex only, but please confirm if these were intended to be removed.
Overall Status:
The Codex implementation is now very stable and follows project conventions well. The addition of the bot management infrastructure also looks like a great step forward for the repository's maintenance.
This review was generated by an AI assistant.
```python
    accumulated text after stream completion (to catch markers that
    are split across multiple SSE chunks).
    """
    for attempt in range(GARBLED_TOOL_CALL_MAX_RETRIES):
```
The buffer-then-flush strategy in _stream_with_retry is a robust solution to the streaming duplication risk. It ensures that no garbled content is yielded to the client before detection is complete.
```python
# The Codex Responses API requires the 'instructions' field — it's non-optional.
# Always include it; fall back to the Codex system instruction if nothing else.
if not instructions:
    instructions = CODEX_SYSTEM_INSTRUCTION
```
The refactored instructions handling properly utilizes the Responses API field and includes a fallback to CODEX_SYSTEM_INSTRUCTION. This should effectively address the stall issues previously identified.
```python
error = evt.get("response", {}).get("error", {})
error_msg = error.get("message", "Response failed")
lib_logger.error(f"Codex response failed: {error_msg}")
raise StreamedAPIError(f"Codex response failed: {error_msg}")
```
Thanks for replacing generic exceptions with StreamedAPIError. This allows for more granular error classification in the rotator.
```python
# Try to extract from message
import re
match = re.search(r"try again in (\d+)s", message)
```
This regex-based extraction still exists. While acceptable if the API doesn't provide structured retry data, consider using headers if available in the future.
Add full Codex provider implementation with OAuth-based authentication, Responses API integration, and quota tracking.
New files:
Integration points:
Important
Adds OpenAI Codex provider with OAuth authentication, Responses API support, and quota tracking.
- `CodexProvider` in `codex_provider.py` for OpenAI Codex models with OAuth authentication and Responses API.
- Quota tracking via `CodexQuotaTracker`.
- `openai_oauth_base.py` for token management.
- `codex_quota_tracker.py` for tracking rate limits and credits.
- Integration with `UsageManager`.
- Registration in `provider_factory.py` and updates to credential management.
- `quota_viewer.py` updated to skip limitless quota groups in views.

This description was created by
for 993793c. You can customize this summary. It will automatically update as commits are pushed.