-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Fix: prevent Thought leakage in final answers #4229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Raise OutputParserError instead of returning raw output on parse failures - Strip trailing ReAct markers appended after Final Answer
- Strip trailing ReAct markers appended after Final Answer - Fail closed on parse errors without leaking raw text - Avoid crash after max iterations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is being reviewed by Cursor Bugbot
Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
| if marker.startswith("Thought:") and ("Action:" not in tail and "Action Input:" not in tail): | ||
| continue | ||
| if marker.startswith("Action:") and "Action Input:" not in tail: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Case-insensitive regex with case-sensitive guard conditions
Medium Severity
The regex _TRAILING_REACT_MARKER_RE uses the (?i) flag for case-insensitive matching, so it captures "thought:", "THOUGHT:", etc. However, the guard conditions use case-sensitive operations: marker.startswith("Thought:") and "Action:" not in tail. When the LLM outputs lowercase markers (e.g., "thought:"), match.group(1) returns "thought:", but startswith("Thought:") is False, so the guard is bypassed and text is unconditionally stripped. Uppercase "Thought:" without a following action block is protected by the guard and not stripped. This creates inconsistent stripping behavior based on the case of the markers in the LLM output.
Additional Locations (1)
|
|
||
| _CODE_SPAN_RE: re.Pattern[str] = re.compile( | ||
| r"(?is)<pre><code>.*?</code></pre>|<code>.*?</code>" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Markdown code blocks not protected from stripping logic
Medium Severity
The _CODE_SPAN_RE regex only matches HTML <code> and <pre><code> tags, but not markdown triple-backtick code blocks. When an LLM outputs a final answer containing a markdown code block with ReAct-like content (e.g., explaining how agents work with examples), the _is_in_code check returns False for markers inside that block. If the markdown code block contains Thought: followed by Action: and Action Input:, the guard conditions don't protect it, and the content from the marker onward is incorrectly stripped. This is inconsistent with the existing markdown awareness shown in the trailing triple-backtick handling.
Additional Locations (1)
VedantMadane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid fix for a real security/UX issue. The "fail closed" approach is the right call here. Some observations:
What I like:
- _strip_trailing_react_blocks is well-designed - the code span detection prevents false positives in legitimate code output
- The guard conditions (checking for Action Input: presence) reduce over-stripping
- Converting unexpected exceptions to OutputParserError keeps the retry loop functioning without leaking raw output
Minor suggestions:
-
Regex precompilation - Good that _CODE_SPAN_RE and _TRAILING_REACT_MARKER_RE are module-level compiled patterns.
-
Edge case question - What happens if the model outputs Final Answer: Here's how to use ReAct:\n\nThought: This is an example...? The current logic would strip the educational example. Might be worth documenting this known limitation, or checking if the "Thought:" appears within the first N characters after Final Answer.
-
The max-iterations fallback message is appropriately generic - good choice not to expose any internal state.
This addresses a real problem I've seen with Gemini models in particular. Nice work!
Problem
Agents can leak internal ReAct control text (e.g., Thought:, Action:, Action Input:, Observation:) to end users in two ways:
This shows up heavily with models that mimic prior Thought/Action/Observation patterns in history (e.g., Gemini), but it can happen with any provider.
Changes
<code> / <pre><code>spans.This PR enforces a strict output contract at the framework boundary (where AgentFinish.output is produced), rather than relying on app-side sanitizers. It prevents both parse-failure leaks and parse-success-but-contaminated leaks, while keeping the agent retry loop functional and avoiding crashes in the max-iterations edge case.
++ #4186
Note
Prevents leakage of internal ReAct control text to end users and hardens parsing behavior.
Final Answer:is present, strips trailingThought:/Action:/Action Input:/Observation:blocks outside code spans via_strip_trailing_react_blocks; also normalizes stray trailing backticks.format_answer()now fails closed by raisingOutputParserError(no raw output fallback); unexpected errors are wrapped intoOutputParserErrorfor retries.handle_max_iterations_exceeded()catchesOutputParserErrorand returns a safe genericAgentFinishmessage instead of crashing or exposing raw model output.Written by Cursor Bugbot for commit 11eacca. This will update automatically on new commits. Configure here.