Closed
14 changes: 10 additions & 4 deletions verifiers/utils/message_utils.py
@@ -37,10 +37,16 @@ def _normalize_raw_message_content(message: dict[str, Any]) -> dict[str, Any]:
     if isinstance(content, list):
         normalized_parts = []
         for part in content:
-            if isinstance(part, dict):
-                normalized_parts.append(from_raw_content_part(part))
-            else:
-                normalized_parts.append(part)
+            if isinstance(part, str):
+                # HuggingFace datasets may serialize content-part dicts to JSON
+                # strings when storing heterogeneous lists in Arrow tables.
+                try:
+                    part = json.loads(part)
+                except (json.JSONDecodeError, TypeError):
+                    pass
json.loads silently converts strings to non-dict types

Medium Severity

json.loads on a string content part can return non-dict values (e.g., "null" → None, "123" → 123, "true" → True, "[1,2]" → a list). Such a value silently replaces the original string in part, skips the isinstance(part, dict) branch, and is appended as an invalid ContentPart type. The parsed result is only useful when it is a dict, so the assignment to part needs to be guarded, e.g., by replacing part only when isinstance(parsed, dict).
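
A minimal sketch of the guarded assignment described above, not the repository's actual fix. The helper name parse_content_part is hypothetical and introduced only for illustration; it keeps the original string unless the decoded value is a dict.

```python
import json
from typing import Any


def parse_content_part(part: Any) -> Any:
    """Hypothetical helper: decode a JSON-string content part only if it is an object.

    Strings that fail to parse, or that parse to a non-dict value
    (None, numbers, booleans, lists, strings), are returned unchanged.
    """
    if isinstance(part, str):
        try:
            parsed = json.loads(part)
        except (json.JSONDecodeError, TypeError):
            # Not valid JSON: keep the original string.
            return part
        if isinstance(parsed, dict):
            # Only a decoded dict replaces the original string.
            return parsed
    return part


# Plain text that happens to be valid JSON stays a string.
for text in ("null", "123", "true", "[1,2]", "hello"):
    assert parse_content_part(text) == text

# A serialized content-part dict is decoded.
assert parse_content_part('{"type": "text", "text": "hi"}') == {"type": "text", "text": "hi"}
```

With this shape, the later isinstance(part, dict) and "type" in part check only ever sees either the caller's original value or a decoded dict.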


+            if isinstance(part, dict) and "type" in part:
+                part = from_raw_content_part(part)
+            normalized_parts.append(part)
         message = dict(message)
         message["content"] = normalized_parts
     return message