fix: strip markdown code fences from LLM JSON responses #1787

Open
Br1an67 wants to merge 1 commit into unclecode:main from Br1an67:fix/issue-1663-strip-markdown-json
Conversation

Br1an67 commented Mar 1, 2026

Summary

Some LLM providers (notably Claude Sonnet) wrap JSON responses in markdown code fences (```json ... ```) even when JSON mode is requested. This causes json.loads() to fail with JSONDecodeError in generate_schema() and the main extraction methods.
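
To make the failure concrete, here is a minimal reproduction using only the standard library. The response string is a made-up example, not output captured from any provider.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the literal readable

# Hypothetical LLM response: valid JSON wrapped in a json-tagged markdown fence.
fenced_response = FENCE + 'json\n{"name": "example"}\n' + FENCE

try:
    json.loads(fenced_response)
    parse_failed = False
except json.JSONDecodeError as exc:
    # The parser rejects the leading backtick before it ever reaches the JSON.
    parse_failed = True
    print(f"JSONDecodeError: {exc}")
```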

This PR adds a _strip_markdown_json() helper that detects and removes markdown fences before parsing. It is applied to:

  • generate_schema() (sync)
  • agenerate_schema() (async)
  • _perform_extraction() sync and async paths
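
The diff itself is not reproduced in this description, so the following is only a sketch of what a fence-stripping helper along these lines might look like. The name strip_markdown_json_sketch and the regular expression are assumptions for illustration, not the code in crawl4ai/extraction_strategy.py.

```python
import json
import re

# Assumed pattern: an opening fence with an optional language tag, the payload,
# then a closing fence at the very end of the (whitespace-trimmed) text.
_FENCE_RE = re.compile(r"^`{3}[\w-]*[ \t]*\n(.*?)\n?`{3}$", re.DOTALL)


def strip_markdown_json_sketch(text: str) -> str:
    """Return the payload inside one surrounding markdown fence, if present.

    Text without a surrounding fence is returned unchanged.
    """
    match = _FENCE_RE.match(text.strip())
    return match.group(1).strip() if match else text


FENCE = "`" * 3  # three backticks

tagged = FENCE + 'json\n{"a": 1}\n' + FENCE    # json-tagged fence
untagged = FENCE + '\n{"a": 1}\n' + FENCE      # untagged fence
plain = '{"a": 1}'                             # no fence at all
fenced_array = FENCE + "json\n[1, 2]\n" + FENCE  # array wrapped in a fence

for sample in (tagged, untagged, plain, fenced_array):
    # Each sample parses once the fence, if any, has been removed.
    print(json.loads(strip_markdown_json_sketch(sample)))
```

Calling the helper immediately before each json.loads() keeps the change local to the parsing step, and unfenced responses take the same path as before.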

Fixes #1663

List of files changed and why

  • crawl4ai/extraction_strategy.py — Added _strip_markdown_json() helper and applied it to all json.loads() calls on LLM responses

How Has This Been Tested?

Verified locally with test cases covering:

  • ```json\n{...}\n``` (json-tagged fences)
  • ```\n{...}\n``` (untagged fences)
  • Plain JSON (no fences — passed through unchanged)
  • Array responses wrapped in fences

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Some LLM providers (notably Claude) wrap JSON responses in markdown
code fences even when JSON mode is requested. This causes json.loads
to fail with JSONDecodeError.

Add a _strip_markdown_json helper that removes ```json....``` fences
before parsing. Applied to generate_schema (sync+async) and the main
extraction methods.

Fixes unclecode#1663

Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Sonnet returns markdown-wrapped JSON despite JSON mode being enabled in generate_schema