refactor(mdxish): replace brace-escaping preprocessing with lenient expression tokenizer#1531
Conversation
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🔗 Linked repositories identifiedCodeRabbit considers these linked repositories for cross-repo context during reviews:
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR replaces brace-escaping preprocessing with a lenient MDX text-expression tokenizer that returns literal text for unbalanced Changes
Sequence Diagram(s)sequenceDiagram
participant mdxishAstProcessor
participant mdxExpressionLenient
participant tokenizeTextExpression
participant removeJSXComments
mdxishAstProcessor->>removeJSXComments: strip JSX comments
mdxishAstProcessor->>mdxExpressionLenient: build text-only expression extension
mdxExpressionLenient->>tokenizeTextExpression: register tokenizer for '{'
tokenizeTextExpression-->>mdxExpressionLenient: emit mdxTextExpression or nok
Related issues: Not specified. Related PRs: Not specified. Suggested labels: tests, parser, refactor Suggested reviewers: Not specified. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@processor/transform/mdxish/remove-jsx-comments.ts`:
- Around line 3-13: The JSDoc example for removeJSXComments is inconsistent with
JSX_COMMENT_REGEX: it currently shows spaced delimiters that would not match.
Update the example in removeJSXComments so the sample string uses the exact {/*
... */} form with no spaces around the braces, and keep the documented return
value aligned with the actual behavior of the regex.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: b7d4ef3b-27f9-41dc-91eb-c8c13dbc46bc
📒 Files selected for processing (10)
__tests__/lib/mdxish/exports.test.ts__tests__/lib/micromark/mdx-expression-lenient-redos.test.ts__tests__/lib/micromark/mdx-expression-lenient.test.ts__tests__/transformers/preprocess-jsx-expressions.test.ts__tests__/transformers/preprocess-redos-attack.test.tslib/mdxish.tslib/micromark/mdx-expression-lenient/index.tslib/micromark/mdx-expression-lenient/syntax.tsprocessor/transform/mdxish/preprocess-jsx-expressions.tsprocessor/transform/mdxish/remove-jsx-comments.ts
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
readmeio/ai(manual)readmeio/gitto(manual)readmeio/markdown(manual)readmeio/readme(manual)
💤 Files with no reviewable changes (3)
- tests/transformers/preprocess-jsx-expressions.test.ts
- processor/transform/mdxish/preprocess-jsx-expressions.ts
- tests/transformers/preprocess-redos-attack.test.ts
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
## Version 14.11.0 ### ✨ New & Improved * **mdxish:** replace brace-escaping preprocessing with lenient expression tokenizer ([#1531](#1531)) ([5c5a57c](5c5a57c)) * **mdxish:** support library imports in declarations ([#1530](#1530)) ([fa6ee97](fa6ee97)) ### 🛠 Fixes & Updates * correct exports map ordering ([#1529](#1529)) ([08a63dd](08a63dd)) <!--SKIP CI-->
This PR was released!🚀 Changes included in v14.11.0 |

🎯 What does this PR do?
Removes the brace-escaping preprocessing in favour of a lenient MDX expression tokenizer.
We've carried
escapeProblematicBracesfor a while — a string pre-pass that tries to predict which{…}the parser will choke on and escapes them to\{. It's worked, but we keep having to iterate on it: every new case (code blocks, attributes, HTML boundaries,exportranges…) is another carve-out, because a regex/state-machine can only ever approximate the real grammar. It's gotten convoluted and fragile enough that patching it further felt like the wrong direction.So I explored getting rid of it entirely to make brace handling more stable. This is the result: a ~90-line micromark tokenizer (
lib/micromark/mdx-expression-lenient/) that returnsnokon an unbalanced brace instead of throwing — micromark rolls back and the{renders as literal text, decided by the real grammar rather than a heuristic. Deletes ~1,600 LOC (the pre-pass + its tests) for ~90 LOC of tokenizer, and the disagreements just stop happening. Seems to work well across the suite.Examples:
{1 + 1} → 2 (balanced evaluates) Hello {user.name → Hello {user.name (unbalanced stays literal, no \{) <div>{foo </div> → <div>{foo </div> (no backslash leak)export default function Page() { return <div>{cond ? (<p>a</p>) : (<p>b</p>)}</div>; } <Page />Previously threw
Could not parse import/exports with acorn; now renders.🧪 QA tips
{1+1}still evaluates; unbalanced/paragraph-spanning braces render literally with no\{exportcomponents__tests__/lib/micromark/mdx-expression-lenient*.test.ts(incl. ReDoS vectors)📸 Screenshot or Loom
N/A — parser-only change, covered by tests.