Skip to content

refactor(mdxish): replace brace-escaping preprocessing with lenient expression tokenizer#1531

Merged
kevinports merged 6 commits into
nextfrom
dimas/rm-17232-pages-with-react-scripts-break-after-mdxish-revert
Jul 2, 2026
Merged

refactor(mdxish): replace brace-escaping preprocessing with lenient expression tokenizer#1531
kevinports merged 6 commits into
nextfrom
dimas/rm-17232-pages-with-react-scripts-break-after-mdxish-revert

Conversation

@eaglethrost

@eaglethrost eaglethrost commented Jul 2, 2026

Copy link
Copy Markdown
Contributor
🎫 Resolve RM-17232

🎯 What does this PR do?

Removes the brace-escaping preprocessing in favour of a lenient MDX expression tokenizer.

We've carried escapeProblematicBraces for a while — a string pre-pass that tries to predict which {…} the parser will choke on and escapes them to \{. It's worked, but we keep having to iterate on it: every new case (code blocks, attributes, HTML boundaries, export ranges…) is another carve-out, because a regex/state-machine can only ever approximate the real grammar. It's gotten convoluted and fragile enough that patching it further felt like the wrong direction.

So I explored getting rid of it entirely to make brace handling more stable. This is the result: a ~90-line micromark tokenizer (lib/micromark/mdx-expression-lenient/) that returns nok on an unbalanced brace instead of throwing — micromark rolls back and the { renders as literal text, decided by the real grammar rather than a heuristic. Deletes ~1,600 LOC (the pre-pass + its tests) for ~90 LOC of tokenizer, and the disagreements just stop happening. Seems to work well across the suite.

Examples:

{1 + 1}            → 2                (balanced evaluates)
Hello {user.name   → Hello {user.name (unbalanced stays literal, no \{)
<div>{foo </div>   → <div>{foo </div> (no backslash leak)
export default function Page() {
  return <div>{cond ? (<p>a</p>) : (<p>b</p>)}</div>;
}

<Page />

Previously threw Could not parse import/exports with acorn; now renders.

🧪 QA tips

  • {1+1} still evaluates; unbalanced/paragraph-spanning braces render literally with no \{
  • Expressions inside callouts, lists, tables, headings, and export components
  • New suites: __tests__/lib/micromark/mdx-expression-lenient*.test.ts (incl. ReDoS vectors)

📸 Screenshot or Loom

N/A — parser-only change, covered by tests.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5fe75297-ca4a-4e1f-8bc8-ef8c2137d48c

📥 Commits

Reviewing files that changed from the base of the PR and between a54d73d and 65afe74.

📒 Files selected for processing (1)
  • processor/transform/mdxish/remove-jsx-comments.ts
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • readmeio/ai (manual)
  • readmeio/gitto (manual)
  • readmeio/markdown (manual)
  • readmeio/readme (manual)
🚧 Files skipped from review as they are similar to previous changes (1)
  • processor/transform/mdxish/remove-jsx-comments.ts

Walkthrough

This PR replaces brace-escaping preprocessing with a lenient MDX text-expression tokenizer that returns literal text for unbalanced {...} input. mdxish.ts now uses mdxExpressionLenient and a separate removeJSXComments module. The old preprocessing implementation is removed, and new Vitest coverage adds tokenizer, rendering, JSX comment, exported-declaration ternary, and ReDoS resistance tests.

Changes

  • Tokenizer: added mdxExpressionLenient and tokenizeTextExpression
  • Wiring: updated lib/mdxish.ts to use the new tokenizer and standalone JSX comment remover
  • Tests: added new tokenizer/rendering/ReDoS suites and exported-declaration ternary cases
  • Removal: deleted the old brace-escaping preprocessing module and its related tests

Sequence Diagram(s)

sequenceDiagram
  participant mdxishAstProcessor
  participant mdxExpressionLenient
  participant tokenizeTextExpression
  participant removeJSXComments

  mdxishAstProcessor->>removeJSXComments: strip JSX comments
  mdxishAstProcessor->>mdxExpressionLenient: build text-only expression extension
  mdxExpressionLenient->>tokenizeTextExpression: register tokenizer for '{'
  tokenizeTextExpression-->>mdxExpressionLenient: emit mdxTextExpression or nok
Loading

Related issues: Not specified.

Related PRs: Not specified.

Suggested labels: tests, parser, refactor

Suggested reviewers: Not specified.


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@processor/transform/mdxish/remove-jsx-comments.ts`:
- Around line 3-13: The JSDoc example for removeJSXComments is inconsistent with
JSX_COMMENT_REGEX: it currently shows spaced delimiters that would not match.
Update the example in removeJSXComments so the sample string uses the exact {/*
... */} form with no spaces around the braces, and keep the documented return
value aligned with the actual behavior of the regex.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b7d4ef3b-27f9-41dc-91eb-c8c13dbc46bc

📥 Commits

Reviewing files that changed from the base of the PR and between 945fbf9 and f43c181.

📒 Files selected for processing (10)
  • __tests__/lib/mdxish/exports.test.ts
  • __tests__/lib/micromark/mdx-expression-lenient-redos.test.ts
  • __tests__/lib/micromark/mdx-expression-lenient.test.ts
  • __tests__/transformers/preprocess-jsx-expressions.test.ts
  • __tests__/transformers/preprocess-redos-attack.test.ts
  • lib/mdxish.ts
  • lib/micromark/mdx-expression-lenient/index.ts
  • lib/micromark/mdx-expression-lenient/syntax.ts
  • processor/transform/mdxish/preprocess-jsx-expressions.ts
  • processor/transform/mdxish/remove-jsx-comments.ts
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • readmeio/ai (manual)
  • readmeio/gitto (manual)
  • readmeio/markdown (manual)
  • readmeio/readme (manual)
💤 Files with no reviewable changes (3)
  • tests/transformers/preprocess-jsx-expressions.test.ts
  • processor/transform/mdxish/preprocess-jsx-expressions.ts
  • tests/transformers/preprocess-redos-attack.test.ts

Comment thread processor/transform/mdxish/remove-jsx-comments.ts
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jul 2, 2026
@kevinports kevinports merged commit 5c5a57c into next Jul 2, 2026
8 checks passed
@kevinports kevinports deleted the dimas/rm-17232-pages-with-react-scripts-break-after-mdxish-revert branch July 2, 2026 14:06
rafegoldberg pushed a commit that referenced this pull request Jul 2, 2026
## Version 14.11.0
### ✨ New & Improved

* **mdxish:** replace brace-escaping preprocessing with lenient expression tokenizer ([#1531](#1531)) ([5c5a57c](5c5a57c))
* **mdxish:** support library imports in declarations ([#1530](#1530)) ([fa6ee97](fa6ee97))

### 🛠 Fixes & Updates

* correct exports map ordering ([#1529](#1529)) ([08a63dd](08a63dd))

<!--SKIP CI-->
@rafegoldberg

Copy link
Copy Markdown
Collaborator

This PR was released!

🚀 Changes included in v14.11.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants