fix(mdxish): <HTMLBlocks> inside <Table> not rendering #1484
Conversation
It seems like most of the complexity in this PR is dealing with the marker that protectHTMLBlockContent adds.
But we recently merged #1455 which shields multi-line template literal content in component bodies from html parsing. I wonder if it's possible to just remove protectHTMLBlockContent to significantly simplify everything here? I definitely didn't consider the HTMLBlock use case when working on #1455 and never tried removing that pre-processor.
I have this work in #1439 to create a tokenizer for our HTMLBlock. Maybe that can help with eliminating the need to protect HTML block content altogether? @eaglethrost @kevinports
Yeah @kevinports @maximilianfalco I've had a rethink of the approach and I find that we can reuse both Kevin's work in #1455 and falco's tokenizer work in #1439:
Will move this to draft first to consolidate the combined logic & investigate the bug.
Having this fix working & now allowing HTML blocks inside tables uncovered a bug where the block content in the table gets indented in the editor, but actually renders fine in view mode. Interestingly this also happens in the old editor in MDX, so it doesn't look like an issue from this PR specifically and would be a separate fix I'll investigate. Demo of this in old editor & MDX project, notice how the block content gets indented after round trips: Screen.Recording.2026-05-28.at.11.10.10.pm.mov
It looks like it's taking the indented space literally in the deserialization, so an editor-side fix might be required.
kevinports
left a comment
Very glad to drop all the preprocessing with this revised approach. Lgtm 👍
| 🎫 Resolve ISSUE_ID |
| :-----------------: |

## 🎯 What does this PR do?

While fixing HTMLBlocks not rendering inside Tables in mdxish, I noticed that once it worked, an editor round trip would unexpectedly indent the block content lines in the editor, even though the rendering is fine & not affected. See this demo:

https://github.com/user-attachments/assets/5b3fa862-b493-417b-b48f-fac82650133b

The root issue is that the HTMLBlock transformer in the engine captures the content verbatim from the source: each content line's leading whitespace is exactly the characters that sit between the backticks, measured from column 1. Since the Table content is indented in serialisation, that leading whitespace ends up in the captured content.

The fix I went for here is in the block content extraction code: we pass in the `<HTMLBlock>` opening tag position & deindent each line relative to that, instead of the starting column. I think it makes sense to use the tag as the anchor column, though there are a few ways we could decide that. I also think the fix belongs on the engine side because I don't think it should capture the content verbatim anyway (I briefly considered putting the fix in the editor).

Note that this happens in MDX as well. Haven't investigated yet, but it's likely an engine issue there as well and not the editor.

## 🧪 QA tips

<!-- Unique code decisions, code walkthroughs, how to test them -->

The fix deindents each `<HTMLBlock>` content line **relative to the opening tag's column**, not the start of the line. To verify, paste each example into the mdxish editor, confirm it renders correctly, then do an editor round-trip (e.g. view as Markdown and reopen); the content lines should **not** gain extra leading indentation.

- [ ] **Indented `<HTMLBlock>` (nested under a list item)**

````md
1. Here is some custom HTML:

   <HTMLBlock>{`
   <div style="color: red;">
     <p>Hello</p>
     <p>World</p>
   </div>
   `}</HTMLBlock>
````

The extracted content should be deindented relative to the `<HTMLBlock>` tag, so the `<div>` sits at column 0 and the `<p>`s keep their relative 2-space indent:

```html
<div style="color: red;">
  <p>Hello</p>
  <p>World</p>
</div>
```

Before the fix, every round-trip would keep the list's 3-space indentation on each line (and compound it on repeated trips).

- [ ] **`<HTMLBlock>` inside a `<Table>` cell**

````md
<Table>
  <thead>
    <tr><th>Name</th><th>Markup</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>Custom</td>
      <td><HTMLBlock>{`<div style="color: red;">
            <p>Hello</p>
            <p>World</p>
          </div>`}</HTMLBlock></td>
    </tr>
  </tbody>
</Table>
````

The table should stay a JSX `<Table>` and the cell should render the raw HTML. The extracted content should preserve the author's relative indentation without the table-cell serialization indentation leaking into the lines:

```html
<div style="color: red;">
  <p>Hello</p>
  <p>World</p>
</div>
```

## 📸 Screenshot or Loom

Demo of block inside Table where the indents are retained:

https://github.com/user-attachments/assets/68178bd0-0d44-4ebc-8dbb-86be1b2fad8a
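To make the deindent rule in the QA tips concrete, here is a minimal sketch of stripping indentation relative to the opening tag's column. The function name `deindentRelativeToTag` and the 0-based `tagColumn` parameter are assumptions for illustration, not the engine's actual API; the real extraction code may anchor or clamp differently.

```typescript
// Hypothetical helper, not the engine's real implementation.
// `tagColumn` is assumed to be the 0-based offset of the `<` of the
// opening <HTMLBlock> tag within its source line.
function deindentRelativeToTag(content: string, tagColumn: number): string {
  return content
    .split('\n')
    .map(line => {
      // Index of the first character that is not a space or tab
      // (or the line length when the line is all whitespace).
      const leading = line.search(/[^ \t]|$/);
      // Strip at most `tagColumn` leading whitespace characters so that
      // indentation deeper than the tag survives as relative indentation.
      return line.slice(Math.min(leading, tagColumn));
    })
    .join('\n');
}

// Body of an <HTMLBlock> nested under a list item, tag at column 3.
const body = [
  '',
  '   <div style="color: red;">',
  '     <p>Hello</p>',
  '     <p>World</p>',
  '   </div>',
  '',
].join('\n');

const deindented = deindentRelativeToTag(body, 3);
console.log(deindented);
```

Because only up to `tagColumn` characters are removed per line, running the function again on already-deindented content with the same tag column would strip the remaining relative indent, so a real implementation would need the tag position from the same parse as the content.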
## Version 14.7.0

### ✨ New & Improved

* **images:** allow non centered images to have caption ([#1502](#1502)) ([15616ea](15616ea))

### 🛠 Fixes & Updates

* **mdxish:** <HTMLBlocks> inside <Table> not rendering ([#1484](#1484)) ([3817fa1](3817fa1)), closes [#1455](#1455)
* **mdxish:** normalize spacing for blank-line-split table tags ([#1493](#1493)) ([f162158](f162158))

<!--SKIP CI-->
This PR was released! 🚀 Changes included in v14.7.0
Addresses PR review requests for more table-rendering coverage and a callout example, each fixture grounded in a merged bug fix:

- jsx-table-multiline-cells (#1445): multi-paragraph cells preserved
- jsx-table-unclosed-cells (#1465): asymmetric/unclosed cell tags recovered
- table-unwrapped-rows (#1458, #1411): rows missing <tr>/<tbody> wrappers
- htmlblock-in-table (#1484): <HTMLBlock> inside a <Table> cell
- legacy-vars-in-table (#1458): legacy <<vars>> in raw table cells
- callout-icons (#1498): blockquote + FA-class-icon callout render

Also refreshes the divergent, htmlblock-with-script, and jsx-attribute-entities snapshots to reflect engine output changes pulled in from the origin/next merge (invalid <p> wrappers removed around block elements; <figcaption> now a direct child of <figure>).

Claude-Session: https://claude.ai/code/session_01GPTShf49qTsVP1AxSbpRJk

## 🎯 What does this PR do?

To try to fix an issue where `<HTMLBlock>` is not rendering inside JSX `<Table>`, this PR makes substantial changes to how we parse HTMLBlock syntax by moving away from the string-level content protection we've been doing and reusing the existing MDX tokenizer for it.

**Root cause of rendering issue:** We have a preprocessing step in the pipeline where HTMLBlock bodies are encoded into an HTML-comment marker (`<!--RDMX_HTMLBLOCK:…-->`) in `preprocessJSXExpressions`, then decoded back further down the pipeline to be transformed to HTMLBlock nodes. When the `<HTMLBlock>` is inside a `<Table>`, the table transformer, which still sees the encoded HTMLBlock, fails to parse it because it uses remarkMdx, which it turns out rejects HTML comments, so the table is never parsed. The blocks were encoded because we didn't want their content to be modified by other preprocessing steps & their usage of curly braces could cause expression parsing issues.

**Approach:** We can now stop protecting and decoding. Now that the `mdxComponent` tokenizer can capture component bodies, including multiline `{…}` template literals, thanks to the brace-aware body states added in #1455, we can let the tokenizer claim `<HTMLBlock>` and read its body straight from the parsed template-literal expression. No marker round-trip, no comment for remarkMdx to choke on. (This is the same direction as @maximilianfalco's HTMLBlock-tokenizer work in #1439.)

**What changed:**

- Split the exclusion set so the micromark `mdxComponent` construct captures `<HTMLBlock>` (new `TOKENIZER_MDX_COMPONENT_EXCLUDED_TAGS`), while the remark string-reparse transforms still leave it alone; re-parsing it there is what would mangle bodies containing unbalanced-looking braces.
- `mdxish-html-blocks.ts`: the transformer now deals with different input data to extract:
  - `mdxJsxFlowElement`/`mdxJsxTextElement`: block context (e.g. `<Callout>`) and table cells (after their remarkMdx re-parse);
  - `<div>` (CommonMark slurps these whole, so we split them back out);
  - `<HTMLBlock>` open/close arriving as separate siblings around the expression.
- `mdxish-tables` keeps a table as a JSX `<Table>` when a cell contains an `<HTMLBlock>` (block-level content a GFM cell can't represent).
- Removed `protectHTMLBlockContent` + the `RDMX_HTMLBLOCK` markers, the base64 encode/decode paths, and the table-specific comment-neutralization workaround. HTMLBlock handling collapses from four locations down to one.

## 🧪 QA tips

- Put an `<HTMLBlock>` inside a `<Table>` cell and confirm the HTML renders without breaking the table, and sibling cells still get markdown.
- `safeMode`/`runScripts` survive, and multiple HTMLBlocks in one table all render.
- `<HTMLBlock>` and `<HTMLBlock>` in a generic `<div>` still render as before.
- `__tests__/lib/mdxish/html-blocks.test.ts`.

Demo (before & after):
Screen.Recording.2026-05-25.at.7.32.39.pm.mov
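As a rough illustration of the `mdxish-tables` rule from the "What changed" list (keep a table as a JSX `<Table>` when any cell contains an `<HTMLBlock>`), the check could look something like the sketch below. The `SketchNode` type and both function names are hypothetical stand-ins; only the node type names `mdxJsxFlowElement` and `mdxJsxTextElement` come from the description above, and the real transformer likely works on full mdast nodes.

```typescript
// Minimal stand-in for an mdast/MDX node; hypothetical, for illustration only.
interface SketchNode {
  type: string;
  name?: string;
  children?: SketchNode[];
}

const JSX_ELEMENT_TYPES = new Set(['mdxJsxFlowElement', 'mdxJsxTextElement']);

// Depth-first search for an <HTMLBlock> JSX element anywhere under `node`.
function containsHTMLBlock(node: SketchNode): boolean {
  if (JSX_ELEMENT_TYPES.has(node.type) && node.name === 'HTMLBlock') {
    return true;
  }
  return (node.children ?? []).some(containsHTMLBlock);
}

// Assumed decision rule: a table holding block-level HTMLBlock content
// cannot be expressed as a GFM table, so it stays a JSX <Table>.
function shouldStayJsxTable(table: SketchNode): boolean {
  return containsHTMLBlock(table);
}

const tableWithBlock: SketchNode = {
  type: 'mdxJsxFlowElement',
  name: 'Table',
  children: [
    {
      type: 'mdxJsxFlowElement',
      name: 'td',
      children: [{ type: 'mdxJsxTextElement', name: 'HTMLBlock', children: [] }],
    },
  ],
};

const plainTable: SketchNode = {
  type: 'mdxJsxFlowElement',
  name: 'Table',
  children: [
    { type: 'mdxJsxFlowElement', name: 'td', children: [{ type: 'text' }] },
  ],
};

console.log(shouldStayJsxTable(tableWithBlock)); // true
console.log(shouldStayJsxTable(plainTable)); // false
```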