Broken markdown when doc contains expressive code <script> tag #26084

@bratvanov

Example URL(s)

On pages like /workers/wrangler/commands/, "Copy Page" and "View Page as Markdown" return "Not Found". For example, see https://developers.cloudflare.com/workers/wrangler/commands/index.md

The breakage comes from the Expressive Code integration. It renders the copy-to-clipboard button by placing the raw snippet in data-code, so a command such as wrangler deploy [<SCRIPT>] [OPTIONS] ends up as:

<button data-code="wrangler deploy [<SCRIPT>] [OPTIONS]">

node-html-parser then sees <SCRIPT and assumes a <script> tag was opened but never closed, which corrupts .sl-markdown-content and causes the 404.

Actual Behavior

Switching the parser here to parse(html, { parseNoneClosedTags: true }) makes the Markdown export succeed, but it's a band-aid. That option instructs the parser to auto-close any tag it thinks is left hanging, so each unmatched <SCRIPT> stays "open" until the end of the document. Because no real </SCRIPT> exists, the synthetic closing tag lands after the footer. Once .sl-markdown-content sits inside that synthetic wrapper, its closing </div> no longer matches, and the Markdown export includes the entire footer and sidebar as well.

Expected Behavior

Perhaps a better fix would be to escape those raw tags inside data-code (and similar attributes) before they reach node-html-parser. The browser would still decode them for copy-to-clipboard, but the HTML would remain valid and the Markdown generation would stop breaking.
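
As a rough illustration of that idea, here is a minimal sketch of a pre-processing step that could run on the rendered HTML before it is handed to node-html-parser. The function names escapeAngleBrackets and escapeDataCodeAttributes are hypothetical and do not exist in the docs repo or in Expressive Code. The sketch assumes the attribute is always double-quoted and only escapes < and >, leaving & untouched so that any entities already present in the value are not double-encoded.

```typescript
// Hypothetical helper: replace raw angle brackets with HTML entities.
// "&" is deliberately left alone so entities already present in the
// attribute value are not double-encoded.
function escapeAngleBrackets(value: string): string {
  return value.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical pre-processing step: rewrite every double-quoted
// data-code="..." attribute so its value contains no raw "<" or ">".
// Assumption: the attribute value never contains a literal double quote.
function escapeDataCodeAttributes(html: string): string {
  return html.replace(
    /(\sdata-code=")([^"]*)(")/g,
    (_match: string, open: string, value: string, close: string) =>
      open + escapeAngleBrackets(value) + close,
  );
}

const raw = '<button data-code="wrangler deploy [<SCRIPT>] [OPTIONS]">';
console.log(escapeDataCodeAttributes(raw));
// prints: <button data-code="wrangler deploy [&lt;SCRIPT&gt;] [OPTIONS]">
```

Only the attribute value is rewritten, so markup outside data-code is left as-is. Other attributes that carry raw snippets would need the same treatment.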

Labels: engineering (Problems or updates to developers.cloudflare.com website)
