near-total rewrite #725

quantizor · 2025-11-01T05:07:36Z

no real public API changes but...

+---------------------------------+------------------------+-----------------------+
|                                 │ simple markdown string │ large markdown string |
+---------------------------------+------------------------+-----------------------+
| markdown-to-jsx (next) [parse]  │ 2,180,089 ops/sec      │ 2,183 ops/sec         |
+---------------------------------+------------------------+-----------------------+
| markdown-to-jsx (8.0.0) [parse] │ 190,296 ops/sec        │ 1,336 ops/sec         |
+---------------------------------+------------------------+-----------------------+

yeah

Note

Massive internal rewrite splitting parsing/rendering with major perf gains, cleanup of internal types, dropping React <16, and tooling/CI upgrades.

Core/Parser:
- Major refactor moving code to src/ with new modular parser (parse.ts), types (types.ts), and utilities (utils.ts).
- Replaces monolithic implementation (index.tsx, match.ts) with faster streaming parser; adds HTML emission helper.
- Significant performance improvements and expanded test coverage.
API/Types (breaking):
- Remove internal namespace types; rename RuleOutput → ASTRender.
- Minor AST/renderer signatures updated; footnote handling refined.
Compatibility:
- Drop support for React < 16; upgrade to React 19 types; bump peer to react >=16.
Build/Tooling:
- Update builds to src/index.tsx; increase size-limit; Node to v24; new TS benchmark suite; add CodeQL config/workflow.
Benchmarks:
- Large perf gains vs 8.x on parse benchmarks (roughly 11x on simple, ~60%+ on large).
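For anyone who wants to check the summary against the table in the description, here is a small sketch that derives the speedups from the posted ops/sec figures. The numbers are copied from the table; the names `benchResults` and `speedup` are made up for this snippet and are not part of the benchmark suite.

```typescript
// Throughput figures copied from the benchmark table in the PR description (ops/sec).
const benchResults = {
  simple: { next: 2_180_089, v8: 190_296 },
  large: { next: 2_183, v8: 1_336 },
};

// Ratio of new throughput to old throughput (hypothetical helper).
function speedup(next: number, prev: number): number {
  return next / prev;
}

const simpleSpeedup = speedup(benchResults.simple.next, benchResults.simple.v8);
const largeSpeedup = speedup(benchResults.large.next, benchResults.large.v8);

console.log(`simple: ${simpleSpeedup.toFixed(1)}x`); // simple: 11.5x
console.log(`large: ${largeSpeedup.toFixed(2)}x`); // large: 1.63x
```

So the simple case is about one order of magnitude faster, and the large case is about 63% faster.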

Written by Cursor Bugbot for commit 3a2ca40.

changeset-bot · 2025-11-01T05:07:39Z

🦋 Changeset detected

Latest commit: d11def2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
markdown-to-jsx	Major

socket-security · 2025-11-01T05:08:11Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	@types/react@17.0.89 ⏵ 19.2.2	⁺¹		⁺²
	react-dom@17.0.2 ⏵ 19.2.0	⁺¹		⁺³

src/parse.spec.ts

this is a fundamental rewrite of the library to maximize throughput

the public API is essentially the same, except for some internal typing stuff that is technically exposed in the exported namespace

it's really fast and a bunch of things have been improved, including svg support for inlined xml

perf: optimize string slicing in matchInlineFormatting and parseRefLink

- Modify matchInlineFormatting to accept position parameters instead of sliced strings
- Eliminate unnecessary slice in parseRefLink by checking '](' pattern directly
- Update tests to use new matchInlineFormatting signature

Performance improvement: ~2.3x faster for simple markdown (250k vs 108k ops/sec)

perf: eliminate string slicing in parseLink and parseRefLink

- Pass positions directly to parseInlineSpan instead of slicing first
- Remove unused text variable in parseRefLink
- Maintains ~2.3x performance improvement

chore: adjust cursor rule

perf: eliminate state cloning with direct mutation

- Replace { ...state, ... } cloning with direct property mutation
- Save and restore original values after recursive calls
- Reduces allocations in hot paths
- Performance improvement: ~252k ops/sec vs ~248k baseline

cleanup

Consolidate duplicate URL parsing logic in parseLink and parseImage

- Extract shared parseUrlAndTitle function to eliminate code duplication
- Improves performance by ~1.2% (256k vs 253k ops/sec)
- Reduces bundle size and improves maintainability

Add early-exit optimizations for parser dispatch

- Skip HTML parsers early if disableParsingRawHTML is true
- Skip link parsers if inAnchor state
- Skip bare URL parser if disabled or inAnchor
- Performance neutral but improves code clarity

Optimize HTML entity processing by skipping regex when no & present

- Skip HTML_CHAR_CODE_R regex entirely when text contains no &
- Most text chunks don't have HTML entities, avoiding unnecessary regex
- Improves performance by ~7-8% (280k vs 260k ops/sec)

Also refactored parseInlineSpan to use character-based dispatch:

- Replace sequential if checks with switch statement
- Eliminates ~90% of unnecessary parser checks
- Inline text accumulation to avoid parseText function call overhead

perf: reduce state object cloning in parseMarkdown

Eliminate unnecessary object allocations by passing state directly instead of cloning with { ...state, inline: false }. Since parseMarkdown runs in block mode, state.inline is already false, making the clones redundant.

For buildListItemChildren, use mutation-with-restoration pattern instead of object spread.

Benchmark results show ~2-3% improvement on large markdown documents.

perf: optimize string concatenation in loops for large documents

Replace string concatenation (+=) with array.join() pattern in loops to avoid O(n²) string operations.

Optimized:

- parseCodeBlock: content building
- parseCodeFenced: content building
- parseBlockQuote: rawContent building

For large documents with many lines, this reduces string allocation overhead by collecting pieces in an array and joining once at the end.

Benchmark results show ~1.4% improvement on large markdown documents.

feat: improve void element detection and fix malformed HTML handling

- Combine HTML5 and SVG void elements into single VOID_ELEMENTS Set
- Add isVoidElement() function that handles:
  - HTML5 void elements (area, br, img, etc.)
  - SVG void elements (circle, path, rect, etc.)
  - Custom web components (hyphenated tags like my-component)
  - Namespace prefixes (e.g., svg:circle -> circle)
- Add comprehensive parse tests for void element detection:
  - SVG void elements (7 tests)
  - Custom web components (5 tests)
  - Non-void element protection (4 tests)
  - Edge cases (4 tests)
  - Markdown formatting integration (3 tests)
- Fix sibling <g> tag detection for malformed HTML:
  - Check newline between opening tag end and sibling tag
  - Ensures proper parsing of consecutive SVG <g> tags with newlines
- Restore URL autolink detection (only blocks when :// appears in tag, not attributes)
- Restore parsing of tags with attributes when no closing tag found (needed for sanitization)

perf: optimize character lookup functions using Sets

Replace string.indexOf() lookups with Set.has() for O(1) character checks:

- isSpecialInlineChar: convert SPECIAL_INLINE_CHARS string to Set
- isBlockStartChar: convert BLOCK_START_CHARS string to Set

These functions are called frequently in hot paths (every character check in parseText and block parsing), so O(1) lookups provide better performance especially for large documents.

Benchmark results show performance maintained (~924 ops/sec average for large documents, within variance of previous baseline).

cleanup

chore: add metrics back

small optimization

perf: optimize HTML block parsing and parseParagraph

- Optimize matchHTMLBlock: replace character-by-character tag matching with substring comparisons for ~40-50% performance improvement on deeply nested HTML
- Optimize parseParagraph: combine line finding with empty line detection in single pass, reduce redundant findLineEnd calls, and optimize trimming logic
- Simplify string operations: use trim() and includes() helpers instead of manual loops where appropriate
- Optimize table parsing: simplify line and cell trimming logic

adjust footnotes

internal react 19 upgrade

use ts for benchmark

convert profile to ts

split bench suite up for faster iteration

Remove internal type definitions and rename RuleOutput to ASTRender

add failing test for 678

fix inline html handling

permission to grow

add rehype to bench and type the benchmark library

add changeset

Drop support for React versions less than 16
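To illustrate the "eliminate state cloning with direct mutation" commit described above, here is a minimal sketch of the mutate-and-restore pattern next to the spread-clone version it replaces. This is not the code from the PR: `ParserState`, `AstNode`, `parseChildren`, `parseLinkCloning` and `parseLinkMutating` are hypothetical stand-ins, and only the `inline` and `inAnchor` field names come from the commit message.

```typescript
// Hypothetical parser state; the library's real state object is not shown in this thread.
interface ParserState {
  inline: boolean;
  inAnchor: boolean;
}

type AstNode =
  | { type: "text"; text: string }
  | { type: "link"; children: AstNode[] };

// Stand-in for a recursive inline parser that reads the shared state.
function parseChildren(source: string, state: ParserState): AstNode[] {
  const text = state.inAnchor ? `[in anchor] ${source}` : source;
  return [{ type: "text", text }];
}

// Before: every call allocates a fresh state object via object spread.
function parseLinkCloning(source: string, state: ParserState): AstNode {
  return { type: "link", children: parseChildren(source, { ...state, inAnchor: true }) };
}

// After: mutate the shared state, recurse, then restore the saved value.
function parseLinkMutating(source: string, state: ParserState): AstNode {
  const previous = state.inAnchor; // save
  state.inAnchor = true; // mutate in place, no allocation
  const children = parseChildren(source, state);
  state.inAnchor = previous; // restore so callers see the original value
  return { type: "link", children };
}
```

Both functions return the same tree; the second avoids one object allocation per call. The trade-off is that the restore step must run on every exit path, so a parser that can throw mid-recursion would want a `try`/`finally` around it.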

…ATTR regex

- Optimize regex pattern from (?:[^>]*[^/])? to (?:[^>/]+[^/]|)>
- Add early exit check before regex test to skip non-HTML attribute values
- Fixes CodeQL warning about slow performance on strings starting with '<a' and many '!' repetitions

…nction

- Replace complex regex vulnerable to exponential backtracking
- Implement extractHTMLAttributes() function for safe attribute parsing
- Update parseHTMLAttributes() and parseCodeFenced() to use new function
- All 498 tests pass including fuzzing tests
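For readers unfamiliar with why a hand-written scanner avoids the backtracking problem, here is a rough sketch of regex-free attribute extraction. It is not the PR's extractHTMLAttributes(); `scanAttributes` is a hypothetical helper written for this comment, and it ignores escaped quotes and other edge cases the real parser has to handle. It makes a single left-to-right pass in which every loop iteration advances the cursor, so the running time is linear in the input length.

```typescript
// Hypothetical helper (not the library's API): extracts name/value pairs from
// the attribute portion of a tag, e.g. the text between "<a" and ">".
function scanAttributes(input: string): Array<[string, string]> {
  const attrs: Array<[string, string]> = [];
  const isSpace = (ch: string): boolean =>
    ch === " " || ch === "\t" || ch === "\n" || ch === "\r";
  const n = input.length;
  let i = 0;

  while (i < n) {
    // Skip whitespace between attributes.
    while (i < n && isSpace(input.charAt(i))) i++;
    if (i >= n) break;

    // Read the attribute name up to whitespace or "=".
    const nameStart = i;
    while (i < n && !isSpace(input.charAt(i)) && input.charAt(i) !== "=") i++;
    const attrName = input.slice(nameStart, i);

    // Allow whitespace before an optional "=".
    while (i < n && isSpace(input.charAt(i))) i++;

    let value = "";
    if (i < n && input.charAt(i) === "=") {
      i++; // consume "="
      while (i < n && isSpace(input.charAt(i))) i++;
      const quote = input.charAt(i);
      if (quote === '"' || quote === "'") {
        i++; // consume opening quote
        const valueStart = i;
        while (i < n && input.charAt(i) !== quote) i++;
        value = input.slice(valueStart, i);
        if (i < n) i++; // consume closing quote if present
      } else {
        // Unquoted value runs until the next whitespace.
        const valueStart = i;
        while (i < n && !isSpace(input.charAt(i))) i++;
        value = input.slice(valueStart, i);
      }
    }

    // A stray "=" yields an empty name; skip it rather than record it.
    if (attrName !== "") attrs.push([attrName, value]);
  }
  return attrs;
}
```

Usage: `scanAttributes(` + "`" + `href="https://example.com" title='a b' disabled data-x=1` + "`" + `)` yields four pairs, with `disabled` mapped to an empty string.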

…g function

- Replace complex regex with character-by-character parsing
- Fix edge cases with escaped quotes and closing tag detection
- Update parseHTMLElement and parseHTMLSelfClosing to use new function
- 497/498 tests pass (1 snapshot mismatch remaining)
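Related to the self-closing handling, the squashed commit message above describes a combined VOID_ELEMENTS Set and an isVoidElement() check that strips namespace prefixes (svg:circle -> circle). A minimal sketch of that lookup follows. The set contents are an illustrative subset built from the tag names mentioned in the commit message plus a few standard HTML void elements, lowercasing the tag is an assumption, and the thread does not say how hyphenated custom elements are classified, so this sketch simply leaves them non-void.

```typescript
// Illustrative subset only; the PR's actual VOID_ELEMENTS contents are not shown here.
const VOID_ELEMENTS: ReadonlySet<string> = new Set([
  "area", "br", "hr", "img", "input", // HTML void elements
  "circle", "path", "rect", // SVG examples named in the commit message
]);

// Sketch of the lookup: O(1) Set membership after normalizing the tag name.
function isVoidElement(tagName: string): boolean {
  let tag = tagName.toLowerCase(); // assumption: matching is case-insensitive

  // Strip a namespace prefix such as "svg:" before the lookup.
  const colonIndex = tag.indexOf(":");
  if (colonIndex !== -1) tag = tag.slice(colonIndex + 1);

  // Assumption: anything not in the set (including hyphenated custom
  // elements like "my-component") is treated as non-void.
  return VOID_ELEMENTS.has(tag);
}
```

With this subset, `isVoidElement("br")` and `isVoidElement("svg:circle")` are true, while `isVoidElement("div")` and `isVoidElement("svg:g")` are false.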

src/index.tsx

create src directory

a4da988

github-advanced-security bot found potential problems Nov 1, 2025

src/parse.spec.ts (dismissed)

src/parse.spec.ts (dismissed)

quantizor force-pushed the maintain branch from c5a25be to 1f35e24 Compare November 1, 2025 05:11

configure codeql to ignore tests

5a3ba40

fix test

526f96b

quantizor mentioned this pull request Nov 1, 2025

Inconsistent <u> tag formatting #678

Open

quantizor added 2 commits November 1, 2025 01:31

delete old file

ea129eb

remove whitespace

abcd3d4

quantizor added 5 commits November 1, 2025 01:36

flatten ast nodes

f8457d2

tripwire

bd4c557

refine regex

5208942

quantizor added 5 commits November 1, 2025 02:59

return type

aa2f543

fix html comment parsing

a12e498

experimental raw html output function

105da00

clean up unused node type

3a2ca40

quantizor marked this pull request as draft November 1, 2025 18:36

cursor bot reviewed Nov 1, 2025

src/index.tsx (outdated, resolved)

quantizor added 3 commits November 1, 2025 14:41

simplify html block regex

e974827

delete dead code

53ec6a5

add failing test

b6b1c03

quantizor added 5 commits November 1, 2025 15:03

fix and optimize

dfc0e12

remove stuff we don't need

c6fd277

refactor html parsing

2f0382b

adjust profile script to omit react-related functions

a7e67c6

various optimizations

d11def2

2 participants
