
@quantizor (owner) commented Nov 1, 2025

no real public API changes but...

+---------------------------------+------------------------+-----------------------+
| benchmark                       | simple markdown string | large markdown string |
+---------------------------------+------------------------+-----------------------+
| markdown-to-jsx (next) [parse]  | 2,180,089 ops/sec      | 2,183 ops/sec         |
+---------------------------------+------------------------+-----------------------+
| markdown-to-jsx (8.0.0) [parse] | 190,296 ops/sec        | 1,336 ops/sec         |
+---------------------------------+------------------------+-----------------------+
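The speedups implied by the benchmark table (next vs 8.0.0, parse only) work out to:

```latex
\frac{2{,}180{,}089}{190{,}296} \approx 11.5\times
\qquad
\frac{2{,}183}{1{,}336} \approx 1.63\times
```

That is about one order of magnitude on the simple string and roughly 63% more throughput on the large one.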

yeah


Note

Massive internal rewrite splitting parsing/rendering with major perf gains, cleanup of internal types, dropping React <16, and tooling/CI upgrades.

  • Core/Parser:
    • Major refactor moving code to src/ with new modular parser (parse.ts), types (types.ts), and utilities (utils.ts).
    • Replaces monolithic implementation (index.tsx, match.ts) with faster streaming parser; adds HTML emission helper.
    • Significant performance improvements and expanded test coverage.
  • API/Types (breaking):
    • Remove internal namespace types; rename RuleOutput to ASTRender.
    • Minor AST/renderer signatures updated; footnote handling refined.
  • Compatibility:
    • Drop support for React < 16; upgrade to React 19 types; bump peer to react >=16.
  • Build/Tooling:
    • Update builds to src/index.tsx; increase size-limit; Node to v24; new TS benchmark suite; add CodeQL config/workflow.
  • Benchmarks:
    • Large perf gains vs 8.x on parse benchmarks (~11x on simple, ~63% on large).

Written by Cursor Bugbot for commit 3a2ca40. This will update automatically on new commits.

@changeset-bot (bot) commented Nov 1, 2025

🦋 Changeset detected

Latest commit: d11def2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name              Type
markdown-to-jsx   Major


@socket-security (bot) commented Nov 1, 2025

Review the following changes in direct dependencies.

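+---------+-------------------------------+-----------------------+---------------+---------+-------------+---------+
| Diff    | Package                       | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
+---------+-------------------------------+-----------------------+---------------+---------+-------------+---------+
| Updated | @types/react@17.0.89 ⏵ 19.2.2 | 100 +1                | 100           | 79 +2   | 95          | 100     |
+---------+-------------------------------+-----------------------+---------------+---------+-------------+---------+
| Updated | react-dom@17.0.2 ⏵ 19.2.0     | 100 +1                | 100           | 92 +3   | 97          | 100     |
+---------+-------------------------------+-----------------------+---------------+---------+-------------+---------+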


this is a fundamental rewrite of the library to maximize throughput

the public API is essentially the same, except for some internal typing stuff
that is technically exposed in the exported namespace

it's really fast and a bunch of things have been improved, including svg
support for inlined xml

perf: optimize string slicing in matchInlineFormatting and parseRefLink

- Modify matchInlineFormatting to accept position parameters instead of sliced strings
- Eliminate unnecessary slice in parseRefLink by checking '](' pattern directly
- Update tests to use new matchInlineFormatting signature

Performance improvement: ~2.3x faster for simple markdown (250k vs 108k ops/sec)
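A minimal sketch of the position-based idea, assuming a helper shaped roughly like this (`findClosingDelimiter` and `isLinkDestinationStart` are hypothetical names, not the library's API). The point is that the caller passes `start`/`end` indices into the original string, and comparisons use `startsWith(search, position)` so no substring is allocated:

```typescript
// Hypothetical helper (not from the PR): returns the index of `delimiter`
// within source[start, end), or -1. It works on positions, so the caller
// never has to slice the span out first.
function findClosingDelimiter(
  source: string,
  delimiter: string,
  start: number,
  end: number,
): number {
  for (let i = start; i <= end - delimiter.length; i++) {
    // A backslash escapes the next character, so skip over both.
    if (source[i] === '\\') {
      i++
      continue
    }
    // startsWith(search, position) compares in place without allocating.
    if (source.startsWith(delimiter, i)) return i
  }
  return -1
}

// In the spirit of the '](' check described for parseRefLink: test at a
// position instead of computing source.slice(i, i + 2) === ']('.
function isLinkDestinationStart(source: string, i: number): boolean {
  return source.startsWith('](', i)
}
```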

perf: eliminate string slicing in parseLink and parseRefLink

- Pass positions directly to parseInlineSpan instead of slicing first
- Remove unused text variable in parseRefLink
- Maintains ~2.3x performance improvement

chore: adjust cursor rule

perf: eliminate state cloning with direct mutation

- Replace { ...state, ... } cloning with direct property mutation
- Save and restore original values after recursive calls
- Reduces allocations in hot paths
- Performance improvement: ~252k ops/sec vs ~248k baseline
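The mutate-and-restore pattern can be sketched as follows, assuming a state object with `inline` and `inAnchor` flags. The field names are illustrative, and `parseChildren` is a stand-in for a recursive parse. Each changed field is saved, overwritten in place, and restored in `finally`, so no `{ ...state, ... }` copy is allocated per call:

```typescript
// Illustrative state shape; the real fields in src/types.ts may differ.
interface ParserState {
  inline: boolean
  inAnchor: boolean
}

// Stand-in for a recursive inline parse: it just reports the state it saw.
function parseChildren(text: string, state: ParserState): string {
  return `${text}:${state.inline ? 'inline' : 'block'}:${state.inAnchor ? 'anchor' : ''}`
}

function parseLinkText(text: string, state: ParserState): string {
  // Save only the fields about to change...
  const prevInline = state.inline
  const prevInAnchor = state.inAnchor
  // ...mutate in place instead of allocating { ...state, inline: true, inAnchor: true }...
  state.inline = true
  state.inAnchor = true
  try {
    return parseChildren(text, state)
  } finally {
    // ...and restore them, even if the recursive call throws.
    state.inline = prevInline
    state.inAnchor = prevInAnchor
  }
}
```

The `finally` restore matters because the same object is shared across the whole parse. A missed restore would silently leak `inAnchor` into sibling nodes.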

cleanup

Consolidate duplicate URL parsing logic in parseLink and parseImage

- Extract shared parseUrlAndTitle function to eliminate code duplication
- Improves performance by ~1.2% (256k vs 253k ops/sec)
- Reduces bundle size and improves maintainability
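A rough sketch of what a shared URL-and-title helper might look like. The name `parseUrlAndTitle` comes from the commit, but this signature and return shape are assumptions. It expects `pos` to point just past the `(` that both link and image syntax share. It returns `null` on malformed input so each caller can fall back to treating the text as literal:

```typescript
interface UrlAndTitle {
  url: string
  title: string | undefined
  // Index just past the closing ')'.
  end: number
}

// Hypothetical signature: parses `url "title")` or `url)` starting at `pos`.
function parseUrlAndTitle(source: string, pos: number): UrlAndTitle | null {
  const len = source.length
  let i = pos
  while (i < len && source[i] === ' ') i++
  // The URL runs until whitespace or ')'.
  const urlStart = i
  while (i < len && source[i] !== ' ' && source[i] !== ')') i++
  const url = source.slice(urlStart, i)
  while (i < len && source[i] === ' ') i++
  // Optional title in double or single quotes.
  let title: string | undefined
  const quote = source[i]
  if (quote === '"' || quote === "'") {
    const close = source.indexOf(quote, i + 1)
    if (close === -1) return null
    title = source.slice(i + 1, close)
    i = close + 1
    while (i < len && source[i] === ' ') i++
  }
  if (source[i] !== ')') return null
  return { url, title, end: i + 1 }
}
```

Both a link parser and an image parser would call this after matching `](`. Only the node they build from the result differs.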

Add early-exit optimizations for parser dispatch

- Skip HTML parsers early if disableParsingRawHTML is true
- Skip link parsers if inAnchor state
- Skip bare URL parser if disabled or inAnchor
- Performance neutral but improves code clarity

Optimize HTML entity processing by skipping regex when no & present

- Skip HTML_CHAR_CODE_R regex entirely when text contains no &
- Most text chunks don't have HTML entities, avoiding unnecessary regex
- Improves performance by ~7-8% (280k vs 260k ops/sec)
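The `&` fast path can be sketched like this. The regex and the tiny named-entity table are simplified stand-ins for `HTML_CHAR_CODE_R` and whatever table the library really uses. The guard is the part the commit describes: text with no `&` cannot contain an entity, so it returns untouched without touching the regex engine:

```typescript
// Simplified stand-in for HTML_CHAR_CODE_R; the real pattern is likely broader.
const HTML_CHAR_CODE_R = /&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});/gi
// Tiny illustrative subset of named entities.
const NAMED_ENTITIES: Record<string, string> = { amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00a0' }

function decodeEntities(text: string): string {
  // Fast path: no '&' means no entity can match, so skip the regex entirely.
  if (text.indexOf('&') === -1) return text
  return text.replace(HTML_CHAR_CODE_R, (full: string, code: string) => {
    if (code[0] === '#') {
      const cp =
        code[1] === 'x' || code[1] === 'X' ? parseInt(code.slice(2), 16) : parseInt(code.slice(1), 10)
      // Leave out-of-range code points as-is rather than throwing.
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : full
    }
    // Unknown names are left as literal text.
    return NAMED_ENTITIES[code] ?? full
  })
}
```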

Also refactored parseInlineSpan to use character-based dispatch:
- Replace sequential if checks with switch statement
- Eliminates ~90% of unnecessary parser checks
- Inline text accumulation to avoid parseText function call overhead
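A simplified illustration of character-based dispatch combined with an early exit like the `inAnchor` one from the previous commit. `parseInlineSpanSketch` and its node shapes are hypothetical, and only code spans and plain `[text](url)` links are handled. Only characters that can open a construct get a `case`. Everything else extends a text run that is emitted with a single `slice`:

```typescript
// Illustrative node and state shapes, not the library's AST types.
type InlineNode = { type: 'text' | 'code' | 'link'; value: string; href?: string }

interface InlineState {
  inAnchor: boolean
}

function parseInlineSpanSketch(source: string, state: InlineState): InlineNode[] {
  const out: InlineNode[] = []
  let textStart = 0
  let i = 0
  // Emit the pending text run source[textStart, end), if any.
  const flushText = (end: number): void => {
    if (end > textStart) out.push({ type: 'text', value: source.slice(textStart, end) })
  }
  while (i < source.length) {
    switch (source[i]) {
      case '`': {
        const close = source.indexOf('`', i + 1)
        if (close !== -1) {
          flushText(i)
          out.push({ type: 'code', value: source.slice(i + 1, close) })
          i = textStart = close + 1
          continue
        }
        break
      }
      case '[': {
        // Early exit: links cannot nest, so skip link parsing inside an anchor.
        if (state.inAnchor) break
        const close = source.indexOf('](', i + 1)
        const end = close === -1 ? -1 : source.indexOf(')', close + 2)
        if (end !== -1) {
          flushText(i)
          out.push({ type: 'link', value: source.slice(i + 1, close), href: source.slice(close + 2, end) })
          i = textStart = end + 1
          continue
        }
        break
      }
      default:
        // Plain character: no parser call, it stays in the current text run.
        break
    }
    i++
  }
  flushText(source.length)
  return out
}
```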

perf: reduce state object cloning in parseMarkdown

Eliminate unnecessary object allocations by passing state directly instead of cloning with { ...state, inline: false }. Since parseMarkdown runs in block mode, state.inline is already false, making the clones redundant.

For buildListItemChildren, use mutation-with-restoration pattern instead of object spread.

Benchmark results show ~2-3% improvement on large markdown documents.

perf: optimize string concatenation in loops for large documents

Replace string concatenation (+=) with array.join() pattern in loops to avoid O(n²) string operations. Optimized:
- parseCodeBlock: content building
- parseCodeFenced: content building
- parseBlockQuote: rawContent building

For large documents with many lines, this reduces string allocation overhead by collecting pieces in an array and joining once at the end.

Benchmark results show ~1.4% improvement on large markdown documents.
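The collect-then-join pattern, sketched for an indented code block. `collectIndentedCode` is a hypothetical helper, not the library's `parseCodeBlock`. Lines are pushed into an array and joined once, instead of growing a string with `+=` on every iteration:

```typescript
// Hypothetical helper: gathers an indented (4-space) code block starting at
// lines[start] and returns its content plus the index of the first line after it.
function collectIndentedCode(lines: string[], start: number): { content: string; next: number } {
  const parts: string[] = []
  let i = start
  while (i < lines.length) {
    const line = lines[i]
    if (line.startsWith('    ')) parts.push(line.slice(4))
    else if (line.trim() === '') parts.push('') // blank lines stay inside the block
    else break
    i++
  }
  // Trailing blank lines are not part of the block's content.
  while (parts.length > 0 && parts[parts.length - 1].trim() === '') parts.pop()
  // One join at the end instead of `content += line + '\n'` per line.
  return { content: parts.join('\n'), next: i }
}
```

V8 often represents repeated `+=` results as rope (cons) strings, so the quadratic cost may not show up in practice. That may explain why the measured gain is small.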

feat: improve void element detection and fix malformed HTML handling

- Combine HTML5 and SVG void elements into single VOID_ELEMENTS Set
- Add isVoidElement() function that handles:
  - HTML5 void elements (area, br, img, etc.)
  - SVG void elements (circle, path, rect, etc.)
  - Custom web components (hyphenated tags like my-component)
  - Namespace prefixes (e.g., svg:circle -> circle)
- Add comprehensive parse tests for void element detection:
  - SVG void elements (7 tests)
  - Custom web components (5 tests)
  - Non-void element protection (4 tests)
  - Edge cases (4 tests)
  - Markdown formatting integration (3 tests)
- Fix sibling <g> tag detection for malformed HTML:
  - Check newline between opening tag end and sibling tag
  - Ensures proper parsing of consecutive SVG <g> tags with newlines
- Restore URL autolink detection (only blocks when :// appears in tag, not attributes)
- Restore parsing of tags with attributes when no closing tag found (needed for sanitization)
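A sketch of a combined void-element check with namespace stripping. The HTML list is the standard set of void elements. The SVG list is only an illustrative subset, since the commit says just "circle, path, rect, etc.". The custom web component rule is left out because the commit does not spell out how hyphenated tags are classified:

```typescript
// Void elements per the HTML standard.
const HTML_VOID = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr']
// Illustrative subset of SVG elements usually written self-closed (assumption).
const SVG_VOID = ['circle', 'ellipse', 'line', 'path', 'polygon', 'polyline', 'rect', 'stop', 'use']

const VOID_ELEMENTS = new Set<string>([...HTML_VOID, ...SVG_VOID])

function isVoidElement(tag: string): boolean {
  let name = tag.toLowerCase()
  // Drop a namespace prefix: "svg:circle" -> "circle".
  const colon = name.indexOf(':')
  if (colon !== -1) name = name.slice(colon + 1)
  return VOID_ELEMENTS.has(name)
}
```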

perf: optimize character lookup functions using Sets

Replace string.indexOf() lookups with Set.has() for O(1) character checks:
- isSpecialInlineChar: convert SPECIAL_INLINE_CHARS string to Set
- isBlockStartChar: convert BLOCK_START_CHARS string to Set

These functions are called frequently in hot paths (every character check
in parseText and block parsing), so O(1) lookups provide better performance
especially for large documents.

Benchmark results show performance maintained (~924 ops/sec average for
large documents, within variance of previous baseline).
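A minimal sketch of the `Set`-based lookup. The PR does not list the contents of `SPECIAL_INLINE_CHARS`, so the characters below are assumptions. `plainTextEnd` is a hypothetical caller that shows the hot loop this check sits in:

```typescript
// Assumed character list for illustration; the library's set may differ.
const SPECIAL_INLINE_CHARS = new Set(['\\', '`', '*', '_', '~', '[', ']', '!', '<', '&', '\n'])

function isSpecialInlineChar(ch: string): boolean {
  return SPECIAL_INLINE_CHARS.has(ch)
}

// Hypothetical caller: advance over plain text until the next character
// that could start an inline construct.
function plainTextEnd(source: string, start: number): number {
  let i = start
  while (i < source.length && !isSpecialInlineChar(source[i])) i++
  return i
}
```

With only a dozen or so characters, `indexOf` on a short string is already cheap. That fits the commit's own result of performance within variance. A `Uint8Array` indexed by char code is another common option if this ever shows up in a profile.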

cleanup

chore: add metrics back

small optimization

perf: optimize HTML block parsing and parseParagraph

- Optimize matchHTMLBlock: replace character-by-character tag matching with substring comparisons for ~40-50% performance improvement on deeply nested HTML
- Optimize parseParagraph: combine line finding with empty line detection in single pass, reduce redundant findLineEnd calls, and optimize trimming logic
- Simplify string operations: use trim() and includes() helpers instead of manual loops where appropriate
- Optimize table parsing: simplify line and cell trimming logic
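A rough sketch of finding the matching closing tag with whole-name `startsWith` comparisons, counting nested same-name tags as it goes. `findMatchingClose` is hypothetical, not the real `matchHTMLBlock`. It jumps between `<` positions with `indexOf` and checks the character after the name so `<divider>` is not mistaken for `<div>`:

```typescript
// True for characters that may follow a tag name: '>' or whitespace.
// (Self-closing same-name tags like <div/> are not handled in this sketch.)
function isTagNameEnd(ch: string | undefined): boolean {
  return ch === '>' || ch === ' ' || ch === '\t' || ch === '\n'
}

// Hypothetical helper: given that an opening <tag ...> ends just before
// `from`, return the index of the "</tag" that closes it, or -1.
function findMatchingClose(source: string, tag: string, from: number): number {
  let depth = 1
  let i = source.indexOf('<', from)
  while (i !== -1) {
    if (source.startsWith('</', i)) {
      // Compare the whole tag name in one call instead of char by char.
      if (source.startsWith(tag, i + 2) && isTagNameEnd(source[i + 2 + tag.length])) {
        depth--
        if (depth === 0) return i
      }
    } else if (source.startsWith(tag, i + 1) && isTagNameEnd(source[i + 1 + tag.length])) {
      // Nested opening tag with the same name.
      depth++
    }
    i = source.indexOf('<', i + 1)
  }
  return -1
}
```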

adjust footnotes

internal react 19 upgrade

use ts for benchmark

convert profile to ts

split bench suite up for faster iteration

Remove internal type definitions and rename RuleOutput to ASTRender

add failing test for 678

fix inline html handling

permission to grow

add rehype to bench and type the benchmark library

add changeset

Drop support for React versions less than 16

…ATTR regex

- Optimize regex pattern from (?:[^>]*[^/])? to (?:[^>/]+[^/]|)>
- Add early exit check before regex test to skip non-HTML attribute values
- Fixes CodeQL warning about slow performance on strings starting with '<a' and many '!' repetitions
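The "early exit before regex test" idea, sketched with a hypothetical pattern. The commit's actual ATTR regex is truncated above, so only its `(?:[^>/]+[^/]|)>` tail is borrowed here. The guard rejects anything that does not start with `<` using one char-code comparison, so most attribute values never reach the regex engine:

```typescript
// Hypothetical pattern, not the library's regex: a tag name, then an
// optional run that cannot end in '/', then '>'.
const LOOKS_LIKE_TAG_R = /^<[a-z][a-z0-9-]*(?:[^>/]+[^/]|)>/i

function looksLikeHTML(value: string): boolean {
  // Early exit: a value that does not start with '<' (char code 60) can
  // never match, so skip the regex entirely.
  if (value.charCodeAt(0) !== 60) return false
  return LOOKS_LIKE_TAG_R.test(value)
}
```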
…nction

- Replace complex regex vulnerable to exponential backtracking
- Implement extractHTMLAttributes() function for safe attribute parsing
- Update parseHTMLAttributes() and parseCodeFenced() to use new function
- All 498 tests pass including fuzzing tests
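A sketch of regex-free attribute extraction. The name `extractHTMLAttributes` comes from the commit, but the input format (the text between the tag name and `>`) and the `Record` return type are assumptions. Every index is visited a bounded number of times, so input like a long unterminated quote stays linear instead of triggering backtracking:

```typescript
// Hypothetical signature: parses name="v", name='v', name=v and bare `name`.
function extractHTMLAttributes(input: string): Record<string, string> {
  const attrs: Record<string, string> = {}
  const len = input.length
  const isSpace = (c: string): boolean => c === ' ' || c === '\t' || c === '\n' || c === '\r'
  let i = 0
  while (i < len) {
    while (i < len && isSpace(input[i])) i++
    const nameStart = i
    while (i < len && !isSpace(input[i]) && input[i] !== '=') i++
    const name = input.slice(nameStart, i)
    if (!name) {
      i++ // stray '=' or end of input: step past it
      continue
    }
    while (i < len && isSpace(input[i])) i++
    if (input[i] !== '=') {
      attrs[name] = '' // boolean attribute such as `hidden`
      continue
    }
    i++ // skip '='
    while (i < len && isSpace(input[i])) i++
    const quote = input[i]
    if (quote === '"' || quote === "'") {
      // Quoted value: jump straight to the matching quote (or end of input).
      const close = input.indexOf(quote, i + 1)
      const end = close === -1 ? len : close
      attrs[name] = input.slice(i + 1, end)
      i = end + 1
    } else {
      // Unquoted value: runs until whitespace.
      const start = i
      while (i < len && !isSpace(input[i])) i++
      attrs[name] = input.slice(start, i)
    }
  }
  return attrs
}
```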

…g function

- Replace complex regex with character-by-character parsing
- Fix edge cases with escaped quotes and closing tag detection
- Update parseHTMLElement and parseHTMLSelfClosing to use new function
- 497/498 tests pass (1 snapshot mismatch remaining)
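A character-by-character scan for the end of an opening tag, in the spirit of this commit. `findTagEnd` is hypothetical. Treating a backslash as an escape inside quoted values is an assumption about the "escaped quotes" edge case, since plain HTML has no such escape. A `>` inside a quoted attribute value does not end the tag:

```typescript
// Hypothetical: index of the '>' ending the opening tag whose '<' is at
// `start`, ignoring '>' inside quoted attribute values; -1 if unterminated.
function findTagEnd(source: string, start: number): number {
  let quote: string | null = null
  for (let i = start + 1; i < source.length; i++) {
    const ch = source[i]
    if (quote !== null) {
      // Assumption: a backslash escapes the next character inside quotes.
      if (ch === '\\') i++
      else if (ch === quote) quote = null
    } else if (ch === '"' || ch === "'") {
      quote = ch
    } else if (ch === '>') {
      return i
    }
  }
  return -1
}
```

A caller can then check `source[end - 1] === '/'` to tell a self-closing tag from an element that needs a closing tag.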
@quantizor marked this pull request as draft November 1, 2025 18:36