near-total rewrite #725
Draft: quantizor wants to merge 24 commits into main from maintain
🦋 Changeset detected. Latest commit: d11def2. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
this is a fundamental rewrite of the library to maximize throughput
the public API is essentially the same, except for some internal typing stuff
that is technically exposed in the exported namespace
it's really fast and a bunch of things have been improved, including svg
support for inlined xml
perf: optimize string slicing in matchInlineFormatting and parseRefLink
- Modify matchInlineFormatting to accept position parameters instead of sliced strings
- Eliminate unnecessary slice in parseRefLink by checking '](' pattern directly
- Update tests to use new matchInlineFormatting signature
Performance improvement: ~2.3x faster for simple markdown (250k vs 108k ops/sec)
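A minimal sketch of the position-parameter idea described above, not the library's actual code: `findBoldEnd` is a hypothetical helper that scans the original string between `start` and `end` instead of receiving a pre-sliced copy.

```typescript
// Hypothetical helper (not part of the library): locate the end of a
// `**bold**` span that begins at `start`, without slicing `source`.
// Returns the index just past the closing delimiter, or -1 if none is found
// inside the [start, end) window.
function findBoldEnd(source: string, start: number, end: number): number {
  // startsWith accepts a position, so no substring is allocated here.
  if (!source.startsWith("**", start)) return -1;
  // indexOf also accepts a starting position.
  const close = source.indexOf("**", start + 2);
  if (close === -1 || close + 2 > end) return -1;
  return close + 2;
}

const text = "a **b** c";
console.log(findBoldEnd(text, 2, text.length));
```

The caller slices only once, after a match is confirmed, rather than on every attempt.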
perf: eliminate string slicing in parseLink and parseRefLink
- Pass positions directly to parseInlineSpan instead of slicing first
- Remove unused text variable in parseRefLink
- Maintains ~2.3x performance improvement
chore: adjust cursor rule
perf: eliminate state cloning with direct mutation
- Replace { ...state, ... } cloning with direct property mutation
- Save and restore original values after recursive calls
- Reduces allocations in hot paths
- Performance improvement: ~252k ops/sec vs ~248k baseline
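The save-and-restore pattern from this commit could look roughly like the following. The `ParseState` shape and both function names are assumptions for illustration, and the `try`/`finally` is an added safety choice, not something the commit message states.

```typescript
// Assumed, simplified state shape; the real parser state has more fields.
interface ParseState {
  inline: boolean;
  inAnchor: boolean;
}

// Hypothetical stand-in for a recursive child parser.
function parseChildren(text: string, state: ParseState): string[] {
  return [`${state.inAnchor}:${text}`];
}

// Instead of parseChildren(text, { ...state, inAnchor: true, inline: true }),
// mutate the shared object and put the old values back afterwards.
function parseLinkText(text: string, state: ParseState): string[] {
  const prevInAnchor = state.inAnchor;
  const prevInline = state.inline;
  state.inAnchor = true;
  state.inline = true;
  try {
    return parseChildren(text, state);
  } finally {
    // Restore even if the recursive call throws.
    state.inAnchor = prevInAnchor;
    state.inline = prevInline;
  }
}
```

No new object is allocated per recursive call, at the cost of having to restore every mutated field.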
cleanup
Consolidate duplicate URL parsing logic in parseLink and parseImage
- Extract shared parseUrlAndTitle function to eliminate code duplication
- Improves performance by ~1.2% (256k vs 253k ops/sec)
- Reduces bundle size and improves maintainability
Add early-exit optimizations for parser dispatch
- Skip HTML parsers early if disableParsingRawHTML is true
- Skip link parsers if inAnchor state
- Skip bare URL parser if disabled or inAnchor
- Performance neutral but improves code clarity
Optimize HTML entity processing by skipping regex when no & present
- Skip HTML_CHAR_CODE_R regex entirely when text contains no &
- Most text chunks don't have HTML entities, avoiding unnecessary regex
- Improves performance by ~7-8% (280k vs 260k ops/sec)
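A rough sketch of the early-exit check, assuming a much smaller entity table than the real one. `ENTITY_R` is a hypothetical stand-in for `HTML_CHAR_CODE_R`, and `decodeEntities` is an illustrative name.

```typescript
// Hypothetical stand-in for HTML_CHAR_CODE_R; handles four entities only.
const ENTITY_R = /&(amp|lt|gt|quot);/g;

const ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
};

function decodeEntities(text: string): string {
  // Fast path: without an ampersand there can be no entity, so skip the regex.
  if (text.indexOf("&") === -1) return text;
  return text.replace(ENTITY_R, (_match, name: string) => ENTITIES[name]);
}

console.log(decodeEntities("a &amp; b &lt;c&gt;"));
```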
Also refactored parseInlineSpan to use character-based dispatch:
- Replace sequential if checks with switch statement
- Eliminates ~90% of unnecessary parser checks
- Inline text accumulation to avoid parseText function call overhead
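As an illustration of character-based dispatch with inline text accumulation, here is a simplified sketch. The `Kind` labels, the set of trigger characters, and both function names are assumptions; the real `parseInlineSpan` invokes actual parsers rather than returning labels.

```typescript
type Kind = "emphasis" | "code" | "link" | "html" | "text";

// One switch on the character code decides which parser family could apply,
// so unrelated parsers are never tried for this position.
function dispatchInline(source: string, pos: number): Kind {
  switch (source.charCodeAt(pos)) {
    case 42: // *
    case 95: // _
      return "emphasis";
    case 96: // `
      return "code";
    case 91: // [
      return "link";
    case 60: // <
      return "html";
    default:
      return "text";
  }
}

// Plain text is accumulated by tracking a start index in the loop itself,
// rather than calling a separate text parser per character.
function scan(source: string): string[] {
  const out: string[] = [];
  let textStart = 0;
  for (let i = 0; i < source.length; i++) {
    const kind = dispatchInline(source, i);
    if (kind === "text") continue;
    if (i > textStart) out.push("text:" + source.slice(textStart, i));
    out.push(kind + ":" + source[i]);
    textStart = i + 1;
  }
  if (textStart < source.length) out.push("text:" + source.slice(textStart));
  return out;
}
```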
perf: reduce state object cloning in parseMarkdown
Eliminate unnecessary object allocations by passing state directly instead of cloning with { ...state, inline: false }. Since parseMarkdown runs in block mode, state.inline is already false, making the clones redundant.
For buildListItemChildren, use mutation-with-restoration pattern instead of object spread.
Benchmark results show ~2-3% improvement on large markdown documents.
perf: optimize string concatenation in loops for large documents
Replace string concatenation (+=) with array.join() pattern in loops to avoid O(n²) string operations. Optimized:
- parseCodeBlock: content building
- parseCodeFenced: content building
- parseBlockQuote: rawContent building
For large documents with many lines, this reduces string allocation overhead by collecting pieces in an array and joining once at the end.
Benchmark results show ~1.4% improvement on large markdown documents.
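The collect-then-join pattern can be sketched as follows. `collectIndentedCode` is a hypothetical, much-simplified stand-in for the content building in `parseCodeBlock`; it assumes the input is already split into lines.

```typescript
// Hypothetical helper: gather the leading run of four-space-indented lines,
// strip the indent, and join once at the end instead of using += per line.
function collectIndentedCode(lines: readonly string[]): string {
  const parts: string[] = [];
  for (const line of lines) {
    if (!line.startsWith("    ")) break; // first non-indented line ends the block
    parts.push(line.slice(4));
  }
  return parts.join("\n");
}

console.log(collectIndentedCode(["    a", "    b", "c"]));
```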
feat: improve void element detection and fix malformed HTML handling
- Combine HTML5 and SVG void elements into single VOID_ELEMENTS Set
- Add isVoidElement() function that handles:
  - HTML5 void elements (area, br, img, etc.)
  - SVG void elements (circle, path, rect, etc.)
  - Custom web components (hyphenated tags like my-component)
  - Namespace prefixes (e.g., svg:circle -> circle)
- Add comprehensive parse tests for void element detection:
  - SVG void elements (7 tests)
  - Custom web components (5 tests)
  - Non-void element protection (4 tests)
  - Edge cases (4 tests)
  - Markdown formatting integration (3 tests)
- Fix sibling <g> tag detection for malformed HTML:
  - Check newline between opening tag end and sibling tag
  - Ensures proper parsing of consecutive SVG <g> tags with newlines
- Restore URL autolink detection (only blocks when :// appears in tag, not attributes)
- Restore parsing of tags with attributes when no closing tag found (needed for sanitization)
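A possible shape for the void-element check, under stated assumptions: the Set below lists only the tags named in the commit message plus a few common HTML ones, and how the real `isVoidElement()` treats hyphenated custom elements is not stated, so this sketch simply leaves them non-void.

```typescript
// Partial, illustrative list; the real VOID_ELEMENTS Set is larger.
const VOID_ELEMENTS = new Set<string>([
  "area", "br", "hr", "img", "input", // HTML5
  "circle", "path", "rect",           // SVG
]);

// Sketch only: strips a namespace prefix (svg:circle -> circle) and then
// does a single Set lookup. Custom elements such as my-component are not in
// the Set and therefore come back as non-void here (an assumption).
function isVoidElement(tagName: string): boolean {
  const colon = tagName.indexOf(":");
  const local = colon === -1 ? tagName : tagName.slice(colon + 1);
  return VOID_ELEMENTS.has(local);
}

console.log(isVoidElement("svg:circle"), isVoidElement("div"));
```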
perf: optimize character lookup functions using Sets
Replace string.indexOf() lookups with Set.has() for O(1) character checks:
- isSpecialInlineChar: convert SPECIAL_INLINE_CHARS string to Set
- isBlockStartChar: convert BLOCK_START_CHARS string to Set
These functions are called frequently in hot paths (every character check
in parseText and block parsing), so O(1) lookups provide better performance
especially for large documents.
Benchmark results show performance maintained (~924 ops/sec average for
large documents, within variance of previous baseline).
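The lookup change amounts to something like the sketch below. The characters in the Set are an assumption for illustration; the actual contents of `SPECIAL_INLINE_CHARS` are not given in the commit message.

```typescript
// Assumed character list, for illustration only.
const SPECIAL_INLINE_CHARS = new Set<string>(["*", "_", "`", "[", "<", "!", "~", "\\"]);

// Before: "*_`[<!~\\".indexOf(ch) !== -1
// After: a single Set membership test.
function isSpecialInlineChar(ch: string): boolean {
  return SPECIAL_INLINE_CHARS.has(ch);
}

console.log(isSpecialInlineChar("*"), isSpecialInlineChar("a"));
```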
cleanup
chore: add metrics back
small optimization
perf: optimize HTML block parsing and parseParagraph
- Optimize matchHTMLBlock: replace character-by-character tag matching with substring comparisons for ~40-50% performance improvement on deeply nested HTML
- Optimize parseParagraph: combine line finding with empty line detection in single pass, reduce redundant findLineEnd calls, and optimize trimming logic
- Simplify string operations: use trim() and includes() helpers instead of manual loops where appropriate
- Optimize table parsing: simplify line and cell trimming logic
adjust footnotes
internal react 19 upgrade
use ts for benchmark
convert profile to ts
split bench suite up for faster iteration
Remove internal type definitions and rename RuleOutput to ASTRender
add failing test for 678
fix inline html handling
permission to grow
add rehype to bench and type the benchmark library
add changeset
Drop support for React versions less than 16
…ATTR regex
- Optimize regex pattern from (?:[^>]*[^/])? to (?:[^>/]+[^/]|)>
- Add early exit check before regex test to skip non-HTML attribute values
- Fixes CodeQL warning about slow performance on strings starting with '<a' and many '!' repetitions
…nction
- Replace complex regex vulnerable to exponential backtracking
- Implement extractHTMLAttributes() function for safe attribute parsing
- Update parseHTMLAttributes() and parseCodeFenced() to use new function
- All 498 tests pass including fuzzing tests
…g function
- Replace complex regex with character-by-character parsing
- Fix edge cases with escaped quotes and closing tag detection
- Update parseHTMLElement and parseHTMLSelfClosing to use new function
- 497/498 tests pass (1 snapshot mismatch remaining)
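To show why a character-by-character scan avoids the backtracking problem, here is a minimal sketch. `scanAttributes` is a hypothetical name, not the PR's `extractHTMLAttributes()`, and it only tracks quote state; it does not handle the escaped-quote edge cases the commit mentions.

```typescript
// Hypothetical scanner: starting just after the tag name, walk forward once
// and return the raw attribute text plus the index of the closing '>'.
// A '>' inside a quoted value is ignored. Each character is visited at most
// once, so the cost is linear in the input length.
function scanAttributes(
  source: string,
  start: number
): { raw: string; end: number } | null {
  let quote = ""; // the currently open quote character, if any
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (quote) {
      if (ch === quote) quote = "";
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === ">") {
      return { raw: source.slice(start, i).trim(), end: i };
    }
  }
  return null; // no closing '>' found
}

const html = `<a href="x>y" title='t'>link</a>`;
console.log(scanAttributes(html, 2));
```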
no real public API changes but...
yeah
Note
Massive internal rewrite splitting parsing/rendering with major perf gains, cleanup of internal types, dropping React <16, and tooling/CI upgrades.
- src/ with new modular parser (parse.ts), types (types.ts), and utilities (utils.ts).
- index.tsx, match.ts) with faster streaming parser; adds HTML emission helper.
- RuleOutput → ASTRender.
- react >=16.
- src/index.tsx; increase size-limit; Node to v24; new TS benchmark suite; add CodeQL config/workflow.
Written by Cursor Bugbot for commit 3a2ca40.