Optimize MHTML parser: 1.5x faster on large files#1928

Draft
sweenzor wants to merge 1 commit into gildas-lormeau:master from sweenzor:mhtml-parser-perf
Conversation

@sweenzor sweenzor commented Mar 21, 2026

Speeds up MHTML parsing by roughly 1.5x on large files via three changes:

  • parse.js — Replace splice(len, 0, ...next) with a push() loop; use length truncation for soft line breaks.
  • util.js decodeBinary — Build string in 8KB chunks via String.fromCharCode.apply() instead of char-by-char.
  • util.js decodeBase64 — Direct Uint8Array loop instead of atob().split("").map(). Added try/catch for malformed input.
| Size  | Before | After  | Speedup |
| ----- | ------ | ------ | ------- |
| 10KB  | 0.51ms | 0.37ms | 1.4x    |
| 1MB   | 63.2ms | 43.1ms | 1.5x    |
| 10MB  | 572ms  | 423ms  | 1.4x    |
| 100MB | 6063ms | 3982ms | 1.5x    |

To reproduce, save a page as MHTML from Chrome, then:

// node bench.mjs path/to/file.mhtml
import { readFileSync } from "fs";
import { parse } from "./src/lib/mhtml-to-html/parse.js";
const mhtml = new Uint8Array(readFileSync(process.argv[2]));
const runs = 10, times = [];
for (let i = 0; i < runs; i++) {
  const start = performance.now();
  parse(mhtml);
  times.push(performance.now() - start);
}
times.sort((a, b) => a - b);
console.log(`Median: ${times[Math.floor(runs / 2)].toFixed(1)}ms`);

Three targeted performance fixes in the JS MHTML parser hot path:

1. parse.js: Replace splice(...spread) with push loop for byte
   accumulation. The old `resource.data.splice(len, 0, ...next)`
   spread a Uint8Array into individual arguments on every line of
   MHTML content — O(n) per call in a tight loop over thousands of
   lines. A simple `push(next[i])` loop avoids the spread overhead
   entirely. Truncation via `data.length -= N` replaces splice for
   quoted-printable soft line break removal.
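The change described above can be sketched as follows. This is a minimal illustration of the technique, not the actual parse.js code; `appendBytes` and `truncateEnd` are hypothetical helper names:

```javascript
// Accumulate decoded bytes with a push() loop instead of spreading a
// Uint8Array into splice() arguments.
function appendBytes(data, next) {
    // Old (slow): data.splice(data.length, 0, ...next) — spreads every
    // byte into a separate call argument, paying O(next.length) argument
    // overhead on each line of content.
    for (let i = 0; i < next.length; i++) {
        data.push(next[i]);
    }
    return data;
}

// Drop trailing bytes (e.g. a quoted-printable "=" soft line break marker)
// by shrinking the array length in place instead of calling splice(),
// which would allocate and return an array of the removed elements.
function truncateEnd(data, count) {
    data.length -= count;
    return data;
}
```

Assigning to `Array.prototype.length` is a standard way to shrink a plain array in place without any intermediate allocation.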

2. util.js decodeBinary: Replace character-by-character string
   concatenation (`data += String.fromCharCode(byte)`) with chunked
   `String.fromCharCode.apply(null, chunk)` joined at the end. The
   old approach was O(n²) due to string immutability; each `+=`
   allocated a new string. Chunks of 8192 bytes stay within the
   call stack limit for `apply`.
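A sketch of the chunked approach, assuming the decoder receives a `Uint8Array` and returns a binary string; names are illustrative rather than the actual util.js implementation:

```javascript
// Engines cap the number of arguments a single call can receive, so
// apply() over the whole buffer would throw on large inputs; 8192-byte
// chunks stay comfortably under that limit.
const CHUNK_SIZE = 8192;

function decodeBinary(bytes) {
    const parts = [];
    for (let offset = 0; offset < bytes.length; offset += CHUNK_SIZE) {
        // apply() passes each byte of the chunk as one charCode argument,
        // producing the chunk's string in a single call instead of one
        // `+=` concatenation per byte.
        const chunk = bytes.subarray(offset, offset + CHUNK_SIZE);
        parts.push(String.fromCharCode.apply(null, chunk));
    }
    // One join at the end avoids the O(n²) repeated-concatenation cost.
    return parts.join("");
}
```

`subarray()` returns a view over the same buffer, so the chunking itself copies no bytes.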

3. util.js decodeBase64: Replace `atob(v).split("").map(c =>
   c.charCodeAt(0))` with a pre-allocated Uint8Array filled via a
   direct for-loop. The old approach created two intermediate arrays
   (one from split, one from map) that were immediately discarded.
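The replacement can be sketched like this, including the try/catch for malformed input mentioned in the summary; this is an illustrative version, not the literal util.js code:

```javascript
function decodeBase64(value) {
    try {
        const binary = atob(value);
        // Pre-allocate the exact output size; no intermediate arrays from
        // split("") or map() are created.
        const bytes = new Uint8Array(binary.length);
        for (let i = 0; i < binary.length; i++) {
            bytes[i] = binary.charCodeAt(i);
        }
        return bytes;
    } catch (_error) {
        // atob() throws on malformed base64; degrade to empty data
        // instead of aborting the whole parse.
        return new Uint8Array(0);
    }
}
```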

Benchmarked on synthetic MHTML fixtures (before → after):
  - 10KB:   0.51ms → 0.37ms (1.4x)
  - 1MB:   63.2ms → 43.1ms (1.5x)
  - 10MB:   572ms →  423ms (1.4x)
  - 100MB: 6063ms → 3982ms (1.5x)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>