Add Eisel-Lemire algorithm for faster String#to_f #15655

mensfeld · 2025-12-19T14:54:50Z

This PR adds the Eisel-Lemire algorithm for string-to-float conversion, providing significant performance improvements for String#to_f, especially for numbers with many significant digits.

Performance Results

Benchmark: 3,000,000 iterations per category

Input Type	Master	This PR	Improvement
Simple decimals (`"1.5"`, `"3.14"`)	0.142s	0.117s	17% faster
Prices (`"9.99"`, `"19.95"`)	0.141s	0.120s	15% faster
Small integers (`"5"`, `"42"`)	0.131s	0.114s	13% faster
Math constants (`"3.141592653589793"`)	0.615s	0.194s	3.2x faster
High precision (`"0.123456789012345"`)	0.504s	0.190s	2.7x faster
Scientific (`"1e5"`, `"2e10"`)	0.140s	0.139s	~same

Key Insights

Simple numbers (1-6 digits): 13-17% faster via ultra-fast paths
Numbers with many significant digits (10+ digits): 2.7-3.2x faster via the Eisel-Lemire algorithm
No regressions observed for any benchmarked input type

Implementation Details

Algorithm Overview

The implementation adds three optimization levels to rb_cstr_to_dbl_raise:

1. Ultra-fast path for small integers (try_small_integer_fast_path)
   - Handles: "5", "42", "-123" (up to 3 digits)
   - Simple digit parsing, direct conversion to double
2. Ultra-fast path for simple decimals (try_simple_decimal_fast_path)
   - Handles: "1.5", "9.99", "199.95" (up to 3+3 digits)
   - Parses integer and fractional parts separately
   - Uses precomputed divisors (10, 100, 1000)
3. Eisel-Lemire algorithm (rb_eisel_lemire64)
   - Handles numbers with many significant digits
   - Uses 128-bit multiplication with precomputed powers of 5
   - Falls back to strtod for ambiguous rounding cases
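
The two fast paths can be illustrated with a minimal Ruby sketch. This is not the PR's C code: the method names, the regular expressions, and the way the integer and fractional parts are combined (scaling to a single integer and dividing once, so that only one rounding step occurs) are assumptions made for illustration.

```ruby
# Hypothetical Ruby sketch of the two fast paths; the PR implements them in C.
# Index = number of fractional digits, value = matching power of ten.
DIVISORS = [1, 10, 100, 1000].freeze

# Returns a Float, or nil when the input does not fit the fast-path shape.
def small_integer_fast_path(str)
  m = /\A(-?)([0-9]{1,3})\z/.match(str)
  return nil unless m

  value = m[2].to_i.to_f # at most 999, exactly representable as a double
  m[1] == "-" ? -value : value
end

# Returns a Float, or nil when the input does not fit the fast-path shape.
def simple_decimal_fast_path(str)
  m = /\A(-?)([0-9]{1,3})\.([0-9]{1,3})\z/.match(str)
  return nil unless m

  divisor = DIVISORS[m[3].size]
  scaled = m[2].to_i * divisor + m[3].to_i # "199.95" becomes 19995
  value = scaled.to_f / divisor            # one division, hence one rounding
  m[1] == "-" ? -value : value
end
```

Both operands of the final division are small integers that a double represents exactly, so the IEEE 754 division yields the correctly rounded result. Inputs that do not match return nil and would go on to the next level.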

Technical Details

128-bit multiplication: Uses __uint128_t when available, falls back to portable 64-bit emulation
Powers of 5 table: 651 precomputed 128-bit values for exponents [-342, 308]
Underscore handling: Proper Ruby underscore validation (between digits only)
Fallback: Falls back to strtod for edge cases (hex floats, >19 digits, ambiguous rounding)
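
The portable 64-bit emulation of a 64x64-bit multiplication with a 128-bit result can be sketched as follows. This is a Ruby illustration of the general schoolbook technique, not the PR's C helper: the name mul64_to_128 and the constants are hypothetical, and the masks stand in for the fixed-width arithmetic that C provides natively.

```ruby
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

# Multiply two unsigned 64-bit integers using only 32-bit halves.
# Returns [high, low], the two 64-bit words of the 128-bit product.
def mul64_to_128(a, b)
  a_lo = a & MASK32
  a_hi = a >> 32
  b_lo = b & MASK32
  b_hi = b >> 32

  lo_lo = a_lo * b_lo
  hi_lo = a_hi * b_lo
  lo_hi = a_lo * b_hi
  hi_hi = a_hi * b_hi

  # Sum the middle column (bits 32..63); its overflow carries into the high word.
  cross = (lo_lo >> 32) + (hi_lo & MASK32) + (lo_hi & MASK32)
  low = ((cross & MASK32) << 32) | (lo_lo & MASK32)
  high = hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32)
  [high & MASK64, low]
end
```

Because Ruby integers are arbitrary precision, the result can be checked directly against `a * b`, which makes the sketch convenient for validating the carry handling.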

References

Eisel-Lemire paper - "Number Parsing at a Gigabyte per Second" (Software: Practice and Experience, 2021)
fast_float C++ library
Go implementation
Nigel Tao's blog post - Excellent explanation of the algorithm

Benchmark Script

ITERATIONS = 3_000_000

def bench(name, strings)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  (ITERATIONS / strings.size).times { strings.each(&:to_f) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  printf "%-35s %0.3fs\n", name, elapsed
end

bench("Simple decimals (1.5, 3.14)",
      %w[1.5 2.0 3.14 99.99 0.5 0.25 10.0 7.5 42.0 100.0])

bench("Prices (9.99, 19.95)",
      %w[9.99 19.95 29.99 49.95 99.99 149.99 199.95 299.99 399.95 499.99])

bench("Small integers (5, 42)",
      %w[5 42 123 7 99 256 1 0 50 999])

bench("Math constants (Pi, E)",
      %w[3.141592653589793 2.718281828459045 1.4142135623730951])

bench("High precision decimals",
      %w[0.123456789012345 9.876543210987654 1.111111111111111])

bench("Scientific (1e5, 2e10)",
      %w[1e5 2e6 3e7 4e8 5e9 1e10])
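
The benchmark above measures speed only. A round-trip check along the following lines could complement it. It is a hedged sketch that is not part of the PR: the helper name, the seed, and the sample count are arbitrary choices. It relies on Float#to_s producing a decimal string that parses back to the same value.

```ruby
# Hypothetical helper, not part of the PR: counts finite Floats whose
# decimal representation does not parse back to the same value.
def round_trip_mismatches(samples, seed: 42)
  rng = Random.new(seed)
  mismatches = 0
  samples.times do
    # Reinterpret 64 random bits as a double to cover the whole exponent range.
    float = [rng.rand(2**64)].pack("Q").unpack1("D")
    next unless float.finite? # skip NaN and infinities

    mismatches += 1 unless float.to_s.to_f == float
  end
  mismatches
end

puts round_trip_mismatches(100_000)
```

Any non-zero count would point at an input for which the parser does not return the correctly rounded value.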

Add optimized parsing paths for common float string formats that bypass the full strtod implementation:

1. Small integer fast path - handles "5", "42", "-123" (up to 3 digits)
2. Simple decimal fast path - handles "1.5", "9.99", "199.95" patterns (up to 3 integer + 3 fractional digits)

These fast paths are only used when badcheck is false (String#to_f), not for strict parsing (Kernel#Float).

Based on insights from:
- Eisel-Lemire algorithm (https://arxiv.org/abs/2101.11408)
- Nigel Tao's blog post (https://nigeltao.github.io/blog/2020/eisel-lemire.html)

The key insight is that for simple numbers, the overhead of strtod (locale handling, full parsing) is unnecessary. Direct integer arithmetic is faster for common cases like prices, coordinates, and simple measurements.
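
The lenient versus strict distinction drawn in this commit message is visible from Ruby itself. The short example below uses only standard String#to_f and Kernel#Float behavior; the variable names are illustrative.

```ruby
# String#to_f is lenient: it parses the longest valid numeric prefix.
lenient = "1.5abc".to_f  # => 1.5
fallback = "abc".to_f    # => 0.0 when no numeric prefix exists

# Kernel#Float is strict: trailing garbage raises ArgumentError.
strict =
  begin
    Float("1.5abc")
  rescue ArgumentError => e
    e.class
  end
```

Here `strict` ends up holding ArgumentError, while the two to_f calls never raise.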

This commit adds the Eisel-Lemire algorithm for string-to-float conversion, providing significant performance improvements for complex numbers while maintaining fast paths for simple cases.

Performance improvements for String#to_f:
- Simple decimals (1.5, 3.14): ~0.12s (fast path)
- Prices (9.99, 19.95): ~0.12s (fast path)
- Math constants (Pi, E): ~0.19s (was ~0.63s = 3.3x faster)
- High precision decimals: ~0.19s (3x faster)
- Scientific notation: ~0.14s (Eisel-Lemire)

Implementation details:
- 128-bit multiplication helpers for Eisel-Lemire algorithm
- Powers-of-5 lookup table (651 entries) in eisel_lemire_pow5.inc
- Core Eisel-Lemire algorithm (rb_eisel_lemire64)
- Decimal parser with proper Ruby underscore handling
- Fast paths for simple integers and decimals

The algorithm is based on:
- Daniel Lemire's paper: "Number Parsing at a Gigabyte per Second"
- fast_float C++ library
- Go standard library implementation
- Nigel Tao's blog post on Eisel-Lemire

All 487 float/string tests pass (27,393 assertions).

mensfeld added 4 commits December 19, 2025 15:33

Add PR.md with benchmark comparison and details

33c6dc1

remove PR content file

543aaeb

mensfeld and others added 3 commits December 19, 2025 16:07

fix styling remarks

fd4a8dd

Merge branch 'master' into fast-float-optimizations

4fcf6ee

Update depend

207dd54

nobu added ruby 4.1 Performance labels Dec 20, 2025
