adding short input benchmarks by lemire · Pull Request #927 · simdutf/simdutf

lemire · 2026-01-23T20:03:30Z

This PR adds a new shortbench tool that benchmarks simdutf functions over incrementally increasing input sizes, providing detailed performance metrics for short inputs.
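To make "incremental input sizes" concrete, here is a minimal Python sketch of the general idea: time a function on the first 1, 2, 3, ... bytes of a buffer. It is not the shortbench implementation. The name `bench_prefixes`, the defaults for `max_size` and `repeats`, and the stand-in workload (Python's own UTF-8 decoder rather than a simdutf function) are all assumptions made for illustration.

```python
import time


def bench_prefixes(func, data, max_size=128, repeats=1000, rounds=5):
    """Time func on data[:1], data[:2], ... up to max_size bytes.

    Returns a list of (size, best_ns_per_call) pairs. All parameter
    defaults are hypothetical, not taken from shortbench.
    """
    results = []
    limit = min(max_size, len(data))
    for size in range(1, limit + 1):
        chunk = data[:size]
        best = None
        for _ in range(rounds):
            start = time.perf_counter_ns()
            for _ in range(repeats):
                func(chunk)
            elapsed = (time.perf_counter_ns() - start) / repeats
            # Keep the fastest round to reduce the effect of noise.
            if best is None or elapsed < best:
                best = elapsed
        results.append((size, best))
    return results


if __name__ == "__main__":
    # Stand-in workload: Python's UTF-8 decoder, not simdutf.
    sample = ("hello wörld " * 20).encode("utf-8")
    decode = lambda b: b.decode("utf-8", errors="replace")
    for size, ns in bench_prefixes(decode, sample, max_size=8, repeats=200):
        print(f"{size:<6} {ns:10.1f} ns  {ns / size:8.2f} ns/byte")
```

In an interpreted language the per-call overhead is far larger than the few nanoseconds discussed in this PR, so the sketch only illustrates the measurement loop, not realistic numbers.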

The shortbench tool supports benchmarking multiple functions:

validate_utf8 (default)
utf8_length_from_latin1
utf16_length_from_utf8
utf32_length_from_utf8
count_utf8

# Build the tool
cmake -B build -D SIMDUTF_BENCHMARKS=ON
cmake --build build

# List available functions
./build/benchmarks/shortbench --list

# Benchmark validate_utf8 on README.md (default function)
./build/benchmarks/shortbench README.md

# Benchmark utf8_length_from_latin1 with custom max size
./build/benchmarks/shortbench --function utf8_length_from_latin1 --max-size 256 somefile.txt

# Get help
./build/benchmarks/shortbench --help

You can run all functions with

./build/benchmarks/shortbench --all

and then process the result with a script.

The tool outputs a table with columns for:

Size (input bytes)
Total Time (ns)
Time/Byte (ns)
Error (%) - timing variability estimate
Cycles/Byte, Insns/Byte, Insns/Cycle (when performance counters are available)
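As a reading aid for the columns listed above, the sketch below shows how the per-byte and per-cycle figures would relate to raw totals, assuming each per-byte column is simply the total divided by the input size. That relationship is inferred from the column names and is not confirmed by the tool's source. The function name `derived_metrics` and the numbers in the usage example are made up for illustration.

```python
def derived_metrics(size_bytes, total_ns, cycles=None, instructions=None):
    """Compute per-byte and per-cycle ratios from raw totals.

    Assumes each per-byte column is total / size (an assumption based
    on the column names). The counter-based fields are only filled in
    when both counters are provided.
    """
    row = {"time_per_byte_ns": total_ns / size_bytes}
    if cycles is not None and instructions is not None:
        row["cycles_per_byte"] = cycles / size_bytes
        row["insns_per_byte"] = instructions / size_bytes
        row["insns_per_cycle"] = instructions / cycles
    return row


# Made-up numbers: a 16-byte input taking 8 ns, 32 cycles, 96 instructions.
example = derived_metrics(16, 8.0, cycles=32, instructions=96)
```

Under that assumption, Insns/Cycle equals Insns/Byte divided by Cycles/Byte, so the three counter columns carry two independent pieces of information.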

Example output:

Size       Total Time (ns)    Time/Byte (ns)    Err%    Cycles/Byte     Ins/Byte        Ins/Cycle      
----------------------------------------------------------------------------------------------------------------
1          ....     
...
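For post-processing, a table like this could be parsed with a short script. The sketch below assumes that each data row starts with an integer size and consists of whitespace-separated numeric fields in the same order as the header, and that rows are shorter when performance counters are unavailable. The actual row layout is elided in the example above, so these are assumptions, and the key names in `COLUMNS` are hypothetical.

```python
# Hypothetical keys, in the same order as the header columns.
COLUMNS = [
    "size", "total_ns", "ns_per_byte", "err_pct",
    "cycles_per_byte", "ins_per_byte", "ins_per_cycle",
]


def parse_table(text):
    """Parse data rows into dicts, skipping anything that is not a row.

    Assumes whitespace-separated numeric fields; the header, the dashed
    separator, and lines with non-numeric fields are ignored.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # header, separator, or blank line
        if len(fields) > len(COLUMNS):
            continue  # more fields than known columns: not a row we understand
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # a field was not numeric (e.g. elided output)
        row = dict(zip(COLUMNS, values))
        row["size"] = int(fields[0])
        rows.append(row)
    return rows
```

One way to use it is to redirect the tool's output to a file and pass the file's contents to `parse_table`. If the real output formats a field differently (for example, with a unit suffix), the row is skipped rather than misread.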

This complements the existing benchmark tool, which focuses on larger inputs and transcoding operations. The shortbench tool is particularly useful for analyzing the performance of functions on small inputs, where per-call overhead and startup costs are significant.
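One simple way to separate fixed overhead from per-byte cost is to fit a straight line, total time ≈ overhead + per-byte cost × size, to the (size, total time) pairs. The sketch below does this with ordinary least squares. The linear model is a crude first-order assumption (real timings for SIMD code often move in steps rather than along a line), and the function name `fit_overhead` is hypothetical.

```python
def fit_overhead(sizes, total_ns):
    """Least-squares fit of total_ns = overhead + per_byte * size.

    Returns (overhead_ns, per_byte_ns). The linear model is an
    assumption made for illustration only.
    """
    n = len(sizes)
    if n < 2 or n != len(total_ns):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_x = sum(sizes) / n
    mean_y = sum(total_ns) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("all sizes are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, total_ns))
    per_byte = sxy / sxx
    overhead = mean_y - per_byte * mean_x
    return overhead, per_byte


# Synthetic data, not measurements: 5 ns fixed cost plus 0.1 ns per byte.
synthetic_sizes = list(range(1, 65))
synthetic_times = [5.0 + 0.1 * s for s in synthetic_sizes]
overhead_ns, per_byte_ns = fit_overhead(synthetic_sizes, synthetic_times)
```

With a model like this, the time per byte at size n is overhead/n + per-byte cost, so the overhead term dominates when n is small.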

This is meant to help with issue #925.

sleepingeight

LGTM. Thanks for the benchmark.

lemire · 2026-01-30T16:55:48Z

I am getting the following results on an Intel Ice Lake processor with GCC 15. Essentially, almost every function call takes at least about 5 ns (there are exceptions like simdutf::find).