Improve the performance of validate utf8 and ascii on short inputs by lemire · Pull Request #929 · simdutf/simdutf

lemire · 2026-01-31T20:09:21Z

Our current implementations of the validate utf8 and validate ascii functions use a full SIMD approach, working on 64-byte blocks. This works well for larger inputs, but for tiny inputs it can deliver poor performance.
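To illustrate what a short-input path can look like, here is a minimal sketch for the ASCII case. It is not the code in this PR (the patch itself is not shown in this thread) and `validate_ascii_short` is a hypothetical name, not a simdutf function. The idea is simply to OR the bytes together in 8-byte words and check the high bits once, so that a tiny input never has to be padded out to a full 64-byte block.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of simdutf: checks that a short buffer is
// pure ASCII without first copying it into a padded 64-byte block.
inline bool validate_ascii_short(const char *buf, std::size_t len) {
  std::uint64_t acc = 0;
  std::size_t i = 0;
  // Consume 8 bytes at a time; memcpy keeps unaligned loads well-defined.
  for (; i + 8 <= len; i += 8) {
    std::uint64_t word;
    std::memcpy(&word, buf + i, sizeof(word));
    acc |= word;
  }
  // Fold the remaining 0 to 7 bytes one at a time.
  std::uint8_t tail = 0;
  for (; i < len; ++i) {
    tail |= static_cast<std::uint8_t>(buf[i]);
  }
  // The input is ASCII exactly when no byte has its high bit set.
  return (acc & 0x8080808080808080ULL) == 0 && (tail & 0x80) == 0;
}
```

For example, `validate_ascii_short("hello", 5)` returns `true`, while any buffer containing a byte of value 0x80 or above returns `false`. UTF-8 validation needs more state than this, so the sketch only covers the ASCII half of the PR title.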

LLVM 20/Apple M4.

Current main branch

./build/benchmarks/shortbench --function validate_utf8
# Warning: Performance events not available on this system. Under macOS and Linux, you may need to run with sudo or configure performance counters.
# Benchmarking validate_utf8 on default zero input
# Input size: 128 bytes
# Max benchmark size: 128 bytes
# Current system: arm64

Size       Total Time (ns)    Time/Byte (ns)       Err%
---------------------------------------------------------------
1          146.8                146.8                 2
2          143.0                71.5                  2
3          142.6                47.5                  2
4          141.4                35.3                  2
5          144.2                28.8                  2
6          144.5                24.1                  2
7          145.1                20.7                  2
8          143.0                17.9                  2
9          139.6                15.5                  2
10         143.8                14.4                  2
11         147.3                13.4                  2
12         144.8                12.1                  2
13         147.2                11.3                  2
14         145.4                10.4                  2
15         147.2                9.8                   2
16         143.6                9.0                   2
17         141.9                8.3                   2
18         146.4                8.1                   3
19         144.9                7.6                   2
20         145.7                7.3                   2
21         144.3                6.9                   2
22         147.4                6.7                   3
23         148.9                6.5                   3
24         146.1                6.1                   2
25         145.2                5.8                   3
26         148.2                5.7                   3
27         145.8                5.4                   2
28         147.8                5.3                   4
29         145.9                5.0                   3
30         145.2                4.8                   3
31         147.5                4.8                   3
32         145.0                4.5                   3
33         144.3                4.4                   3
34         146.2                4.3                   2
35         145.4                4.2                   3
36         154.2                4.3                   1
37         146.4                4.0                   3
38         146.4                3.9                   3
39         149.3                3.8                   3
40         147.2                3.7                   1
41         144.5                3.5                   3
42         147.0                3.5                   3
43         146.9                3.4                   1
44         148.3                3.4                   3
45         149.8                3.3                   3
46         151.6                3.3                   1
47         150.1                3.2                   3
48         146.7                3.1                   3
49         146.4                3.0                   3
50         148.5                3.0                   4
51         148.5                2.9                   4
52         154.0                3.0                   5
53         149.8                2.8                   3
54         150.8                2.8                   4
55         152.3                2.8                   4
56         149.0                2.7                   4
57         148.5                2.6                   4
58         150.0                2.6                   3
59         156.0                2.6                   6
60         152.9                2.5                   4
61         149.9                2.5                   4
62         152.1                2.5                   4
63         151.6                2.4                   4
64         150.2                2.3                   5
65         204.1                3.1                   4
66         211.6                3.2                   4
67         203.0                3.0                   2
68         202.7                3.0                   5
69         207.3                3.0                   5
70         210.2                3.0                   6
71         215.8                3.0                   6
72         212.8                3.0                   5
73         212.7                2.9                   1
74         208.5                2.8                   5
75         212.9                2.8                   3
76         213.6                2.8                   7
77         223.9                2.9                   8
78         221.3                2.8                   8
79         226.5                2.9                   9
80         213.6                2.7                   7
81         218.7                2.7                   8
82         226.1                2.8                   9
83         213.7                2.6                   9
84         223.0                2.7                   8
85         218.5                2.6                   9
86         221.4                2.6                   9
87         226.2                2.6                   5
88         226.0                2.6                   8
89         219.3                2.5                   9
90         221.2                2.5                   9
91         227.0                2.5                   9
92         219.7                2.4                  10
93         226.9                2.4                  10
94         219.1                2.3                   8
95         223.7                2.4                   9
96         215.7                2.2                   9
97         221.0                2.3                   9
98         225.7                2.3                   9
99         216.2                2.2                   9
100        225.6                2.3                   1
101        220.6                2.2                   2
102        223.0                2.2                   2
103        226.2                2.2                  11
104        227.3                2.2                   9
105        219.6                2.1                   9
106        220.9                2.1                   9
107        225.5                2.1                  11
108        222.4                2.1                   2
109        226.6                2.1                   9
110        221.8                2.0                   9
111        228.6                2.1                   2
112        217.3                1.9                   4
113        222.4                2.0                   7
114        227.1                2.0                   5
115        218.2                1.9                   7
116        224.4                1.9                   9
117        221.2                1.9                   9
118        224.4                1.9                  10
119        226.9                1.9                  11
120        227.4                1.9                  12
121        219.8                1.8                  10
122        222.9                1.8                   8
123        226.9                1.8                  11
124        221.8                1.8                   9
125        223.9                1.8                  10
126        221.5                1.8                   9
127        225.7                1.8                  10
128        227.0                1.8                   0

This PR:

./build/benchmarks/shortbench --function validate_utf8
# Warning: Performance events not available on this system. Under macOS and Linux, you may need to run with sudo or configure performance counters.
# Benchmarking validate_utf8 on default zero input
# Input size: 128 bytes
# Max benchmark size: 128 bytes
# Current system: arm64

Size       Total Time (ns)    Time/Byte (ns)       Err%
---------------------------------------------------------------
1          12.4                 12.4                  2
2          13.3                 6.6                   2
3          13.6                 4.5                   2
4          14.7                 3.7                   3
5          15.1                 3.0                   3
6          16.0                 2.7                   2
7          16.7                 2.4                   2
8          17.1                 2.1                   3
9          17.6                 2.0                   1
10         18.2                 1.8                   3
11         18.6                 1.7                   5
12         19.4                 1.6                   3
13         20.1                 1.5                   3
14         20.3                 1.4                   3
15         20.8                 1.4                   5
16         12.8                 0.8                   3
17         13.4                 0.8                   4
18         14.2                 0.8                   3
19         14.8                 0.8                   3
20         15.6                 0.8                   5
21         16.0                 0.8                   3
22         16.9                 0.8                   5
23         17.5                 0.8                   3
24         18.0                 0.8                   3
25         18.4                 0.7                   3
26         19.0                 0.7                   3
27         19.5                 0.7                   3
28         20.4                 0.7                   4
29         20.7                 0.7                   5
30         21.4                 0.7                   3
31         22.5                 0.7                   0
32         14.7                 0.5                   2
33         16.1                 0.5                   6
34         16.9                 0.5                   0
35         17.4                 0.5                   3
36         17.8                 0.5                   3
37         18.1                 0.5                   3
38         18.6                 0.5                   3
39         19.1                 0.5                   3
40         19.7                 0.5                   0
41         20.1                 0.5                   3
42         20.5                 0.5                   3
43         21.4                 0.5                   0
44         22.2                 0.5                   1
45         22.8                 0.5                   1
46         23.2                 0.5                   2
47         24.1                 0.5                   1
48         17.3                 0.4                   3
49         18.0                 0.4                   1
50         18.4                 0.4                   2
51         18.6                 0.4                   3
52         19.2                 0.4                   3
53         19.6                 0.4                   3
54         20.0                 0.4                   3
55         20.5                 0.4                   3
56         21.1                 0.4                   2
57         21.7                 0.4                   3
58         22.4                 0.4                   2
59         22.9                 0.4                   3
60         23.9                 0.4                   4
61         25.0                 0.4                   0
62         24.7                 0.4                   1
63         25.2                 0.4                   3
64         75.3                 1.2                   5
65         148.2                2.3                   5
66         149.4                2.3                   4
67         149.9                2.2                   5
68         147.1                2.2                   6
69         144.6                2.1                   7
70         138.1                2.0                   7
71         136.0                1.9                   6
72         136.4                1.9                   7
73         135.2                1.9                   5
74         135.3                1.8                   5
75         135.2                1.8                   6
76         136.6                1.8                   5
77         136.6                1.8                   6
78         136.5                1.8                   6
79         137.3                1.7                   6
80         148.1                1.9                   2
81         144.9                1.8                   4
82         141.4                1.7                   5
83         136.9                1.6                   6
84         134.7                1.6                   6
85         134.4                1.6                   5
86         135.0                1.6                   6
87         136.0                1.6                   6
88         135.8                1.5                   4
89         135.2                1.5                   5
90         136.4                1.5                   6
91         136.2                1.5                   6
92         137.0                1.5                   5
93         137.2                1.5                   4
94         138.4                1.5                   5
95         138.8                1.5                   6
96         138.0                1.4                   6
97         138.0                1.4                   1
98         134.2                1.4                   6
99         133.6                1.3                   5
100        134.7                1.3                   5
101        136.8                1.4                   1
102        136.1                1.3                   4
103        136.4                1.3                   5
104        136.4                1.3                   4
105        138.1                1.3                   5
106        138.0                1.3                   5
107        139.6                1.3                   4
108        139.3                1.3                   5
109        140.3                1.3                   4
110        140.3                1.3                   5
111        141.1                1.3                   5
112        133.6                1.2                   5
113        135.5                1.2                   1
114        134.7                1.2                   5
115        135.2                1.2                   5
116        135.7                1.2                   4
117        136.1                1.2                   5
118        137.6                1.2                   4
119        137.9                1.2                   4
120        138.7                1.2                   4
121        138.4                1.1                   4
122        140.4                1.2                   5
123        142.7                1.2                   1
124        141.7                1.1                   6
125        142.1                1.1                   5
126        141.8                1.1                   4
127        143.3                1.1                   5
128        133.6                1.0                   4

pauldreik · 2026-02-01T15:10:26Z

I graphed your pasted results:

it looks like a nice improvement!

But if (after the change) it takes 25 ns for 63 bytes and 133.6 ns for 128 bytes, it looks like it would be even faster to validate 63 bytes at a time?

EDIT: This is a factor of 5 slower than x86. Is there something fishy going on here?
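The arithmetic behind the "63 bytes at a time" remark can be spelled out as a rough sketch. The timings are copied from the "This PR" table above; the 63 + 63 + 2 split and the function name `split_cost_ns` are only illustrative, and the estimate ignores call overhead and the fact that a UTF-8 stream cannot be cut at an arbitrary byte without carrying state across the boundary.

```cpp
// Timings in nanoseconds, copied from the "This PR" table in this thread.
constexpr double ns_2_bytes = 13.3;
constexpr double ns_63_bytes = 25.2;
constexpr double ns_128_bytes = 133.6;

// Hypothetical cost of covering 128 bytes as three short calls: 63 + 63 + 2.
constexpr double split_cost_ns() {
  return 2 * ns_63_bytes + ns_2_bytes;
}

// True when the three short calls would beat the single 128-byte call.
constexpr bool splitting_looks_faster() {
  return split_cost_ns() < ns_128_bytes;
}
```

With these numbers the split costs about 63.7 ns against 133.6 ns for the single call, which is the discrepancy the comment is pointing at.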

pauldreik · 2026-02-01T15:40:27Z

I repeated your measurement on my system, x86 compiled with gcc 14 -march=znver5. There is almost no difference.

lemire · 2026-02-01T19:31:35Z

@pauldreik

I am closing this PR. The reason we had this effect is that my build was in debug mode. :-/

Sorry for the noise.
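One way to catch this kind of mistake early is to have the benchmark warn when it was compiled without optimizations. The following is a minimal sketch under stated assumptions, not something that exists in simdutf or shortbench: the function names are hypothetical, `__OPTIMIZE__` is the macro GCC and Clang define at `-O1` and above, and the MSVC branch uses `_DEBUG` as a rough proxy only.

```cpp
#include <cstdio>

// Hypothetical helper for a benchmark harness; not part of simdutf.
// GCC and Clang define __OPTIMIZE__ when optimizing (-O1 and above).
// MSVC has no equivalent macro, so _DEBUG (set by the debug runtime
// options) is used there as an approximation.
constexpr bool built_with_optimizations() {
#if defined(__OPTIMIZE__)
  return true;
#elif defined(_MSC_VER) && !defined(_DEBUG)
  return true;
#else
  return false;
#endif
}

// Prints a warning to `out` and returns true if the build looks unoptimized.
inline bool warn_if_unoptimized(std::FILE *out) {
  if (built_with_optimizations()) {
    return false;
  }
  std::fputs("# Warning: this binary appears to be an unoptimized build; "
             "timings are not representative.\n", out);
  return true;
}
```

Calling `warn_if_unoptimized(stderr)` at the start of a benchmark run would put the warning next to the existing header lines. With CMake, configuring with `-DCMAKE_BUILD_TYPE=Release` is the usual way to get an optimized build on single-configuration generators.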

lemire added 3 commits January 31, 2026 13:29

for short inputs get_remainder is inefficient.

f522270

removing remainder to improve performance on short inputs

1e3ba24

Merge branch 'master' into remove_get_remainder

3ab3046

lemire requested a review from pauldreik January 31, 2026 20:09

lemire closed this Feb 5, 2026

lemire wants to merge 3 commits into master from remove_get_remainder
