Improve the performance of validate utf8 and ascii on short inputs#929

Closed
lemire wants to merge 3 commits into master from remove_get_remainder

lemire (Member) commented Jan 31, 2026

Our current implementations of the validate UTF-8 and validate ASCII functions use a full SIMD approach that processes 64-byte blocks. This works well for larger inputs, but for tiny inputs it can deliver poor performance.
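To make the idea of a short-input path concrete, here is a minimal sketch of ASCII validation that avoids 64-byte block processing by reading 8 bytes at a time and folding in the remaining bytes one by one. The function name `validate_ascii_short` is hypothetical and is not part of the simdutf API, and this is not necessarily what the PR branch does; whether such a path helps in an optimized build is exactly what the benchmarks are meant to show.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a simdutf function): returns true when every
// byte in [data, data + len) has its high bit clear, i.e. is ASCII.
inline bool validate_ascii_short(const char *data, std::size_t len) {
  std::uint64_t acc = 0;
  std::size_t i = 0;
  // OR together 8 bytes at a time. memcpy avoids unaligned-access
  // undefined behavior and typically compiles to a single load.
  for (; i + 8 <= len; i += 8) {
    std::uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));
    acc |= word;
  }
  // Fold in the remaining 0 to 7 bytes individually.
  std::uint8_t tail = 0;
  for (; i < len; i++) {
    tail |= static_cast<std::uint8_t>(data[i]);
  }
  // Any byte >= 0x80 leaves a high bit set in acc or tail.
  return ((acc & 0x8080808080808080ULL) == 0) && ((tail & 0x80) == 0);
}
```

A caller could route inputs below some size threshold to a helper like this and send everything else to the block-based routine; the right threshold would have to be measured.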

LLVM 20/Apple M4.

Current main branch

./build/benchmarks/shortbench --function validate_utf8
# Warning: Performance events not available on this system. Under macOS and Linux, you may need to run with sudo or configure performance counters.
# Benchmarking validate_utf8 on default zero input
# Input size: 128 bytes
# Max benchmark size: 128 bytes
# Current system: arm64

Size       Total Time (ns)    Time/Byte (ns)       Err%
---------------------------------------------------------------
1          146.8                146.8                 2
2          143.0                71.5                  2
3          142.6                47.5                  2
4          141.4                35.3                  2
5          144.2                28.8                  2
6          144.5                24.1                  2
7          145.1                20.7                  2
8          143.0                17.9                  2
9          139.6                15.5                  2
10         143.8                14.4                  2
11         147.3                13.4                  2
12         144.8                12.1                  2
13         147.2                11.3                  2
14         145.4                10.4                  2
15         147.2                9.8                   2
16         143.6                9.0                   2
17         141.9                8.3                   2
18         146.4                8.1                   3
19         144.9                7.6                   2
20         145.7                7.3                   2
21         144.3                6.9                   2
22         147.4                6.7                   3
23         148.9                6.5                   3
24         146.1                6.1                   2
25         145.2                5.8                   3
26         148.2                5.7                   3
27         145.8                5.4                   2
28         147.8                5.3                   4
29         145.9                5.0                   3
30         145.2                4.8                   3
31         147.5                4.8                   3
32         145.0                4.5                   3
33         144.3                4.4                   3
34         146.2                4.3                   2
35         145.4                4.2                   3
36         154.2                4.3                   1
37         146.4                4.0                   3
38         146.4                3.9                   3
39         149.3                3.8                   3
40         147.2                3.7                   1
41         144.5                3.5                   3
42         147.0                3.5                   3
43         146.9                3.4                   1
44         148.3                3.4                   3
45         149.8                3.3                   3
46         151.6                3.3                   1
47         150.1                3.2                   3
48         146.7                3.1                   3
49         146.4                3.0                   3
50         148.5                3.0                   4
51         148.5                2.9                   4
52         154.0                3.0                   5
53         149.8                2.8                   3
54         150.8                2.8                   4
55         152.3                2.8                   4
56         149.0                2.7                   4
57         148.5                2.6                   4
58         150.0                2.6                   3
59         156.0                2.6                   6
60         152.9                2.5                   4
61         149.9                2.5                   4
62         152.1                2.5                   4
63         151.6                2.4                   4
64         150.2                2.3                   5
65         204.1                3.1                   4
66         211.6                3.2                   4
67         203.0                3.0                   2
68         202.7                3.0                   5
69         207.3                3.0                   5
70         210.2                3.0                   6
71         215.8                3.0                   6
72         212.8                3.0                   5
73         212.7                2.9                   1
74         208.5                2.8                   5
75         212.9                2.8                   3
76         213.6                2.8                   7
77         223.9                2.9                   8
78         221.3                2.8                   8
79         226.5                2.9                   9
80         213.6                2.7                   7
81         218.7                2.7                   8
82         226.1                2.8                   9
83         213.7                2.6                   9
84         223.0                2.7                   8
85         218.5                2.6                   9
86         221.4                2.6                   9
87         226.2                2.6                   5
88         226.0                2.6                   8
89         219.3                2.5                   9
90         221.2                2.5                   9
91         227.0                2.5                   9
92         219.7                2.4                  10
93         226.9                2.4                  10
94         219.1                2.3                   8
95         223.7                2.4                   9
96         215.7                2.2                   9
97         221.0                2.3                   9
98         225.7                2.3                   9
99         216.2                2.2                   9
100        225.6                2.3                   1
101        220.6                2.2                   2
102        223.0                2.2                   2
103        226.2                2.2                  11
104        227.3                2.2                   9
105        219.6                2.1                   9
106        220.9                2.1                   9
107        225.5                2.1                  11
108        222.4                2.1                   2
109        226.6                2.1                   9
110        221.8                2.0                   9
111        228.6                2.1                   2
112        217.3                1.9                   4
113        222.4                2.0                   7
114        227.1                2.0                   5
115        218.2                1.9                   7
116        224.4                1.9                   9
117        221.2                1.9                   9
118        224.4                1.9                  10
119        226.9                1.9                  11
120        227.4                1.9                  12
121        219.8                1.8                  10
122        222.9                1.8                   8
123        226.9                1.8                  11
124        221.8                1.8                   9
125        223.9                1.8                  10
126        221.5                1.8                   9
127        225.7                1.8                  10
128        227.0                1.8                   0

This PR:

./build/benchmarks/shortbench --function validate_utf8
# Warning: Performance events not available on this system. Under macOS and Linux, you may need to run with sudo or configure performance counters.
# Benchmarking validate_utf8 on default zero input
# Input size: 128 bytes
# Max benchmark size: 128 bytes
# Current system: arm64

Size       Total Time (ns)    Time/Byte (ns)       Err%
---------------------------------------------------------------
1          12.4                 12.4                  2
2          13.3                 6.6                   2
3          13.6                 4.5                   2
4          14.7                 3.7                   3
5          15.1                 3.0                   3
6          16.0                 2.7                   2
7          16.7                 2.4                   2
8          17.1                 2.1                   3
9          17.6                 2.0                   1
10         18.2                 1.8                   3
11         18.6                 1.7                   5
12         19.4                 1.6                   3
13         20.1                 1.5                   3
14         20.3                 1.4                   3
15         20.8                 1.4                   5
16         12.8                 0.8                   3
17         13.4                 0.8                   4
18         14.2                 0.8                   3
19         14.8                 0.8                   3
20         15.6                 0.8                   5
21         16.0                 0.8                   3
22         16.9                 0.8                   5
23         17.5                 0.8                   3
24         18.0                 0.8                   3
25         18.4                 0.7                   3
26         19.0                 0.7                   3
27         19.5                 0.7                   3
28         20.4                 0.7                   4
29         20.7                 0.7                   5
30         21.4                 0.7                   3
31         22.5                 0.7                   0
32         14.7                 0.5                   2
33         16.1                 0.5                   6
34         16.9                 0.5                   0
35         17.4                 0.5                   3
36         17.8                 0.5                   3
37         18.1                 0.5                   3
38         18.6                 0.5                   3
39         19.1                 0.5                   3
40         19.7                 0.5                   0
41         20.1                 0.5                   3
42         20.5                 0.5                   3
43         21.4                 0.5                   0
44         22.2                 0.5                   1
45         22.8                 0.5                   1
46         23.2                 0.5                   2
47         24.1                 0.5                   1
48         17.3                 0.4                   3
49         18.0                 0.4                   1
50         18.4                 0.4                   2
51         18.6                 0.4                   3
52         19.2                 0.4                   3
53         19.6                 0.4                   3
54         20.0                 0.4                   3
55         20.5                 0.4                   3
56         21.1                 0.4                   2
57         21.7                 0.4                   3
58         22.4                 0.4                   2
59         22.9                 0.4                   3
60         23.9                 0.4                   4
61         25.0                 0.4                   0
62         24.7                 0.4                   1
63         25.2                 0.4                   3
64         75.3                 1.2                   5
65         148.2                2.3                   5
66         149.4                2.3                   4
67         149.9                2.2                   5
68         147.1                2.2                   6
69         144.6                2.1                   7
70         138.1                2.0                   7
71         136.0                1.9                   6
72         136.4                1.9                   7
73         135.2                1.9                   5
74         135.3                1.8                   5
75         135.2                1.8                   6
76         136.6                1.8                   5
77         136.6                1.8                   6
78         136.5                1.8                   6
79         137.3                1.7                   6
80         148.1                1.9                   2
81         144.9                1.8                   4
82         141.4                1.7                   5
83         136.9                1.6                   6
84         134.7                1.6                   6
85         134.4                1.6                   5
86         135.0                1.6                   6
87         136.0                1.6                   6
88         135.8                1.5                   4
89         135.2                1.5                   5
90         136.4                1.5                   6
91         136.2                1.5                   6
92         137.0                1.5                   5
93         137.2                1.5                   4
94         138.4                1.5                   5
95         138.8                1.5                   6
96         138.0                1.4                   6
97         138.0                1.4                   1
98         134.2                1.4                   6
99         133.6                1.3                   5
100        134.7                1.3                   5
101        136.8                1.4                   1
102        136.1                1.3                   4
103        136.4                1.3                   5
104        136.4                1.3                   4
105        138.1                1.3                   5
106        138.0                1.3                   5
107        139.6                1.3                   4
108        139.3                1.3                   5
109        140.3                1.3                   4
110        140.3                1.3                   5
111        141.1                1.3                   5
112        133.6                1.2                   5
113        135.5                1.2                   1
114        134.7                1.2                   5
115        135.2                1.2                   5
116        135.7                1.2                   4
117        136.1                1.2                   5
118        137.6                1.2                   4
119        137.9                1.2                   4
120        138.7                1.2                   4
121        138.4                1.1                   4
122        140.4                1.2                   5
123        142.7                1.2                   1
124        141.7                1.1                   6
125        142.1                1.1                   5
126        141.8                1.1                   4
127        143.3                1.1                   5
128        133.6                1.0                   4

lemire requested a review from pauldreik January 31, 2026 20:09

pauldreik (Collaborator) commented Feb 1, 2026

I graphed your pasted results:
[image: graph of the pasted benchmark results]

It looks like a nice improvement!

But if (after the change) it takes 25 ns for 63 bytes and 133.6 ns for 128 bytes, it looks like it would be even faster to validate 63 bytes at a time?
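The arithmetic behind that question can be sketched as follows. The two constants are copied from the "This PR" table; `chunked_cost_ns` is a hypothetical estimate that assumes every 63-byte call costs the same as the measured one and ignores the fact that a UTF-8 sequence may straddle a chunk boundary.

```cpp
#include <cstddef>

// Figures copied from the "This PR" table (Apple M4, LLVM 20).
constexpr double ns_for_63_bytes = 25.2;
constexpr double ns_for_128_bytes = 133.6;

// Average cost per byte for a measured call.
constexpr double ns_per_byte(double total_ns, std::size_t bytes) {
  return total_ns / static_cast<double>(bytes);
}

// Hypothetical estimate: cost of covering n_bytes with repeated calls of
// at most 63 bytes, each assumed to cost as much as the measured call.
constexpr double chunked_cost_ns(std::size_t n_bytes) {
  const std::size_t calls = (n_bytes + 62) / 63;  // ceiling division
  return static_cast<double>(calls) * ns_for_63_bytes;
}
```

Under these assumptions, 128 bytes need three calls, for an estimated 75.6 ns against the measured 133.6 ns, which is the kind of gap that suggests the measurement itself deserves a second look.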

EDIT: This is a factor of 5 slower than x86. Is there something fishy going on here?

pauldreik (Collaborator) commented

I repeated your measurement on my system, x86 compiled with gcc 14 -march=znver5. There is almost no difference.

[image]

lemire (Member, Author) commented Feb 1, 2026

@pauldreik

I am closing this PR. The reason we had this effect is that my build was in debug mode. :-/

Sorry for the noise.
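One way to catch this kind of mistake early is to have the benchmark program report when it was built without optimizations. The sketch below is an assumption about how such a guard might look, not something the project's benchmark is known to contain; `is_optimized_build` and `build_mode_warning` are hypothetical names. It relies on `__OPTIMIZE__`, which GCC and Clang define at `-O1` and above, and uses MSVC's `_DEBUG` macro as a rough proxy.

```cpp
// Hypothetical guard for a benchmark program (names are illustrative).
constexpr bool is_optimized_build() {
#if defined(__OPTIMIZE__)
  return true;   // GCC and Clang define __OPTIMIZE__ when optimizing
#elif defined(_MSC_VER) && !defined(_DEBUG)
  return true;   // MSVC: _DEBUG is set by the debug runtime options; proxy only
#else
  return false;
#endif
}

// Returns an empty string for optimized builds, otherwise a warning line
// in the same "# ..." style as the benchmark's other header lines.
inline const char *build_mode_warning() {
  return is_optimized_build()
             ? ""
             : "# Warning: this binary appears to be built without "
               "optimizations; timings are not representative.\n";
}
```

Printing `build_mode_warning()` next to the other header lines would make a debug build visible in any pasted output.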

lemire closed this Feb 5, 2026