Optimize string allocation in uri_parser_rfc3986 when getting IPv6 hosts and url paths #21550

Merged: kocsismate merged 9 commits into php:master from LamentXU123:optimize-3 on Mar 28, 2026
Conversation

@LamentXU123
Contributor

Currently, php_uri_parser_rfc3986_host_read and php_uri_parser_rfc3986_path_read use the smart_str API to build the host and path strings. I think smart_str_append* introduces unnecessary overhead here, since it performs repeated bounds checks and dynamic memory reallocations.

This PR optimizes string allocation in uri_parser_rfc3986 by replacing smart_str with a single zend_string_alloc of a pre-calculated length. It focuses on two functions, php_uri_parser_rfc3986_host_read() and php_uri_parser_rfc3986_path_read(), which affect getHost(), getRawHost(), and getPath().

For php_uri_parser_rfc3986_host_read():
This PR constructs the IPv6/IPvFuture host directly in a fixed-length zend_string (laid out as [ + hostText + ] + \0). This replaces the previous smart_str appending process.
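
The layout described above can be sketched in plain C. This is only an illustration under stated assumptions: it uses malloc instead of zend_string_alloc so that it compiles on its own, and bracket_host is a hypothetical name, not a function from php-src.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from php-src): wraps host text in brackets
 * using one exact-size allocation instead of incremental appends. */
static char *bracket_host(const char *host, size_t host_len)
{
    /* '[' + host text + ']' ; the NUL terminator is added on top. */
    size_t total_len = host_len + 2;
    char *out = malloc(total_len + 1);
    if (out == NULL) {
        return NULL;
    }

    out[0] = '[';
    memcpy(out + 1, host, host_len);
    out[host_len + 1] = ']';
    out[total_len] = '\0';
    return out;
}
```

Because the final length is known up front, no capacity check or reallocation is needed between the three writes.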

For php_uri_parser_rfc3986_path_read():
This PR first traverses the segments to pre-calculate the total length (total_len), including the leading / and the segment delimiters. It then performs a single zend_string_alloc(), copies the path content and delimiters with memcpy, and finally appends the \0 terminator.
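
A minimal two-pass sketch of that approach, again in standalone C. The path_segment struct and join_path are hypothetical stand-ins for the parser's segment list and the real function, and malloc stands in for zend_string_alloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical segment list; the real parser has its own segment type. */
struct path_segment {
    const char *text;
    size_t len;
    const struct path_segment *next;
};

static char *join_path(const struct path_segment *head, int leading_slash)
{
    const struct path_segment *seg;

    /* Pass 1: compute the exact length, including delimiters. */
    size_t total_len = leading_slash ? 1 : 0;
    for (seg = head; seg != NULL; seg = seg->next) {
        total_len += seg->len;
        if (seg->next != NULL) {
            total_len += 1; /* '/' between segments */
        }
    }

    /* Single allocation, plus one byte for the terminator. */
    char *out = malloc(total_len + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Pass 2: copy segments and delimiters without further checks. */
    char *p = out;
    if (leading_slash) {
        *p++ = '/';
    }
    for (seg = head; seg != NULL; seg = seg->next) {
        memcpy(p, seg->text, seg->len);
        p += seg->len;
        if (seg->next != NULL) {
            *p++ = '/';
        }
    }

    /* Debug check: both passes must agree on the length. */
    assert((size_t)(p - out) == total_len);
    *p = '\0';
    return out;
}
```

The cost of this design is that the segment list is walked twice, and the two loops must stay in sync; the assertion is there to catch a mismatch in debug builds.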

Benchmark script: bench.php

for getHost()

└─$ php ../../bench.php --old ~/Desktop/php/php-raw/php-src-master/sapi/cli/php --new ~/Desktop/php/opt-3/php-src-master/sapi/cli/php --iters 20000000 --mode host 
-------------------------------------------------------
|          Test | old avg(ns) | new avg(ns) | diff(%) |
-------------------------------------------------------
|    ipv6_short |         714 |         656 |   8.12% |
-------------------------------------------------------
|     ipv6_full |         806 |         746 |   7.44% |
-------------------------------------------------------
| ipv6_loopback |         743 |         671 |   9.69% |
-------------------------------------------------------
|    ipv6_mixed |         944 |         861 |   8.79% |
-------------------------------------------------------

for getRawHost()

└─$ php ../../bench.php --old ~/Desktop/php/php-raw/php-src-master/sapi/cli/php --new ~/Desktop/php/opt-3/php-src-master/sapi/cli/php --iters 20000000 --mode raw_host
-------------------------------------------------------
|          Test | old avg(ns) | new avg(ns) | diff(%) |
-------------------------------------------------------
|    ipv6_short |         557 |         509 |   8.62% |
-------------------------------------------------------
|     ipv6_full |         621 |         572 |   7.89% |
-------------------------------------------------------
| ipv6_loopback |         570 |         550 |   3.51% |
-------------------------------------------------------
|    ipv6_mixed |         667 |         679 |  -1.80% |
-------------------------------------------------------

@LamentXU123 LamentXU123 marked this pull request as ready for review March 27, 2026 11:40
@iluuu1994
Member

If this case really needs to be optimized, I would prefer extending smart_str to support appending to the string without bounds checks, with debug assertions. But I'm not code-owner.
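
For illustration, the alternative suggested here (reserve once, then append without bounds checks, guarded by debug assertions) could look roughly like the following. This is a standalone sketch with hypothetical names (str_buf, str_buf_reserve, str_buf_append_unchecked, str_buf_finish); it is not the smart_str API and not a proposal for its exact shape.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; cap counts usable bytes, excluding the NUL. */
struct str_buf {
    char *data;
    size_t len;
    size_t cap;
};

/* Ensure room for `extra` more bytes. Returns 1 on success, 0 on failure. */
static int str_buf_reserve(struct str_buf *b, size_t extra)
{
    size_t needed = b->len + extra;
    if (b->data != NULL && needed <= b->cap) {
        return 1;
    }
    char *grown = realloc(b->data, needed + 1);
    if (grown == NULL) {
        return 0;
    }
    b->data = grown;
    b->cap = needed;
    return 1;
}

/* Append without a runtime bounds check; the caller must have reserved
 * enough space. The assertion only fires in debug builds. */
static void str_buf_append_unchecked(struct str_buf *b, const char *src, size_t n)
{
    assert(b->data != NULL && b->len + n <= b->cap);
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

/* NUL-terminate and hand back the contents (still owned by the buffer). */
static char *str_buf_finish(struct str_buf *b)
{
    assert(b->data != NULL);
    b->data[b->len] = '\0';
    return b->data;
}
```

A caller would reserve the pre-calculated length once and then issue the unchecked appends, keeping the append-style call sites while dropping the per-append capacity check in release builds.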

Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>
@LamentXU123
Contributor Author

If this case really needs to be optimized, I would prefer extending smart_str to support appending to the string without bounds checks, with debug assertions. But I'm not code-owner.

I initially thought that would be the better solution as well, but I don't want to touch smart_str just for this small patch.

@TimWolla
Member

@LamentXU123 Thank you. Can you prepare another benchmark with the latest changes? Ideally using hyperfine. The getPath() implementation would be more interesting here, since most URLs have a path, whereas IPv6 literals as hosts are rarer.

Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>
@LamentXU123
Contributor Author

LamentXU123 commented Mar 27, 2026

Thank you. Can you prepare another benchmark with the latest changes?

Sure. The optimized one is 1.05x faster in this benchmark case.

<?php

$url = "http://example.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/segment8/segment9/segment10/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z";

$sum = 0;
for ($i = 0; $i < 100000; $i++) {
    $uri = Uri\Rfc3986\Uri::parse($url);
    $path = $uri->getPath();
    $sum += strlen($path);
}

echo $sum, PHP_EOL;
└─$ hyperfine '~/Desktop/php/php-opt/php-src-master/sapi/cli/php bench.php' '~/Desktop/php/php-raw/php-src-master/sapi/cli/php bench.php'
Benchmark 1: ~/Desktop/php/php-opt/php-src-master/sapi/cli/php bench.php
  Time (mean ± σ):     592.3 ms ±  20.0 ms    [User: 498.2 ms, System: 83.7 ms]
  Range (min … max):   571.4 ms … 631.6 ms    10 runs
 
Benchmark 2: ~/Desktop/php/php-raw/php-src-master/sapi/cli/php bench.php
  Time (mean ± σ):     623.8 ms ±  16.6 ms    [User: 569.0 ms, System: 46.9 ms]
  Range (min … max):   600.4 ms … 655.4 ms    10 runs
 
Summary
  ~/Desktop/php/php-opt/php-src-master/sapi/cli/php bench.php ran
    1.05 ± 0.05 times faster than ~/Desktop/php/php-raw/php-src-master/sapi/cli/php bench.php  

When the path is less nested, for example:

<?php

$url = "http://example.com/segment1";

$sum = 0;
for ($i = 0; $i < 1000000; $i++) {
    $uri = Uri\Rfc3986\Uri::parse($url);
    $path = $uri->getPath();
    $sum += strlen($path);
}

echo $sum, PHP_EOL;

The two builds run at similar speed; the optimized one is 1.01x faster, which is within the reported uncertainty (1.01 ± 0.04):

└─$ hyperfine '~/Desktop/php/php-opt/php-src-master/sapi/cli/php bench.php' '~/Desktop/php/php-raw/php-src-master/sapi/cli/php bench.php'
Benchmark 1: ~/Desktop/php/php-opt/php-src-master/sapi/cli/php bench.php
  Time (mean ± σ):     984.1 ms ±  27.9 ms    [User: 883.8 ms, System: 83.6 ms]
  Range (min … max):   927.8 ms … 1014.5 ms    10 runs
 
Benchmark 2: ~/Desktop/php/php-raw/php-src-master/sapi/cli/php bench.php
  Time (mean ± σ):     992.0 ms ±  29.1 ms    [User: 914.3 ms, System: 62.4 ms]
  Range (min … max):   940.4 ms … 1026.5 ms    10 runs
 
Summary
  ~/Desktop/php/php-opt/php-src-master/sapi/cli/php bench.php ran
    1.01 ± 0.04 times faster than ~/Desktop/php/php-raw/php-src-master/sapi/cli/php bench.php 

@LamentXU123 LamentXU123 changed the title from "Optimize string allocation in uri_parser_rfc3986 when dealing with IPv6 hosts" to "Optimize string allocation in uri_parser_rfc3986 when getting IPv6 hosts and url paths" on Mar 27, 2026
Member

@TimWolla TimWolla left a comment

I don't love the additional complexity, but it seems worth it.

@kocsismate kocsismate merged commit c45b2be into php:master Mar 28, 2026
19 checks passed
5 participants