Returns how many code points are found in the given UTF-8 string.
Description
Each invalid span of bytes counts as a single code point, according to the maximal subpart rule. This function serves as a fallback for mb_strlen( $text, 'UTF-8' ).
When a negative value is provided for the byte offset or the length, this always reports zero code points.
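The maximal subpart rule can be illustrated with a minimal sketch. The function name example_count_with_maximal_subparts() is hypothetical, and this is not the WordPress implementation (which delegates to _wp_scan_utf8()). The byte ranges are assumed from the Unicode Standard's table of well-formed UTF-8 byte sequences. Each loop iteration consumes either one complete character or one maximal invalid span and counts it once.

```php
<?php
/**
 * Illustrative only: counts code points, treating each maximal invalid
 * span of bytes as one code point. Not the WordPress implementation.
 */
function example_count_with_maximal_subparts( string $text ): int {
	$count = 0;
	$at    = 0;
	$end   = strlen( $text );

	while ( $at < $end ) {
		$lead = ord( $text[ $at ] );
		$at++;
		$count++; // One code point per valid character or invalid span.

		// Number of continuation bytes allowed, and the range of the first one.
		if ( $lead >= 0xC2 && $lead <= 0xDF ) {
			[ $need, $min, $max ] = [ 1, 0x80, 0xBF ];
		} elseif ( 0xE0 === $lead ) {
			[ $need, $min, $max ] = [ 2, 0xA0, 0xBF ];
		} elseif ( 0xED === $lead ) {
			[ $need, $min, $max ] = [ 2, 0x80, 0x9F ];
		} elseif ( $lead >= 0xE1 && $lead <= 0xEF ) {
			[ $need, $min, $max ] = [ 2, 0x80, 0xBF ];
		} elseif ( 0xF0 === $lead ) {
			[ $need, $min, $max ] = [ 3, 0x90, 0xBF ];
		} elseif ( $lead >= 0xF1 && $lead <= 0xF3 ) {
			[ $need, $min, $max ] = [ 3, 0x80, 0xBF ];
		} elseif ( 0xF4 === $lead ) {
			[ $need, $min, $max ] = [ 3, 0x80, 0x8F ];
		} else {
			// ASCII, or a byte that can never start a sequence: already counted.
			continue;
		}

		// Consume continuation bytes while the sequence remains a valid prefix.
		for ( $i = 0; $i < $need && $at < $end; $i++ ) {
			$byte = ord( $text[ $at ] );
			if ( $byte < $min || $byte > $max ) {
				break;
			}
			$at++;
			// Only the first continuation byte has a restricted range.
			$min = 0x80;
			$max = 0xBF;
		}
	}

	return $count;
}

$ascii_count = example_count_with_maximal_subparts( 'text' );
$mixed_count = example_count_with_maximal_subparts( "test\x90wp\xE2\x80\xC0test" );
```

Under these assumptions, "\xE2\x80" is consumed as one span because it is a valid prefix of a three-byte sequence, while "\x90" and "\xC0" can never start a sequence and are each counted on their own.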
Example:
4 === _wp_utf8_codepoint_count( 'text' );
// Groups are 'test', "\x90" as '�', 'wp', "\xE2\x80" as '�', "\xC0" as '�', and 'test'.
13 === _wp_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" );
Parameters
$text string required
Count code points in this string.
$byte_offset ?int optional
Start counting after this many bytes in $text. Must not be negative.
Default: 0
$max_byte_length ?int optional
Stop counting after having scanned past this many bytes. Default is to scan until the end of the string. Must not be negative.
Default: PHP_INT_MAX
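The two byte parameters describe a window measured in bytes, not in code points. The sketch below shows that idea with a hypothetical stand-in, example_count_in_byte_window(), which is not the WordPress function. It assumes the text is well-formed UTF-8 and that the window starts and ends on code point boundaries, so it simply counts the bytes that begin a character.

```php
<?php
/**
 * Hypothetical stand-in, NOT the WordPress implementation.
 * Assumes well-formed UTF-8 and a window aligned to code point boundaries;
 * invalid bytes and split characters are not handled here.
 */
function example_count_in_byte_window( string $text, int $byte_offset = 0, int $max_byte_length = PHP_INT_MAX ): int {
	// Mirror the documented rule: negative offsets or lengths report zero.
	if ( $byte_offset < 0 || $max_byte_length < 0 ) {
		return 0;
	}

	// Clamp the window to the bytes that actually remain after the offset.
	$length = min( strlen( $text ) - $byte_offset, $max_byte_length );
	$end    = $byte_offset + $length;
	$count  = 0;

	for ( $i = $byte_offset; $i < $end; $i++ ) {
		// Continuation bytes look like 10xxxxxx; any other byte starts a code point.
		if ( 0x80 !== ( ord( $text[ $i ] ) & 0xC0 ) ) {
			$count++;
		}
	}

	return $count;
}

// "héllo wörld" written with byte escapes: 13 bytes, 11 code points.
$text = "h\xC3\xA9llo w\xC3\xB6rld";

$whole  = example_count_in_byte_window( $text );       // Entire string.
$suffix = example_count_in_byte_window( $text, 7 );    // Skips the 7 bytes of "héllo ".
$prefix = example_count_in_byte_window( $text, 0, 6 ); // Only the 6 bytes of "héllo".
$none   = example_count_in_byte_window( $text, -1 );   // Negative offset.
```

Because "é" occupies two bytes, a byte offset of 7 skips six code points, and a length of 6 bytes covers five.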
Source
function _wp_utf8_codepoint_count( string $text, ?int $byte_offset = 0, ?int $max_byte_length = PHP_INT_MAX ): int {
	if ( $byte_offset < 0 ) {
		return 0;
	}

	$count = 0;
	$at = $byte_offset;
	$end = strlen( $text );
	$invalid_length = 0;
	$max_byte_length = min( $end - $at, $max_byte_length );

	while ( $at < $end && ( $at - $byte_offset ) < $max_byte_length ) {
		$count += _wp_scan_utf8( $text, $at, $invalid_length, $max_byte_length - ( $at - $byte_offset ) );
		$count += $invalid_length > 0 ? 1 : 0;
		$at += $invalid_length;
	}

	return $count;
}
Changelog
| Version | Description |
|---|---|
| 6.9.0 | Introduced. |