Fallback mechanism for replacing invalid spans of UTF-8 bytes.
Description
Example:
'Pi�a' === _wp_scrub_utf8_fallback( "Pi\xF1a" ); // “ñ” is 0xF1 in Windows-1252.
See also
Parameters
$bytes (string, required): UTF-8 encoded string which might contain spans of invalid bytes.
Source
function _wp_scrub_utf8_fallback( string $bytes ): string {
	$bytes_length   = strlen( $bytes );
	$next_byte_at   = 0;
	$was_at         = 0;
	$invalid_length = 0;
	$scrubbed       = '';

	while ( $next_byte_at <= $bytes_length ) {
		_wp_scan_utf8( $bytes, $next_byte_at, $invalid_length );

		if ( $next_byte_at >= $bytes_length ) {
			if ( 0 === $was_at ) {
				return $bytes;
			}

			return $scrubbed . substr( $bytes, $was_at, $next_byte_at - $was_at - $invalid_length );
		}

		$scrubbed .= substr( $bytes, $was_at, $next_byte_at - $was_at );
		$scrubbed .= "\u{FFFD}";

		$next_byte_at += $invalid_length;
		$was_at        = $next_byte_at;
	}

	return $scrubbed;
}
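The scan-and-replace loop in the source can be tried outside WordPress with a rough standalone sketch. Everything named here is hypothetical: demo_valid_prefix_length() is a regex-based stand-in, not _wp_scan_utf8(), and demo_scrub_utf8() is only an approximation of the fallback. As a deliberate simplification, the sketch replaces each offending byte with its own U+FFFD, so for truncated multi-byte sequences it may emit more replacement characters than core does.

```php
<?php
/**
 * Hypothetical helper: length of the well-formed UTF-8 run starting at $at.
 * Uses byte-mode PCRE (no /u flag); \G anchors the match at the offset.
 */
function demo_valid_prefix_length( string $bytes, int $at ): int {
	$well_formed = '/\G(?:
		  [\x00-\x7F]                       # ASCII
		| [\xC2-\xDF][\x80-\xBF]            # two-byte sequences
		| \xE0[\xA0-\xBF][\x80-\xBF]        # three-byte, no overlongs
		| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # three-byte, general
		| \xED[\x80-\x9F][\x80-\xBF]        # three-byte, no surrogates
		| \xF0[\x90-\xBF][\x80-\xBF]{2}     # four-byte, no overlongs
		| [\xF1-\xF3][\x80-\xBF]{3}         # four-byte, general
		| \xF4[\x80-\x8F][\x80-\xBF]{2}     # four-byte, up to U+10FFFF
	)*+/x';

	preg_match( $well_formed, $bytes, $match, 0, $at );

	// An empty match means the byte at $at is not the start of valid UTF-8.
	return strlen( $match[0] ?? '' );
}

/**
 * Hypothetical scrubber: copies valid runs and substitutes U+FFFD for
 * every byte that cannot start a well-formed sequence (assumption: one
 * replacement per byte, which is simpler than what core may do).
 */
function demo_scrub_utf8( string $bytes ): string {
	$length   = strlen( $bytes );
	$at       = 0;
	$scrubbed = '';

	while ( $at < $length ) {
		$valid     = demo_valid_prefix_length( $bytes, $at );
		$scrubbed .= substr( $bytes, $at, $valid );
		$at       += $valid;

		if ( $at < $length ) {
			$scrubbed .= "\u{FFFD}";
			$at++;
		}
	}

	return $scrubbed;
}

var_dump( "Pi\u{FFFD}a" === demo_scrub_utf8( "Pi\xF1a" ) );
```

For the input from the Description example, the sketch yields the same result: the lone 0xF1 byte becomes a single replacement character and the surrounding ASCII is preserved.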
Changelog
| Version | Description |
|---|---|
| 6.9.0 | Introduced. |