Make WordPress Core


Timestamp:
08/12/2025 06:13:48 PM
Author:
dmsnell
Message:

Add wp_is_valid_utf8() for normalizing UTF-8 checks.

Core has several existing mechanisms for determining whether a given string contains valid UTF-8. They are scattered, and their behavior depends on which extensions are installed on the running system and on the value of blog_charset. The seems_utf8() function is one of these mechanisms.

Unfortunately, seems_utf8() does not properly validate UTF-8. It is also slow, and its name and historical legacy obscure what the function is for.

This patch deprecates seems_utf8() and introduces wp_is_valid_utf8(): a new, spec-compliant, efficient, and focused UTF-8 validator. The new validator defers to mb_check_encoding() when it is available and otherwise validates with a pure-PHP implementation. This makes a spec-compliant validator available on every system, regardless of its runtime environment.
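A minimal sketch of the dispatch strategy described above, assuming PHP 7.0+ for the scalar type declarations. The names example_is_valid_utf8() and example_utf8_is_valid_pure_php() are hypothetical; this is not the code from the patch. The fallback follows the well-formed byte-sequence rules of the Unicode Standard (Table 3-7), so it rejects overlong encodings, UTF-16 surrogates, truncated sequences, and code points above U+10FFFF.

```php
<?php
/**
 * Hypothetical pure-PHP UTF-8 validator (illustrative, not Core's code).
 * Walks the string once, checking each lead byte and the range of the
 * bytes that must follow it.
 */
function example_utf8_is_valid_pure_php( string $bytes ): bool {
	$length = strlen( $bytes );
	$i      = 0;

	while ( $i < $length ) {
		$b0 = ord( $bytes[ $i ] );

		// ASCII (0x00-0x7F) is always a complete one-byte sequence.
		if ( $b0 < 0x80 ) {
			$i++;
			continue;
		}

		// Pick the number of continuation bytes and the allowed range of the
		// second byte. Order matters: 0xE0 and 0xED are special-cased first.
		if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
			$need = 1; $lo = 0x80; $hi = 0xBF;
		} elseif ( 0xE0 === $b0 ) {
			$need = 2; $lo = 0xA0; $hi = 0xBF; // Rejects overlong 3-byte forms.
		} elseif ( 0xED === $b0 ) {
			$need = 2; $lo = 0x80; $hi = 0x9F; // Rejects surrogates U+D800-U+DFFF.
		} elseif ( $b0 >= 0xE1 && $b0 <= 0xEF ) {
			$need = 2; $lo = 0x80; $hi = 0xBF;
		} elseif ( 0xF0 === $b0 ) {
			$need = 3; $lo = 0x90; $hi = 0xBF; // Rejects overlong 4-byte forms.
		} elseif ( $b0 >= 0xF1 && $b0 <= 0xF3 ) {
			$need = 3; $lo = 0x80; $hi = 0xBF;
		} elseif ( 0xF4 === $b0 ) {
			$need = 3; $lo = 0x80; $hi = 0x8F; // Rejects code points above U+10FFFF.
		} else {
			// 0x80-0xC1 and 0xF5-0xFF can never start a valid sequence.
			return false;
		}

		// The sequence is truncated by the end of the string.
		if ( $i + $need >= $length ) {
			return false;
		}

		$b1 = ord( $bytes[ $i + 1 ] );
		if ( $b1 < $lo || $b1 > $hi ) {
			return false;
		}

		// Any remaining bytes must be plain continuation bytes (0x80-0xBF).
		for ( $k = 2; $k <= $need; $k++ ) {
			$bk = ord( $bytes[ $i + $k ] );
			if ( $bk < 0x80 || $bk > 0xBF ) {
				return false;
			}
		}

		$i += $need + 1;
	}

	return true;
}

/**
 * Hypothetical dispatcher: prefer mbstring when it is loaded, otherwise
 * fall back to the pure-PHP check above.
 */
function example_is_valid_utf8( string $bytes ): bool {
	if ( function_exists( 'mb_check_encoding' ) ) {
		return mb_check_encoding( $bytes, 'UTF-8' );
	}
	return example_utf8_is_valid_pure_php( $bytes );
}
```

For example, "caf\xC3\xA9" passes both paths, while the overlong "\xC0\x80" and the lone surrogate "\xED\xA0\x80" are rejected by the fallback.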

Developed in https://github.com/WordPress/wordpress-develop/pull/9317
Discussed in https://core.trac.wordpress.org/ticket/38044

Props dmsnell, jonsurrell, jorbin.
Fixes #38044.

File:
1 edited

  • trunk/src/wp-admin/includes/export.php

    --- trunk/src/wp-admin/includes/export.php (r58009)
    +++ trunk/src/wp-admin/includes/export.php (r60630)
    @@ -244,5 +244,5 @@
          */
         function wxr_cdata( $str ) {
    -        if ( ! seems_utf8( $str ) ) {
    +        if ( ! wp_is_valid_utf8( $str ) ) {
                 $str = utf8_encode( $str );
             }
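In wxr_cdata(), a string that is already valid UTF-8 is left alone; anything else is passed to utf8_encode(), which treats its input as ISO-8859-1 (Latin-1) and transcodes it to UTF-8. A minimal sketch of that fallback follows. The names example_ensure_utf8() and example_latin1_to_utf8() are hypothetical, and `1 === preg_match( '//u', $str )` is an assumed stand-in for wp_is_valid_utf8() so the snippet runs without loading WordPress. The transcoding is written out by hand because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
/**
 * Hypothetical Latin-1 to UTF-8 transcoder, equivalent in effect to
 * utf8_encode(): every byte is read as the code point U+0000-U+00FF.
 */
function example_latin1_to_utf8( string $bytes ): string {
	$out = '';
	$len = strlen( $bytes );

	for ( $i = 0; $i < $len; $i++ ) {
		$b = ord( $bytes[ $i ] );
		if ( $b < 0x80 ) {
			// ASCII maps to itself.
			$out .= $bytes[ $i ];
		} else {
			// U+0080-U+00FF become two bytes: 110000xx 10xxxxxx.
			$out .= chr( 0xC0 | ( $b >> 6 ) ) . chr( 0x80 | ( $b & 0x3F ) );
		}
	}

	return $out;
}

/**
 * Hypothetical helper mirroring the wxr_cdata() pattern. The preg_match()
 * check is an assumed stand-in for wp_is_valid_utf8(): with the /u
 * modifier, PCRE returns false for subjects that are not valid UTF-8.
 */
function example_ensure_utf8( string $str ): string {
	if ( 1 !== preg_match( '//u', $str ) ) {
		$str = example_latin1_to_utf8( $str );
	}
	return $str;
}
```

For example, the Latin-1 byte string "caf\xE9" becomes "caf\xC3\xA9", while input that is already valid UTF-8 passes through unchanged.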