Make WordPress Core

Opened 4 months ago

Last modified 3 months ago

#63804 new enhancement

Proposal of wp_trim(): JavaScript-compatible alternative for trim()

Reported by: takayukister
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Formatting
Keywords: has-patch has-unit-tests
Focuses:
Cc:

Description

PHP's trim() function, by default, strips only a limited set of ASCII whitespace characters. JavaScript's String.prototype.trim() method, on the other hand, strips all Unicode white space characters and line terminators. This difference is a frequent source of confusion.
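To make the gap concrete, here is a minimal sketch of how default trim() treats two characters that String.prototype.trim() would strip. The sample string is made up for illustration; in JavaScript, `"\u3000Hello\uFEFF".trim()` returns "Hello".

```php
<?php
// U+3000 (ideographic space) before the word, U+FEFF (zero-width no-break space) after it.
$input = "\u{3000}Hello\u{FEFF}";

// trim() works on bytes and by default strips only " ", \n, \r, \t, \v and \0,
// so both multi-byte characters survive.
$trimmed = trim( $input );

var_dump( $trimmed === $input ); // bool(true)
var_dump( strlen( $trimmed ) );  // int(11): 3 + 5 + 3 bytes in UTF-8
```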

PHP 8.4 introduced mb_trim(), the multi-byte safe alternative for trim(). Unfortunately, String.prototype.trim() and mb_trim() do not behave identically. They differ slightly: for example, String.prototype.trim() strips U+FEFF (Zero-width no-break space), while mb_trim() doesn't.

For safe coding, I propose introducing wp_trim(), a PHP function that behaves identically to String.prototype.trim(). The following is a simple example implementation of wp_trim():

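function wp_trim( string $string ): string {
        // Code points stripped by String.prototype.trim(): ECMAScript WhiteSpace
        // (TAB, VT, FF, U+FEFF, and category Zs) plus LineTerminator
        // (LF, CR, U+2028, U+2029). U+0085 is in neither set.
        $whitespaces = '\x09-\x0D\x20\xA0\x{1680}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}\x{FEFF}';

        // Strip trailing white space. The /u modifier treats pattern and subject as UTF-8.
        $string = preg_replace(
                sprintf( '/[%s]+$/u', $whitespaces ),
                '',
                $string
        );

        // Strip leading white space.
        $string = preg_replace(
                sprintf( '/^[%s]+/u', $whitespaces ),
                '',
                $string
        );

        return $string;
}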

Among the white space characters that wp_trim() strips, U+3000 (Ideographic space) is especially common in Japanese user-generated content. It is also called "full-width space", and it is the character entered when you hit the space bar in Japanese input mode. I think we should replace trim() with wp_trim() wherever it is used to trim user input text.
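As a rough sketch of the same idea, the two replacements can be folded into one pass with a guard for invalid input. The function name `example_js_compatible_trim()` is hypothetical, and returning the input unchanged on invalid UTF-8 is an assumption, not something the ticket specifies. The character class leaves out U+0085 (NEXT LINE), which ECMAScript lists under neither WhiteSpace nor LineTerminator.

```php
<?php
/**
 * Hypothetical single-pass variant; the name is illustrative only.
 */
function example_js_compatible_trim( string $text ): string {
        // ECMAScript WhiteSpace (TAB, VT, FF, U+FEFF, category Zs) plus
        // LineTerminator (LF, CR, U+2028, U+2029), written as a PCRE class body.
        $class = '\x09-\x0D\x20\xA0\x{1680}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}\x{FEFF}';

        // \A and \z anchor to the absolute start and end of the subject.
        $result = preg_replace( '/\A[' . $class . ']+|[' . $class . ']+\z/u', '', $text );

        // preg_replace() returns null when the subject is not valid UTF-8.
        // Assumption: hand the input back untouched in that case.
        return null === $result ? $text : $result;
}

var_dump( example_js_compatible_trim( "\u{3000}\u{FEFF} abc def\u{00A0}\n" ) ); // string(7) "abc def"
```

White space inside the string is preserved because each alternative is anchored to one end of the subject.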

Change History (4)

#1 @dmsnell
4 months ago

Thanks @takayukister. There is clearly a growing need to harmonize code between PHP and JavaScript. I wonder if it would be clearer to name functions in a way that acknowledges just that. For instance, instead of wp_trim(), which would itself introduce a third set of semantics, we could write js_trim(), whose behavior we could always compare against String.prototype.trim().

Note also that we have an existing mechanism to match this behavior in mb_trim(): passing a second parameter, the trim list. Another way to think about this proposal is as providing that set of characters.

<?php
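// mb_trim() has no ".." range syntax, so the "…" below is shorthand: every code point from U+2000 through U+200A must be listed individually.
$js_trimmables = "\x09\x0A\x0B\x0C\x0D\x20\u{00A0}\u{1680}\u{2000}…\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{FEFF}";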

echo mb_trim( $string, $js_trimmables );

function js_trim( $string ) {
        global $js_trimmables;

        return mb_trim( $string, $js_trimmables );
}
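Since the comment above notes that mb_trim() takes no ".." ranges, one way to avoid typing every character by hand is to build the list from code point ranges. This is only a sketch: `example_js_trimmables()` is a hypothetical helper, not part of the ticket or of WordPress, and it assumes the mbstring extension is available for mb_chr().

```php
<?php
/**
 * Hypothetical helper that builds the list of characters
 * String.prototype.trim() strips, as a UTF-8 string.
 */
function example_js_trimmables(): string {
        // Inclusive code point ranges; single characters have equal endpoints.
        $ranges = array(
                array( 0x0009, 0x000D ), // TAB, LF, VT, FF, CR
                array( 0x0020, 0x0020 ), // SPACE
                array( 0x00A0, 0x00A0 ), // NO-BREAK SPACE
                array( 0x1680, 0x1680 ), // OGHAM SPACE MARK
                array( 0x2000, 0x200A ), // EN QUAD through HAIR SPACE
                array( 0x2028, 0x2029 ), // LINE SEPARATOR, PARAGRAPH SEPARATOR
                array( 0x202F, 0x202F ), // NARROW NO-BREAK SPACE
                array( 0x205F, 0x205F ), // MEDIUM MATHEMATICAL SPACE
                array( 0x3000, 0x3000 ), // IDEOGRAPHIC SPACE
                array( 0xFEFF, 0xFEFF ), // ZERO WIDTH NO-BREAK SPACE
        );

        $characters = '';
        foreach ( $ranges as list( $first, $last ) ) {
                for ( $code_point = $first; $code_point <= $last; $code_point++ ) {
                        $characters .= mb_chr( $code_point, 'UTF-8' );
                }
        }

        return $characters;
}

// On PHP 8.4+, the result could be passed along as:
// mb_trim( $string, example_js_trimmables(), 'UTF-8' );
var_dump( mb_strlen( example_js_trimmables(), 'UTF-8' ) ); // int(25)
```

Building the list inside a function also avoids relying on a global variable being in scope.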

#2 @takayukister
4 months ago

js_trim() is easier to understand. I agree on the naming.

If you use mb_trim(), you'll need to:

  • define mb_trim() in wp-includes/compat.php, because it isn't yet available in most user environments.
  • pass UTF-8 as the third parameter; otherwise the internal encoding will be used.
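For illustration, the core of such a compat.php fallback might look like the sketch below. It is deliberately narrower than the real mb_trim(): it assumes UTF-8 input, requires an explicit character list instead of reproducing the built-in default, and uses the hypothetical name `example_mb_trim_polyfill()` so it does not clash with the native function on PHP 8.4+.

```php
<?php
/**
 * Hypothetical UTF-8-only stand-in for mb_trim() with an explicit trim list.
 */
function example_mb_trim_polyfill( string $string, string $characters ): string {
        // Assumption: an empty trim list means there is nothing to strip.
        if ( '' === $characters ) {
                return $string;
        }

        // preg_quote() escapes "\", "]", "^" and "-", so the list is safe
        // to place inside a character class.
        $class = preg_quote( $characters, '/' );

        // Strip from both ends in one pass; /u treats the input as UTF-8.
        $result = preg_replace( '/\A[' . $class . ']+|[' . $class . ']+\z/u', '', $string );

        // preg_replace() returns null on invalid UTF-8.
        // Assumption: return the input unchanged in that case.
        return null === $result ? $string : $result;
}

var_dump( example_mb_trim_polyfill( "\u{3000}abc\u{FEFF}", "\u{3000}\u{FEFF}" ) ); // string(3) "abc"
```

A real polyfill would also have to honor the optional encoding parameter and the default character list, which this sketch leaves out.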

#3 This ticket was mentioned in PR #9519 on WordPress/wordpress-develop by @tusharbharti.
3 months ago

  • Keywords has-patch has-unit-tests added

PHP’s trim() function, by default, only strips a limited set of ASCII whitespace characters, and mb_trim(), introduced in PHP 8.4, does not behave identically to JavaScript’s String.prototype.trim().

This PR implements js_trim(), a PHP function that replicates JavaScript’s String.prototype.trim() behavior.

It works by defining a set of $js_trimmables characters, which are passed to mb_trim() with UTF-8 encoding.

In addition, this PR adds a polyfill for mb_trim() in compat.php to support PHP versions below 8.4, along with unit tests for both js_trim() and mb_trim().

Trac ticket: https://core.trac.wordpress.org/ticket/63804

#4 @tusharbharti
3 months ago

Hi, since we have to add a polyfill for mb_trim() to compat.php for older PHP versions, should we also add polyfills for mb_ltrim() and mb_rtrim()?
