Opened 4 months ago
Last modified 3 months ago
#63804 new enhancement
Proposal of wp_trim(): JavaScript-compatible alternative for trim()
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Formatting | Keywords: | has-patch has-unit-tests |
| Focuses: | | Cc: | |
Description
PHP's trim() function, by default, strips only a limited set of ASCII whitespace characters. JavaScript's String.prototype.trim() method, on the other hand, strips all Unicode white space characters and line terminators. This difference is a frequent source of confusion.
PHP 8.4 introduced mb_trim(), the multi-byte safe alternative for trim(). Unfortunately, String.prototype.trim() and mb_trim() do not behave identically. There are slight differences; for example, String.prototype.trim() strips U+FEFF (zero-width no-break space), while mb_trim() does not.
For safe coding, I propose introducing wp_trim(), a PHP function that performs identically to String.prototype.trim(). The following is a simple implementation example of wp_trim():
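function wp_trim( string $string ): string {
    // Code points matched by String.prototype.trim(): ECMAScript WhiteSpace
    // and LineTerminator (U+0085 is not one of them).
    $whitespaces = '\x09-\x0D\x20\xA0\x{1680}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}\x{FEFF}';

    // Strip trailing white space.
    $string = preg_replace(
        sprintf( '/[%s]+$/u', $whitespaces ),
        '',
        $string
    );

    // Strip leading white space.
    $string = preg_replace(
        sprintf( '/^[%s]+/u', $whitespaces ),
        '',
        $string
    );

    return $string;
}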
Among the white space characters that wp_trim() strips, U+3000 (ideographic space) is especially common in Japanese user-generated content. It is also called the "full-width space", and it is the character entered when you hit the space bar in Japanese input mode. I think we should replace trim() with wp_trim() wherever it is used to trim user input text.
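To make the U+3000 case concrete, here is a minimal sketch comparing the built-in trim() with a Unicode-aware replacement on a single input. The sample input and variable names are made up for illustration, and the regular expression covers only U+3000, not the full character set from the proposal.

```php
<?php
// Hypothetical user input: a name padded with U+3000 (ideographic space) on both sides.
$input = "\u{3000}山田\u{3000}";

// trim() works on bytes and, by default, only strips " \n\r\t\v\0",
// so the three-byte UTF-8 sequence for U+3000 is left in place.
$ascii_trimmed = trim( $input );

// Unicode-aware alternative, limited to U+3000 for brevity.
// \z anchors at the very end of the string (unlike $, which also matches before a final newline).
$unicode_trimmed = preg_replace( '/^\x{3000}+|\x{3000}+\z/u', '', $input );

var_dump( $ascii_trimmed === $input );    // whether trim() left the input unchanged
var_dump( $unicode_trimmed === '山田' );  // whether the full-width spaces were removed
```

With the default character list, trim() returns the input untouched here, which is why a form field that "looks" trimmed can still carry leading or trailing full-width spaces.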
Change History (4)
#2
@
4 months ago
js_trim() is easier to understand. I agree on the naming.
If you use mb_trim(), you'll need:
- to define mb_trim() in wp-includes/compat.php because it's not yet available in most user environments.
- to specify UTF-8 as the third parameter; otherwise the internal encoding will be used.
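The two requirements above could be sketched roughly as follows. This is not the polyfill from any patch: the name example_mb_trim_utf8() is hypothetical, the fallback handles UTF-8 only, and a real compat.php polyfill would need to mirror the full mb_trim() signature, including its default character list.

```php
<?php
/**
 * Hypothetical helper: trims $characters from both ends of a UTF-8 string,
 * using PHP 8.4's native mb_trim() when it exists and a regex fallback otherwise.
 */
function example_mb_trim_utf8( string $text, string $characters ): string {
	if ( function_exists( 'mb_trim' ) ) {
		// Pass the encoding explicitly so the internal encoding is never consulted.
		return mb_trim( $text, $characters, 'UTF-8' );
	}

	if ( '' === $characters ) {
		return $text;
	}

	// Build a character class from the trim list, escaping regex metacharacters.
	$class   = preg_quote( $characters, '/' );
	$pattern = sprintf( '/^[%1$s]+|[%1$s]+\z/u', $class );
	$trimmed = preg_replace( $pattern, '', $text );

	// preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
	return $trimmed ?? $text;
}

var_dump( example_mb_trim_utf8( "\u{3000}abc\u{3000}", "\u{3000}" ) );
```

Returning the input unchanged when preg_replace() fails is one possible design choice for the fallback; other policies for invalid UTF-8 are equally defensible.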
This ticket was mentioned in PR #9519 on WordPress/wordpress-develop by @tusharbharti.
3 months ago
#3
- Keywords has-patch has-unit-tests added
PHP’s trim() function, by default, only strips a limited set of ASCII whitespace characters, and mb_trim(), introduced in PHP 8.4, does not behave identically to JavaScript’s String.prototype.trim().
This PR implements js_trim(), a PHP function that replicates JavaScript’s String.prototype.trim() behavior.
It works by defining a set of $js_trimmables characters, which are passed to mb_trim() with UTF-8 encoding.
In addition, this PR adds a polyfill for mb_trim() in compat.php to support PHP versions below 8.4, along with unit tests for both js_trim() and mb_trim().
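As a rough illustration of the approach described in this PR summary, the sketch below builds a trim list from the ECMAScript WhiteSpace and LineTerminator code points and passes it to mb_trim() with an explicit encoding. The function name example_js_trimmables() is hypothetical and the list is assembled here from the ECMAScript definitions, so it may not match the PR's actual $js_trimmables.

```php
<?php
/**
 * Hypothetical helper: returns the characters String.prototype.trim() strips,
 * i.e. ECMAScript WhiteSpace (TAB, VT, FF, U+FEFF, and the Unicode "Zs" category)
 * plus LineTerminator (LF, CR, U+2028, U+2029).
 */
function example_js_trimmables(): string {
	return "\t\n\u{000B}\u{000C}\r \u{00A0}\u{1680}"
		. "\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}"
		. "\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}"
		. "\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{FEFF}";
}

// mb_trim() is native only on PHP 8.4+, so the call is guarded here.
if ( function_exists( 'mb_trim' ) ) {
	$trimmed = mb_trim( "\u{FEFF}\u{3000}hello\u{3000}\n", example_js_trimmables(), 'UTF-8' );
	var_dump( $trimmed );
}
```

Keeping the list in one place makes it straightforward to compare against String.prototype.trim() in unit tests, one code point at a time.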
Trac ticket: https://core.trac.wordpress.org/ticket/63804
Thanks @takayukister. There is clearly a growing need to harmonize code between PHP and JavaScript. I wonder if it wouldn’t be clearer to name functions in a way that acknowledges just that. For instance, instead of wp_trim(), which would itself introduce a third set of semantics, we could write js_trim(), whose behavior could always be compared against String.prototype.trim().
Note also that there is an existing mechanism to match that behavior in mb_trim(): passing a second parameter, the trim list. Another way to think about this proposal is as providing that set of characters.