valid_unicode( int $i ): bool

Determines if a Unicode codepoint is valid.

Description

The definition of a valid Unicode codepoint is taken from the XML definition:

Characters

… Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.
… Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Parameters

$i int required
Unicode codepoint.

Return

bool Whether or not the codepoint is a valid Unicode codepoint.

Source

function valid_unicode( $i ) {
	$i = (int) $i;

	return (
		0x9 === $i || // U+0009 HORIZONTAL TABULATION (HT)
		0xA === $i || // U+000A LINE FEED (LF)
		0xD === $i || // U+000D CARRIAGE RETURN (CR)
		/*
		 * The valid Unicode characters according to the XML specification:
		 *
		 * > any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
		 */
		( 0x20 <= $i && $i <= 0xD7FF ) ||
		( 0xE000 <= $i && $i <= 0xFFFD ) ||
		( 0x10000 <= $i && $i <= 0x10FFFF )
	);
}
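
Example

The following is a minimal usage sketch, not part of the WordPress source. It assumes the script may run outside WordPress, so it defines a standalone copy of the check behind a `function_exists()` guard. The helper name `example_filter_codepoints()` and the sample values are hypothetical and used for illustration only.

```php
<?php
// Standalone copy of the check so this sketch also runs outside WordPress.
// Inside WordPress the guard is skipped and the core function is used.
if ( ! function_exists( 'valid_unicode' ) ) {
	function valid_unicode( $i ) {
		$i = (int) $i;

		return (
			0x9 === $i || 0xA === $i || 0xD === $i ||
			( 0x20 <= $i && $i <= 0xD7FF ) ||
			( 0xE000 <= $i && $i <= 0xFFFD ) ||
			( 0x10000 <= $i && $i <= 0x10FFFF )
		);
	}
}

// Hypothetical helper (not a WordPress function): keeps only the
// codepoints that valid_unicode() accepts and reindexes the result.
function example_filter_codepoints( array $codepoints ): array {
	return array_values( array_filter( $codepoints, 'valid_unicode' ) );
}

$samples = array(
	0x41,     // LATIN CAPITAL LETTER A: inside 0x20-0xD7FF.
	0x0,      // NULL: a control character outside the allowed set.
	0x9,      // Tab: explicitly allowed.
	0xD800,   // First surrogate: excluded.
	0xFFFE,   // Noncharacter: excluded.
	0x1F600,  // Supplementary-plane codepoint: inside 0x10000-0x10FFFF.
	0x110000, // Beyond the last Unicode codepoint.
);

$kept = example_filter_codepoints( $samples );
// $kept holds 0x41, 0x9 and 0x1F600.
```

Because the function casts its argument with `(int)`, non-integer input is converted before the range checks run, so callers that need strict typing may want to validate the type themselves first.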

Changelog

Version    Description
2.7.0      Introduced.
