Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Trac ticket: Core-63694 `wp_kses_hair()` is built around an impressive state machine for parsing the `$attr` of an HTML tag, that is, the span of text after the tag name and before the closing `>`. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing. This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell take for the `$attr` string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag. Props: dmsnell
68c7746 to
b476339
Compare
Trac ticket: Core-63694 `wp_kses_hair()` is built around an impressive state machine for parsing the `$attr` of an HTML tag, that is, the span of text after the tag name and before the closing `>`. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing. This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell take for the `$attr` string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag. Props: dmsnell
b476339 to
6146ecd
Compare
Trac ticket: Core-63694 `wp_kses_hair()` is built around an impressive state machine for parsing the `$attr` of an HTML tag, that is, the span of text after the tag name and before the closing `>`. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing. This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell take for the `$attr` string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag. Props: dmsnell
6146ecd to
d64f56e
Compare
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
Prep work for WordPress#9248 See also WordPress#9251
Prep work for WordPress#9248 See also WordPress#9251
Prep work for WordPress#9252. Part of WordPress#9248.
) Prep work for WordPress#9252. Part of WordPress#9248.
) Prep work for WordPress#9252. Part of WordPress#9248.
Prep work for WordPress#9248. # Conflicts: # tests/phpunit/tests/comment.php
Trac ticket: Core-63694 `wp_kses_hair()` is built around an impressive state machine for parsing the `$attr` of an HTML tag, that is, the span of text after the tag name and before the closing `>`. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing. This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell take for the `$attr` string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag. Props: dmsnell
d119749 to
e525b75
Compare
Trac ticket: Core-63694 `wp_kses_hair()` is built around an impressive state machine for parsing the `$attr` of an HTML tag, that is, the span of text after the tag name and before the closing `>`. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing. This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell take for the `$attr` string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag. Props: dmsnell
e525b75 to
9ae464a
Compare
Trac ticket: Core-63694 `wp_kses_hair()` is built around an impressive state machine for parsing the `$attr` of an HTML tag, that is, the span of text after the tag name and before the closing `>`. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing. This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell take for the `$attr` string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag. Props: dmsnell
9ae464a to
5e878cf
Compare
|
This function had quirks that change with this PR and I want to understand them. I created a test suite for My review of the most common usages on suggest that this change is safe to make and would not negatively impact plugin authors.
This is a tricky question. It doesn't seem like folks rely on specifics of the input representation being present in the output, however it's certainly possible. In one of the examples from plugins, After some reflection, I believe the behavior you've implemented here is a good decision. Consider that the input is HTML and the output ( behavior diffdiff --git a/tests/phpunit/tests/kses/wpKsesHair.php b/tests/phpunit/tests/kses/wpKsesHair.php
index 2ed83679f2e3d..05d573bc070bc 100644
--- a/tests/phpunit/tests/kses/wpKsesHair.php
+++ b/tests/phpunit/tests/kses/wpKsesHair.php
@@ -57,7 +57,7 @@ public function data_attribute_parsing() {
'title' => array(
'name' => 'title',
'value' => 'My Title',
- 'whole' => "title='My Title'",
+ 'whole' => 'title="My Title"',
'vless' => 'n',
),
),
@@ -188,8 +188,8 @@ public function data_attribute_parsing() {
array(
'title' => array(
'name' => 'title',
- 'value' => '<test>',
- 'whole' => 'title="<test>"',
+ 'value' => '<test>',
+ 'whole' => 'title="<test>"',
'vless' => 'n',
),
),
@@ -200,8 +200,8 @@ public function data_attribute_parsing() {
array(
'title' => array(
'name' => 'title',
- 'value' => '<hex>',
- 'whole' => 'title="<hex>"',
+ 'value' => '<hex>',
+ 'whole' => 'title="<hex>"',
'vless' => 'n',
),
),
@@ -212,8 +212,8 @@ public function data_attribute_parsing() {
array(
'title' => array(
'name' => 'title',
- 'value' => '<HEX>',
- 'whole' => 'title="<HEX>"',
+ 'value' => '<HEX>',
+ 'whole' => 'title="<HEX>"',
'vless' => 'n',
),
),
@@ -224,8 +224,8 @@ public function data_attribute_parsing() {
array(
'title' => array(
'name' => 'title',
- 'value' => '&invalid; &#; &#x;',
- 'whole' => 'title="&invalid; &#; &#x;"',
+ 'value' => '&invalid; &#; &#x;',
+ 'whole' => 'title="&invalid; &#; &#x;"',
'vless' => 'n',
),
),
@@ -249,7 +249,7 @@ public function data_attribute_parsing() {
'data-text' => array(
'name' => 'data-text',
'value' => 'Single quoted value',
- 'whole' => "data-text='Single quoted value'",
+ 'whole' => 'data-text="Single quoted value"',
'vless' => 'n',
),
),
@@ -267,7 +267,7 @@ public function data_attribute_parsing() {
'alt' => array(
'name' => 'alt',
'value' => 'single',
- 'whole' => "alt='single'",
+ 'whole' => 'alt="single"',
'vless' => 'n',
),
'id' => array(
@@ -284,8 +284,8 @@ public function data_attribute_parsing() {
array(
'title' => array(
'name' => 'title',
- 'value' => "It's working",
- 'whole' => 'title="It\'s working"',
+ 'value' => 'It's working',
+ 'whole' => 'title="It's working"',
'vless' => 'n',
),
),
@@ -296,8 +296,8 @@ public function data_attribute_parsing() {
array(
'title' => array(
'name' => 'title',
- 'value' => 'He said "hello"',
- 'whole' => 'title=\'He said "hello"\'',
+ 'value' => 'He said "hello"',
+ 'whole' => 'title="He said "hello""',
'vless' => 'n',
),
),
@@ -327,12 +327,32 @@ public function data_attribute_parsing() {
yield 'invalid attribute name starting with number' => array(
'1invalid="value"',
- array(),
+ array(
+ '1invalid' => array(
+ 'name' => '1invalid',
+ 'value' => 'value',
+ 'whole' => '1invalid="value"',
+ 'vless' => 'n',
+ ),
+ ),
);
yield 'invalid attribute name special chars' => array(
'@invalid="value" $bad="value"',
- array(),
+ array(
+ '@invalid' => array(
+ 'name' => '@invalid',
+ 'value' => 'value',
+ 'whole' => '@invalid="value"',
+ 'vless' => 'n',
+ ),
+ '$bad' => array(
+ 'name' => '$bad',
+ 'value' => 'value',
+ 'whole' => '$bad="value"',
+ 'vless' => 'n',
+ ),
+ ),
);
yield 'duplicate attributes first wins' => array(
@@ -355,7 +375,20 @@ public function data_attribute_parsing() {
yield 'malformed unclosed double quote' => array(
'title="unclosed class="test"',
- array(),
+ array(
+ 'title' => array(
+ 'name' => 'title',
+ 'value' => 'unclosed class=',
+ 'whole' => 'title="unclosed class="',
+ 'vless' => 'n',
+ ),
+ 'test"' => array(
+ 'name' => 'test"',
+ 'value' => '',
+ 'whole' => 'test"',
+ 'vless' => 'y',
+ ),
+ ),
);
yield 'very long attribute value' => array(
@@ -610,7 +643,7 @@ public function data_attribute_parsing() {
'alt' => array(
'name' => 'alt',
'value' => '',
- 'whole' => "alt=''",
+ 'whole' => 'alt=""',
'vless' => 'n',
),
'class' => array(
@@ -625,7 +658,7 @@ public function data_attribute_parsing() {
yield 'forward slashes between attributes' => array(
'att / att2=2 /// att3="3"',
array(
- 'att' => array(
+ 'att' => array(
'name' => 'att',
'value' => '',
'whole' => 'att',
@@ -652,13 +685,13 @@ public function data_attribute_parsing() {
'att' => array(
'name' => 'att',
'value' => 'val',
- 'whole' => "att='val'",
+ 'whole' => 'att="val"',
'vless' => 'n',
),
'att2' => array(
'name' => 'att2',
'value' => 'val2',
- 'whole' => "att2='val2'",
+ 'whole' => 'att2="val2"',
'vless' => 'n',
),
),
@@ -670,13 +703,13 @@ public function data_attribute_parsing() {
'att' => array(
'name' => 'att',
'value' => 'val',
- 'whole' => "att='val'",
+ 'whole' => 'att="val"',
'vless' => 'n',
),
'att2' => array(
'name' => 'att2',
'value' => 'val2',
- 'whole' => "att2='val2'",
+ 'whole' => 'att2="val2"',
'vless' => 'n',
),
),
@@ -688,13 +721,13 @@ public function data_attribute_parsing() {
'att' => array(
'name' => 'att',
'value' => 'val',
- 'whole' => "att='val'",
+ 'whole' => 'att="val"',
'vless' => 'n',
),
'att2' => array(
'name' => 'att2',
'value' => 'val2',
- 'whole' => "att2='val2'",
+ 'whole' => 'att2="val2"',
'vless' => 'n',
),
),
@@ -706,13 +739,13 @@ public function data_attribute_parsing() {
'att' => array(
'name' => 'att',
'value' => 'val',
- 'whole' => "att='val'",
+ 'whole' => 'att="val"',
'vless' => 'n',
),
'att2' => array(
'name' => 'att2',
'value' => 'val2',
- 'whole' => "att2='val2'",
+ 'whole' => 'att2="val2"',
'vless' => 'n',
),
),
@@ -739,34 +772,67 @@ public function data_attribute_parsing() {
// Malformed Equals Patterns.
yield 'multiple equals signs' => array(
'att=="val"',
- array(),
+ array(
+ 'att' => array(
+ 'name' => 'att',
+ 'value' => '="val"',
+ 'whole' => 'att="="val""',
+ 'vless' => 'n',
+ ),
+ ),
);
yield 'equals with strange spacing' => array(
'att= ="val"',
- array(),
+ array(
+ 'att' => array(
+ 'name' => 'att',
+ 'value' => '="val"',
+ 'whole' => 'att="="val""',
+ 'vless' => 'n',
+ ),
+ ),
);
yield 'triple equals signs' => array(
'att==="val"',
- array(),
+ array(
+ 'att' => array(
+ 'name' => 'att',
+ 'value' => '=="val"',
+ 'whole' => 'att="=="val""',
+ 'vless' => 'n',
+ ),
+ ),
);
yield 'equals echo pattern' => array(
"att==echo 'something'",
array(
- 'att' => array(
+ 'att' => array(
'name' => 'att',
'value' => '=echo',
'whole' => 'att="=echo"',
'vless' => 'n',
),
+ "'something'" => array(
+ 'name' => "'something'",
+ 'value' => '',
+ 'whole' => "'something'",
+ 'vless' => 'y',
+ ),
),
);
yield 'attribute starting with equals' => array(
'= bool k=v',
array(
+ '=' => array(
+ 'name' => '=',
+ 'value' => '',
+ 'whole' => '=',
+ 'vless' => 'y',
+ ),
'bool' => array(
'name' => 'bool',
'value' => '',
@@ -785,18 +851,43 @@ public function data_attribute_parsing() {
yield 'mixed quotes and equals chaos' => array(
'k=v ="' . "' j=w",
array(
- 'k' => array(
+ 'k' => array(
'name' => 'k',
'value' => 'v',
'whole' => 'k="v"',
'vless' => 'n',
),
+ '="' . "'" => array(
+ 'name' => '="' . "'",
+ 'value' => '',
+ 'whole' => '="' . "'",
+ 'vless' => 'y',
+ ),
+ 'j' => array(
+ 'name' => 'j',
+ 'value' => 'w',
+ 'whole' => 'j="w"',
+ 'vless' => 'n',
+ ),
),
);
yield 'triple equals quoted whitespace' => array(
'===" "',
- array(),
+ array(
+ '=' => array(
+ 'name' => '=',
+ 'value' => '="',
+ 'whole' => '=="=""',
+ 'vless' => 'n',
+ ),
+ '"' => array(
+ 'name' => '"',
+ 'value' => '',
+ 'whole' => '"',
+ 'vless' => 'y',
+ ),
+ ),
);
yield 'boolean with contradictory value' => array(
@@ -820,7 +911,13 @@ public function data_attribute_parsing() {
yield 'empty attribute name with value' => array(
'="value" class="test"',
array(
- 'class' => array(
+ '="value"' => array(
+ 'name' => '="value"',
+ 'value' => '',
+ 'whole' => '="value"',
+ 'vless' => 'y',
+ ),
+ 'class' => array(
'name' => 'class',
'value' => 'test',
'whole' => 'class="test"',
@@ -890,7 +987,7 @@ public function data_protocol_filtering() {
'href' => array(
'name' => 'href',
'value' => 'alert(1)',
- 'whole' => "href='alert(1)'",
+ 'whole' => 'href="alert(1)"',
'vless' => 'n',
),
),
@@ -925,8 +1022,8 @@ public function data_protocol_filtering() {
array(
'src' => array(
'name' => 'src',
- 'value' => 'text/html,<script>alert(1)</script>',
- 'whole' => 'src="text/html,<script>alert(1)</script>"',
+ 'value' => 'text/html,<script>alert(1)</script>',
+ 'whole' => 'src="text/html,<script>alert(1)</script>"',
'vless' => 'n',
),
),Here are two examples from the most most popular plugins in the WP Directory search: From YITH (this appears to be part of the yith library used in many of their plugins): function yith_plugin_fw_html_attributes_to_string( $attributes = array(), $echo = false ) {
$html_attributes = '';
if ( ! ! $attributes ) {
if ( is_string( $attributes ) ) {
$parsed_attrs = wp_kses_hair( $attributes, wp_allowed_protocols() );
$attributes = array();
foreach ( $parsed_attrs as $attr ) {
$attributes[ $attr['name'] ] = 'n' === $attr['vless'] ? $attr['value'] : null;
}
}
if ( is_array( $attributes ) ) {
$html_attributes = array();
foreach ( $attributes as $key => $value ) {
if ( ! is_null( $value ) ) {
$html_attributes[] = esc_attr( $key ) . '="' . esc_attr( $value ) . '"';
} else {
$html_attributes[] = esc_attr( $key );
}
}
$html_attributes = implode( ' ', $html_attributes );
}
}
if ( $echo ) {
// Already escaped above.
echo $html_attributes; // phpcs:ignore WordPress.Security.EscapeOutput.OutputNotEscaped
}
return $html_attributes;
} $params = wp_kses_hair( $params, array( 'http' ) );
$width = isset( $params['width'] ) ? (int) $params['width']['value'] : 0;
$height = isset( $params['height'] ) ? (int) $params['height']['value'] : 0;
$wh = '';
if ( $width && $height ) {
$wh = "&w=$width&h=$height";
}
$url = esc_url_raw( "https://www.youtube.com/watch?v={$match[3]}{$wh}" ); |
sirreal
left a comment
There was a problem hiding this comment.
This seems like a good improvement.
src/wp-includes/kses.php
Outdated
| * ); | ||
| * | ||
| * @since 1.0.0 | ||
| * @since 6.9.0 Rebuilt on HTML API |
There was a problem hiding this comment.
| * @since 6.9.0 Rebuilt on HTML API | |
| * @since 7.0.0 Rebuilt on HTML API |
Trac ticket: Core-63694 `wp_kses_hair()` is built around an impressive state machine for parsing the `$attr` of an HTML tag, that is, the span of text after the tag name and before the closing `>`. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing. This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell take for the `$attr` string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag. Props: adamziel, dmsnell, jonsurrell, jorbin, westonruter. Co-authored-by: Adam Zieliński <zieladam@git.wordpress.org> Co-authored-by: Aaron Jorbin <jorbin@git.wordpress.org> Co-authored-by: Jon Surrell <jonsurrell@git.wordpress.org> CO-authored-by: Weston Ruter <westonruter@git.wordpress.org> Github-PR: 9248 Github-PR-URL: WordPress#9248 Trac-Ticket: 63694 Trac-Ticket-URL: https://core.trac.wordpress.org/ticket/63724 Git-Branch: html-api/refactor-wp-kses-hair-take-3
Trac ticket: Core-63724
Replaces #7407, dmsnell#5
Coordination in #9256
Review feedback
valueandwholeproperties of the returned array indicate the raw parsed bytes from the HTML (with some exceptions). This means that HTML character references are not decoded. This represents an abstraction leak between the HTML and structural return value.Implementation
wp_kses_hair()is built around an impressive state machine for parsing the$attrof an HTML tag, that is, the span of text after the tag name and before the closing>. Unfortunately, that parsing code doesn’t fully-implement the HTML specification and may be prone to mis-parsing.This patch replaces the existing state machine with a straight-forward use of the HTML API to parse the attributes for us, constructing a shell tag for the
$attrstring and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag.Dependencies
relkeywords toAelements. #9252assertEqualHTML()inwp_rel_nofollow()tests. #9251assertEqualHTMLinwp_rel_ugc()tests. #9255assertEqualHTML()inwp_kses()tests. #9257assertEqualHTML()in post filtering tests. #9258oEmbedfiltering tests. #9259assertEqualHTML()in media tests. #9264