-
-
Notifications
You must be signed in to change notification settings - Fork 9.8k
[DomCrawler] Fall back to DOMDocument parser to create invalid elements #62252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } catch (\DOMException) { | ||
| continue; | ||
| $dom = new \DOMDocument('1.0', $target->encoding); | ||
| $dom->loadHTML('<'.$source->tagName.'>', \LIBXML_HTML_NOIMPLIED | \LIBXML_HTML_NODEFDTD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand what this code is supposed to do.
Are there more of these workarounds? I find this concerning. There are a lot of subtle rules that apply here in the DOM spec. For example, there is the rule about namespaces, there are rules about parser context (which you don't take into account here). What about the namespace of the created tag? What about the uppercase/lowercase wrt the namespace rules of this tag?
TLDR: this looks dangerous. I guess at one point I should make my slides / content public about parser differentials and how they almost always lead to XSS or other types of injection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I had also this feeling in #62240 (comment)
OK to close @longwave?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes let's close - we can't hope to emulate the exact behaviour of the previous parser in all edge cases, we will work around it in the Drupal test suite instead.
Followup to #62240.
Instead of bailing out immediately on error, we now try harder to create an invalid element by passing any malformed tag into
DOMDocument::loadHTML()for it to parse.