[DomCrawler] Fall back to DOMDocument parser to create invalid elements #62252

longwave · 2025-10-31T12:25:10Z

| Q             | A           |
| ------------- | ----------- |
| Branch?       | 7.4         |
| Bug fix?      | yes         |
| New feature?  | no          |
| Deprecations? | no          |
| Issues        | Fix #62236  |
| License       | MIT         |

Follow-up to #62240.

Instead of bailing out immediately on error, we now try harder to create an invalid element by passing any malformed tag into DOMDocument::loadHTML() for it to parse.
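The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `createElementLeniently()` is a hypothetical helper name, and the `UTF-8` default encoding and the use of `libxml_use_internal_errors()` to silence parser warnings are assumptions. What the lenient parser makes of a malformed tag depends on the libxml2 version, so no expected output is shown.

```php
<?php

// Hypothetical helper for illustration only; it is not part of Symfony's Crawler.
function createElementLeniently(\DOMDocument $target, string $tagName): ?\DOMNode
{
    try {
        // Strict path: throws \DOMException when $tagName is not a valid XML name.
        $element = $target->createElement($tagName);
        if (false !== $element) {
            return $element;
        }
    } catch (\DOMException) {
        // Fall through to the lenient HTML parser below.
    }

    // Assumption: collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    try {
        $scratch = new \DOMDocument('1.0', $target->encoding ?? 'UTF-8');
        // Let the HTML parser decide what element, if any, the malformed tag becomes.
        $scratch->loadHTML('<'.$tagName.'>', \LIBXML_HTML_NOIMPLIED | \LIBXML_HTML_NODEFDTD);
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }

    $parsed = $scratch->documentElement;
    if (null === $parsed) {
        return null;
    }

    // Copy the parsed element into the target document.
    return $target->importNode($parsed, true) ?: null;
}
```

Note that the tag name is interpolated into markup and re-parsed, so the resulting element may not carry the name that was passed in.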

src/Symfony/Component/DomCrawler/Tests/CrawlerTest.php

src/Symfony/Component/DomCrawler/Crawler.php

ndossche · 2025-10-31T19:03:49Z

src/Symfony/Component/DomCrawler/Crawler.php

                } catch (\DOMException) {
-                    continue;
+                    $dom = new \DOMDocument('1.0', $target->encoding);
+                    $dom->loadHTML('<'.$source->tagName.'>', \LIBXML_HTML_NOIMPLIED | \LIBXML_HTML_NODEFDTD);


I'm not sure I fully understand what this code is supposed to do.
Are there more of these workarounds? I find this concerning. A lot of subtle rules in the DOM spec apply here: for example, the rules about namespaces and the rules about parser context (which you don't take into account here). What about the namespace of the created tag? What about uppercase/lowercase handling with respect to the namespace rules for this tag?
TL;DR: this looks dangerous. I guess at some point I should make my slides / content public about parser differentials and how they almost always lead to XSS or other types of injection.

nicolas-grekas · Nov 1, 2025

Makes sense; I had the same feeling in #62240 (comment).

OK to close @longwave?

longwave · Nov 1, 2025

Yes, let's close. We can't hope to emulate the exact behaviour of the previous parser in all edge cases, so we will work around it in the Drupal test suite instead.

Try harder to create invalid elements.

8d4231e

carsonbot added the Status: Needs Review, Bug, and DomCrawler labels Oct 31, 2025

carsonbot added this to the 7.4 milestone Oct 31, 2025

longwave mentioned this pull request Oct 31, 2025

[DomCrawler] Native HTML5 parser throws new DOMException on HTML tag containing ampersand #62236

Closed

nicolas-grekas reviewed Oct 31, 2025

src/Symfony/Component/DomCrawler/Tests/CrawlerTest.php (resolved)

nicolas-grekas reviewed Oct 31, 2025

src/Symfony/Component/DomCrawler/Crawler.php (outdated, resolved)

longwave added 2 commits October 31, 2025 12:42

Unify tests.

a551bf3

Create empty doc instead of cloning.

28b36a5

nicolas-grekas approved these changes Oct 31, 2025

carsonbot added Status: Reviewed and removed Status: Needs Review labels Oct 31, 2025

nicolas-grekas reviewed Oct 31, 2025

src/Symfony/Component/DomCrawler/Crawler.php (outdated, resolved)

longwave added 3 commits October 31, 2025 13:09

Simplify DOM setup.

d1aba08

Remove silencer.

c0bb696

Prefix constants.

f150360

ndossche reviewed Oct 31, 2025

longwave closed this Nov 1, 2025
