[DomCrawler] BC break with exotic charsets - error at Dom\HTMLDocument::createFromString() #62625

@ThomasLandauer

Symfony version(s) affected

7.4.0

Description

In short:
I'm getting this error, reported on line 1113 of Crawler.php:

Dom\HTMLDocument::createFromString(): Argument #3 ($overrideEncoding) must be a valid document encoding

This line was added in #61475, so ping @nicolas-grekas

How to reproduce

More context:
I'm parsing emails, and if they don't have a plain-text part, I pass the body into new Crawler().

The stack trace shows it ran through line 161, so I'm guessing the problem arises from line 150, which probably parses some charset header - right?
My email has this MIME header:

Content-Type: text/html; charset=us-ascii

So I figure that us-ascii gets passed down to Dom\HTMLDocument::createFromString() but isn't supported there?
The PHP docs don't list the allowed charsets - is this machine-dependent?

Possible Solution

My idea:
Wrap the Dom\HTMLDocument::createFromString() call in a try...catch. If it throws, just omit the detected charset and let PHP do whatever it can.
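
A minimal sketch of that fallback, assuming PHP 8.4+ and that the rejected charset surfaces as a \ValueError (which the "Argument #3 ... must be a valid document encoding" message suggests). The function name parseHtmlWithFallback() and its parameters are hypothetical and not part of Symfony; this is not how Crawler.php is actually structured:

```php
<?php
// Hypothetical helper, NOT Symfony code: retry without the detected charset
// when Dom\HTMLDocument rejects it. Assumes PHP 8.4+.
function parseHtmlWithFallback(string $html, ?string $charset): \Dom\HTMLDocument
{
    // LIBXML_NOERROR keeps the parser from emitting warnings on sloppy markup.
    $options = \LIBXML_NOERROR;

    try {
        return \Dom\HTMLDocument::createFromString($html, $options, $charset);
    } catch (\ValueError $e) {
        // Assumption: an unsupported $overrideEncoding raises a ValueError.
        // Drop the override and let the parser detect the encoding itself.
        return \Dom\HTMLDocument::createFromString($html, $options);
    }
}
```

With something like this, a body carrying an unknown charset label would still be parsed instead of aborting, at the cost of possibly mis-decoding non-ASCII characters.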

Alternative:
If a list of allowed charsets does exist, then the charset could already be matched against that list at the regex-detection step (and dropped if it isn't on it). I'm not sure though whether everything that runs through the regex ends up at Dom\HTMLDocument::createFromString()...

Additional Context

No response
