@cranberryofdoom
Contributor

Context

We're currently using libxmljs to parse XML for SAML IDP metadata. When we fetch the metadata from the service, the response.body may sometimes come back as a string with the following invisible first character: \uFEFF.

This character is the Unicode byte order mark (BOM). It's an invisible character often placed at the beginning of a text file to indicate the file's byte order (endianness). In UTF-8 it carries no byte-order information and acts only as an encoding signature, but some tools still write it at the start of UTF-8 files.
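To make the invisible character concrete, here is a minimal Node.js sketch of detecting and stripping a leading BOM from a string body. The helper names `hasBom` and `stripBom` and the sample XML are hypothetical, chosen for illustration; they are not part of libxmljs or of this PR.

```javascript
// U+FEFF is a single UTF-16 code unit in a JavaScript string.
const BOM = "\uFEFF";

// Hypothetical helper: true when the text starts with a BOM.
function hasBom(text) {
  return text.charCodeAt(0) === 0xfeff;
}

// Hypothetical helper: drop the leading BOM if there is one.
function stripBom(text) {
  return hasBom(text) ? text.slice(1) : text;
}

// Illustrative response body, not real IDP metadata.
const body = BOM + "<EntityDescriptor/>";

// Encoded as UTF-8, the BOM becomes the three bytes EF BB BF.
const bomBytes = Array.from(Buffer.from(BOM, "utf8"));

console.log(hasBom(body), stripBom(body), bomBytes);
```

Stripping the BOM before parsing is only a caller-side workaround; the change in this PR addresses the encoding argument instead.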

I noticed that libxmljs.parseXml and libxmljs.parseXmlAsync handle this invisible character differently.

libxmljs.parseXml succeeds because it defaults the encoding to UTF-8 when the buffer is a string.

However, libxmljs.parseXmlAsync fails because it only uses DEFAULT_XML_PARSE_OPTIONS.encoding, which is not defined, so parsing throws the following error: Error: Start tag expected, '<' not found.

The Change

libxmljs.parseXml and libxmljs.parseXmlAsync now both use the following expression as their encoding argument: options.encoding || DEFAULT_XML_PARSE_OPTIONS.encoding || (typeof buffer === "string" ? "UTF-8" : null). This should handle the invisible-character case.
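The fallback chain can be sketched in isolation as follows. This is not the library's code: `resolveEncoding` is a hypothetical wrapper around the expression quoted above, and the empty `DEFAULT_XML_PARSE_OPTIONS` object is a stand-in that assumes, as described in the Context section, that no default encoding is defined.

```javascript
// Stand-in for the library defaults; assumed to define no encoding.
const DEFAULT_XML_PARSE_OPTIONS = {};

// Hypothetical helper wrapping the expression from this PR.
function resolveEncoding(buffer, options = {}) {
  return (
    options.encoding ||                           // 1. explicit caller choice
    DEFAULT_XML_PARSE_OPTIONS.encoding ||         // 2. library-wide default
    (typeof buffer === "string" ? "UTF-8" : null) // 3. strings fall back to UTF-8
  );
}

// A string input (with or without a leading BOM) falls back to UTF-8.
const forString = resolveEncoding("\uFEFF<root/>");

// A Buffer input gets no fallback, so the result is null.
const forBuffer = resolveEncoding(Buffer.from("<root/>"));

// An explicit option takes precedence over both fallbacks.
const forExplicit = resolveEncoding("<root/>", { encoding: "UTF-16" });

console.log(forString, forBuffer, forExplicit);
```

Because both entry points share the same expression, a string body takes the same path whether it is parsed synchronously or asynchronously.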

@rchipka
Member

rchipka commented Apr 8, 2025

Nice work, thanks!!

@rchipka rchipka merged commit 39092a9 into libxmljs:master Apr 8, 2025
0 of 3 checks passed
axel-capodaglio pushed a commit to axel-capodaglio/libxmljs that referenced this pull request Apr 11, 2025
@elb-notion

Hey @rchipka, will there be a new release any time soon? Would love to get some of the recent updates without having to import the GitHub repo instead of the package.
