Skip to content

Conversation

@tonygermano
Copy link
Member

Removes the dependency on xercesImpl and xml-apis to resolve classloader conflicts and reduce security risks in Java 17+ environments. Refactors custom Reader classes to use the standard JDK XMLReader interface instead of extending the Apache Xerces SAXParser.

Changes:

  • Dependency Removal: Removed xercesImpl and xml-apis from the build configuration.
  • New Base Class: Introduced AbstractXMLReader to centralize common XMLReader logic (handler management, feature/property compliance) and reduce code duplication across plugins.
  • Reader Refactoring: Updated DelimitedReader, EDIReader, ER7Reader, and NCPDPReader:
    • Replaced inheritance (extends SAXParser) with extends AbstractXMLReader.
    • Added standard parse(String) overload via the base class.
    • Implemented strict SAX2 compliance for getFeature/setFeature to correctly report namespace support to downstream transformers.
  • Code Cleanup: Removed unused imports of org.apache.xerces.parsers.SAXParser.
  • Documentation Cleanup: Removed references to removed libraries in THIRD-PARTY-README.txt

This change ensures the application uses the built-in JDK XML parser, reducing the artifact size and aligning with modern Java best practices.

@gibson9583 gibson9583 requested review from a team, gibson9583, kpalang, mgaffigan and pacmano1 and removed request for a team December 10, 2025 15:33
@pacmano1 pacmano1 requested a review from Copilot December 10, 2025 15:34
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR successfully removes the Xerces and xml-apis dependencies from the codebase, replacing them with a custom AbstractXMLReader base class that provides standard JDK XMLReader functionality. This modernization eliminates classloader conflicts, reduces security risks, and decreases artifact size by leveraging the built-in Java XML parser.

Key changes:

  • Introduced AbstractXMLReader to centralize XMLReader interface boilerplate and provide strict SAX2 compliance
  • Refactored 4 custom reader classes (DelimitedReader, EDIReader, ER7Reader, NCPDPReader) to extend AbstractXMLReader instead of Xerces SAXParser
  • Removed xercesImpl and xml-apis JARs from both client and server dependencies

Reviewed changes

Copilot reviewed 8 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
server/src/org/openintegrationengine/engine/plugins/datatypes/AbstractXMLReader.java New base class providing XMLReader interface implementation with handler management and SAX2 feature compliance
server/src/com/mirth/connect/plugins/datatypes/ncpdp/NCPDPReader.java Refactored to extend AbstractXMLReader, added ensureHandlerSet() call, updated license header
server/src/com/mirth/connect/plugins/datatypes/hl7v2/ER7Reader.java Refactored to extend AbstractXMLReader, added ensureHandlerSet() call, updated license header
server/src/com/mirth/connect/plugins/datatypes/edi/EDIReader.java Refactored to extend AbstractXMLReader, added ensureHandlerSet() call, updated license header
server/src/com/mirth/connect/plugins/datatypes/delimited/DelimitedReader.java Refactored to extend AbstractXMLReader, added ensureHandlerSet() call, removed unused ContentHandler import, updated license header
server/docs/thirdparty/THIRD-PARTY-README.txt Removed documentation for xml-apis and Xerces dependencies
server/.classpath Removed xercesImpl-2.12.2.jar and xml-apis-1.4.01.jar entries
client/.classpath Removed xercesImpl-2.12.2.jar and xml-apis-1.4.01.jar entries

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return;
}
else if ("http://xml.org/sax/features/namespace-prefixes".equals(name)) {
if (value == true) {
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use value instead of value == true for consistency with boolean checks. This is the more idiomatic way to check boolean values in Java.

Change from:

if (value == true) {

To:

if (value) {
Suggested change
if (value == true) {
if (value) {

Copilot uses AI. Check for mistakes.
@Override
public void setFeature(String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException {
if ("http://xml.org/sax/features/namespaces".equals(name)) {
if (value == false) {
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use !value instead of value == false for consistency with the value == true check on line 221. This is the more idiomatic way to check boolean values in Java.

Change from:

if (value == false) {

To:

if (!value) {
Suggested change
if (value == false) {
if (!value) {

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@mgaffigan mgaffigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve after I can kick the tires a bit - nothing obvious apart from the bool comparisons copilot already called out.

I assume this is a breaking change for any data type plugin. Should be called out in release notes.

@tonygermano
Copy link
Member Author

I'll approve after I can kick the tires a bit - nothing obvious apart from the bool comparisons copilot already called out.

I assume this is a breaking change for any data type plugin. Should be called out in release notes.

Yes, it would be a breaking change for anything that specifically imports from packages below org.apache.xerces or perhaps relies on behavior specific to the older parsers or transformers. If another datatype had cloned one of these as a starting point it would be affected. These are all datatypes that convert from something that isn't xml to xml. It doesn't affect the DICOM or strict hl7 types since those use dcm4che and hapi for the xml conversion rather than something implemented in this project.

I am going to have follow-up PRs which are dependent on this one to remove other outdated xml dependencies, but this one had to go first since the standalone xerces release only supports JAXP version 1.4, which is what was bundled with Java 6. The internal implementation has been at JAXP 1.6 since Java 8. With this jar on the classpath, standard javax.xml.parsers factory classes were utilizing the older implementation which didn't recognize current features.

To test this I created channels where the source inbound type was the target datatype and the outbound type was XML. The destination reversed this and went from xml to the target type. I created a sample message and verified the transformations to and from xml so the final output was the same as the original. I did this for delimited, hl7v2, and edi. I did not know where to find a sample message to test NCPDP, but I assume it's fine since the others all worked.

The bool comparison thing I went back and forth on before I opened the PR. I decided to keep it with the equality comparison since the parameter represents a value to be set, and is not meant for flow control, so it made more sense to me to do the explicit comparison as I would do for any value in a key/val pair that wasn't a boolean. I don't have a strong preference either way, though.

mgaffigan
mgaffigan previously approved these changes Dec 10, 2025
Copy link
Contributor

@mgaffigan mgaffigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested by creating type to xml channels on Mirth 4.4.0 and this branch. Compared results given same source messages.

Here's a NCPDP Telecom Sample File. Note: File includes non-printable characters.

kayyagari
kayyagari previously approved these changes Dec 16, 2025
Copy link

@kayyagari kayyagari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Built with Java 17 and tested using a channel with both inbound and outbound datatypes set to a combination of Hl7v2.x and XML.

Copy link
Collaborator

@NicoPiel NicoPiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the motivation includes “correctly report namespace support to downstream transformers,” can we add a regression test that runs a reader through the actual transformation path (SAXSource/Transformer) to confirm no additional feature/property probes break execution?

*/
@Override
public Object getProperty(String name) throws SAXNotRecognizedException, SAXNotSupportedException {
throw new SAXNotRecognizedException("Property not recognized: " + name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some transformer pipelines probe standard SAX properties (e.g., lexical handler). Should we recognize a minimal set to reduce compatibility risk, even if unsupported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to LexicalHandler, support for this is optional.

If the reader does not report lexical events, it will throw a SAXNotRecognizedException when you attempt to register the handler.

*/
@Override
public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException {
throw new SAXNotRecognizedException("Property not recognized: " + name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some transformer pipelines probe standard SAX properties (e.g., lexical handler). Should we recognize a minimal set to reduce compatibility risk, even if unsupported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to LexicalHandler, support for this is optional.

If the reader does not report lexical events, it will throw a SAXNotRecognizedException when you attempt to register the handler.

@tonygermano tonygermano dismissed stale reviews from kayyagari and mgaffigan via 5403e8f December 18, 2025 06:20
tonygermano and others added 2 commits December 18, 2025 02:46
Removes the dependency on `xercesImpl` and `xml-apis` to resolve
classloader conflicts and reduce security risks in Java 17+
environments. Refactors custom Reader classes to use the standard JDK
`XMLReader` interface instead of extending the Apache Xerces
`SAXParser`.

Changes:
- **Dependency Removal:** Removed `xercesImpl` and `xml-apis` from the
  build configuration.
- **New Base Class:** Introduced `AbstractXMLReader` to centralize
  common `XMLReader` logic (handler management, feature/property
  compliance) and reduce code duplication across plugins.
- **Reader Refactoring:** Updated `DelimitedReader`, `EDIReader`,
  `ER7Reader`, and `NCPDPReader`:
  - Replaced inheritance (`extends SAXParser`) with `extends AbstractXMLReader`.
  - Added standard `parse(String)` overload via the base class.
  - Implemented strict SAX2 compliance for `getFeature`/`setFeature`
    to correctly report namespace support to downstream transformers.
- **Code Cleanup:** Removed unused imports of
  `org.apache.xerces.parsers.SAXParser`.
- **Documentation Cleanup:** Removed references to removed libraries in
  `THIRD-PARTY-README.txt`

This change ensures the application uses the built-in JDK XML parser,
reducing the artifact size and aligning with modern Java best practices.

Signed-off-by: Tony Germano <tony@germano.name>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Tony Germano <39179051+tonygermano@users.noreply.github.com>
@tonygermano tonygermano marked this pull request as draft December 18, 2025 14:18
Signed-off-by: Tony Germano <tony@germano.name>
Signed-off-by: Tony Germano <tony@germano.name>
According to the javadoc for ContentHandler.startElement, if there are
no attributes, then an empty Attributes object should be passed rather
than null.

For both startElement and endElement if qualified names are not
available then the qname parameter should be an empty string. It was
already being used this way in most instances, but a few were passing
null instead.

Signed-off-by: Tony Germano <tony@germano.name>
@tonygermano
Copy link
Member Author

@NicoPiel

Since the motivation includes “correctly report namespace support to downstream transformers,” can we add a regression test that runs a reader through the actual transformation path (SAXSource/Transformer) to confirm no additional feature/property probes break execution?

"correctly report namespace support to downstream transformers" is alluding to the fact that the XMLReader.setFeature method states:

All XMLReaders are required to support setting http://xml.org/sax/features/namespaces to true and http://xml.org/sax/features/namespace-prefixes to false.

These features affect the argument values used when the XMLReader calls ContentHandler.startElement and ContentHandler.endElement. None of the plugins that extend this class actually read XML, but read other file types and convert them to xml, so nearly all xml features and properties will be unsupported. The existing plugins were hard coded (for the most part) to respond as if the two required features are in their minimum required state. Subclasses that wish to add support for additional features or properties can override the methods in this abstract class.

The main purpose of this PR was to remove the imports of org.apache.xerces.parsers.SAXParser. The proper way to do this seemed to be to implement XMLReader instead of extending org.apache.xerces.parsers.SAXParser. I created the abstract class to avoid having the same boilerplate implementations in all four plugins.

can we add a regression test that runs a reader through the actual transformation path (SAXSource/Transformer) to confirm no additional feature/property probes break execution?

I pushed some additional updates that now include tests for the new class and some additional changes after reading more of the interface javadocs to make sure the plugins follow the contract (mainly passing an empty Attributes object instead of null.) I think the current behavior of throwing exceptions for any probes for unsupported options is correct per the spec, and to pretend that we support a feature/property that we don't support could be worse for compatibility.

When merging I plan to squash the first 4 commits and keep the last one separate.

@tonygermano tonygermano marked this pull request as ready for review December 19, 2025 12:58
@tonygermano tonygermano requested a review from NicoPiel December 19, 2025 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants