Skip to content

Avoid using URL.openStream due to JDK cache#6269

Closed
headius wants to merge 1 commit intojruby:masterfrom
headius:avoid_url_openstream
Closed

Avoid using URL.openStream due to JDK cache#6269
headius wants to merge 1 commit intojruby:masterfrom
headius:avoid_url_openstream

Conversation

@headius
Copy link
Member

@headius headius commented Jun 5, 2020

The caching done by OpenJDK's jar connection logic appears to be
sensitive to heavy concurrent use. When requesting that a URL open
a stream on itself, the default behavior is for it to use an
internal cache of previously opened jar files. A map from the URL
to the JarFile is maintained, where JarFile eventually bottoms out
in an open ZipFile.

Unfortunately URLClassLoader uses the same cache to reuse open jar
files, while also adding those files to its list of closables.
When the URLClassLoader itself is closed, all closables are
closed, which may end up closing a cached JarFile.

So if different URLClassLoader instances are used to access the
same jar files, it is likely that a shared JarFile instance will
be used and eventually closed, triggering errors.

This commit alters our use of URL.openStream to always instead
open a URLConnection and then disable caching before proceeding to
actually open the stream. This avoids the use of cached jar files,
ensuring we don't hit this bug.

A similar change will likely also need to be done for all places
where we do ClassLoader.getResourceAsStream, since if the
classloader is a URLClassLoader it will backend on this same cache
and once again eat the poison pill.

Part of, and possibly sufficient for, issue #6218.

The caching done by OpenJDK's jar connection logic appears to be
sensitive to heavy concurrent use. When requesting that a URL open
a stream on itself, the default behavior is for it to use an
internal cache of previously opened jar files. A map from the URL
to the JarFile is maintained, where JarFile eventually bottoms out
in an open ZipFile.

Unfortunately URLClassLoader uses the same cache to reuse open jar
files, while also adding those files to its list of closables.
When the URLClassLoader itself is closed, all closables are
closed, which may end up closing a cached JarFile.

So if different URLClassLoader instances are used to access the
same jar files, it is likely that a shared JarFile instance will
be used and eventually closed, triggering errors.

This commit alters our use of URL.openStream to always instead
open a URLConnection and then disable caching before proceeding to
actually open the stream. This avoids the use of cached jar files,
ensuring we don't hit this bug.

A similar change will likely also need to be done for all places
where we do ClassLoader.getResourceAsStream, since if the
classloader is a URLClassLoader it will backend on this same cache
and once again eat the poison pill.

Part of, and possibly sufficient for, issue jruby#6218.
@headius
Copy link
Member Author

headius commented Jun 9, 2020

I've merged a less-invasive fix in #6273, which leaves system jar files open but closes any that we created ourselves (for nested jars). This PR would be necessary to fully isolate us from the broken JarFile cache in OpenJDK, but I think #6273 will effectively avoid the bug for all JRuby use cases.

headius added a commit to headius/jruby that referenced this pull request Aug 18, 2020
Closing the URL classloader is the most hygenic way to tidy up,
but due to jruby#6269 we must avoid doing so for jar files already on
the filesystem. We fixed this previously by simply leaving the
classloader open and attempting to manually close all temporary
jar files it may have encountered. This seems to be insufficient,
since at least on Windows those jar files are being opened with
a different stream type that does not expose the original jar
file.

This patch uses a different mechanism for closing those jar files,
injecting a non-closing URLClassLoader into the hierarchy for
uncloseable files and only closing the child URLClassLoader which
does contain jar connections that should be closed.

This was also an attempt to fix jruby#3657 but it does not appear to
help yet.
@headius
Copy link
Member Author

headius commented Sep 21, 2020

There may be a renewed reason to use this patch: keeping these jar files open, even if they don't leak, means they will never get deleted on Windows. That has led to "leaking" these temp files as reported in #3657

@headius
Copy link
Member Author

headius commented Sep 22, 2020

After much work on trying to get temp files to delete on Windows, I've determined that it's largely an unsolvable problem.

Files can be deleted on Windows if opened with FILE_SHARE_DELETE flag, but most files opened by the JDK do not use this. In addition, we can't control files opened by a URLClassLoader because it's deep in the bowels of JDK code. Experiments to mark the jars as delete-on-exit have also not helped, since at the time of exit the jars are still held in a global cache.

So given that there's a larger, more complicated challenge on Windows, and our localized fix in #6273 seems to have addressed all reported issues, I'm going to close this. I do hope to see the JDK get a fix for it in the future, and we may explore whether we can turn off caching reliably on Java 9 and higher.

@headius headius closed this Sep 22, 2020
@headius headius deleted the avoid_url_openstream branch September 22, 2020 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant