ConcurrentLocalContextProvider leaks memory per thread

**Environment Information**
  
Provide at least:
* JRuby version: commit 6f9df83e (pom.xml says 9.4.10.0-SNAPSHOT)
* Operating system: Linux 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux

**Test Case**

[ThreadMemoryLeak.java](https://github.com/user-attachments/files/17716880/ThreadMemoryLeak.java.txt)

This code creates 10k threads, each of them executing a trivial amount of JRuby code, to simulate a heavily multi-threaded server application churning through threads. It then drops a heap dump for analysis.

Note: This simulates a use case seen in a production Cantaloupe server, which for some reason seems to create and terminate ~4 threads per minute in our case (I don't understand Jetty's automatic thread pool management there). Because it keeps running for weeks at a time, it churns through tens of thousands of threads, accumulating tens of thousands of ~100kB `LocalContext` objects waiting to be cleaned up in `terminate()`. It eventually slows to a crawl due to memory starvation, or runs out of memory entirely and crashes.

**Expected Behavior**

JRuby creates a `LocalContext` for each thread, holds it in a `ThreadLocal`, and cleans it up when that particular thread has terminated.
The heap dump is a few MB and its size does not scale with the number of terminated threads.

The synthetic example doesn't actually have any concurrent threads; they all are more or less sequential. Thus it should be capable of running with a very small heap, no matter how high you crank the total number of threads that will be created and destroyed (line 36).

**Actual Behavior**

JRuby doesn't remove the `LocalContext` for each thread until the ScriptingContainer itself is disposed. In a long-running server application, that would be "never". In the example, it doesn't happen before the heap dump is written, at which point the JVM is essentially terminating.

The heap dump is hundreds of MB and scales with the number of threads that have been terminated. It is completely dominated by thousands of stale local contexts held by the ScriptingContainer's `ConcurrentLocalContextProvider`.

![biggest-object](https://github.com/user-attachments/assets/ae34c7a7-55d1-4a1e-a46b-492bf213ead2)

![accumulated-objects](https://github.com/user-attachments/assets/ace8d9eb-1434-4a15-b52d-92fa7c094e71)

Setting the number of threads to a high value causes massive heap consumption, and even the synthetic example will eventually run out of memory.

**Probable Cause**

`ConcurrentLocalContextProvider` creates a `LocalContext` for each thread, and only disposes of it in `terminate()`. I do not know the lifecycles of most objects involved, but evidently `terminate()` doesn't get called before JVM termination here. Because these objects are still reachable (via `ConcurrentLocalContextProvider`'s `contextRefs` member), they cannot be garbage-collected.

**Suggested Fix**

Hold a `Reference` (a `PhantomReference` should do, actually) on each thread that a `LocalContext` is created for. When the thread is terminated and becomes unreachable, that `Reference` will show up in its associated `ReferenceQueue`. A background service / cleaner thread can then watch that `ReferenceQueue` and call `remove()` on the relevant `LocalContext` objects. `terminate()` obviously would need to get rid of that service thread, and `remove()` the remaining `LocalContext`s.
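To make the idea concrete, here is a minimal sketch of the `PhantomReference` / `ReferenceQueue` approach in plain Java. It is not JRuby code: `ThreadContextRegistry`, `ThreadRef`, `get()`, `liveCount()` and the `factory` / `disposer` callbacks are all hypothetical names standing in for `ConcurrentLocalContextProvider`, `LocalContext` creation and `remove()`. It also assumes the context never holds a strong reference to its own `Thread`, since that would keep the thread reachable and the reference would never be enqueued.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-thread context provider; this is not JRuby code.
final class ThreadContextRegistry<C> implements AutoCloseable {

    // Phantom reference to the owning thread that also carries the context to dispose.
    // Assumption: the context does not strongly reference its Thread.
    private static final class ThreadRef<C> extends PhantomReference<Thread> {
        final C context;

        ThreadRef(Thread owner, C context, ReferenceQueue<Thread> queue) {
            super(owner, queue);
            this.context = context;
        }
    }

    private final ReferenceQueue<Thread> queue = new ReferenceQueue<>();
    // Strongly holds the references (not the threads) so they can be enqueued.
    private final Set<ThreadRef<C>> live = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<C> local = new ThreadLocal<>();
    private final Supplier<C> factory;
    private final Consumer<C> disposer;
    private final Thread reaper;

    ThreadContextRegistry(Supplier<C> factory, Consumer<C> disposer) {
        this.factory = factory;
        this.disposer = disposer;
        this.reaper = new Thread(this::reapLoop, "context-reaper");
        this.reaper.setDaemon(true);
        this.reaper.start();
    }

    // Returns the calling thread's context, creating and tracking it on first use.
    C get() {
        C ctx = local.get();
        if (ctx == null) {
            ctx = factory.get();
            local.set(ctx);
            live.add(new ThreadRef<>(Thread.currentThread(), ctx, queue));
        }
        return ctx;
    }

    // Number of contexts that have not been disposed yet.
    int liveCount() {
        return live.size();
    }

    // Blocks on the queue; a reference appears once the GC finds its thread unreachable.
    private void reapLoop() {
        try {
            while (true) {
                @SuppressWarnings("unchecked")
                ThreadRef<C> ref = (ThreadRef<C>) queue.remove();
                disposeOnce(ref);
            }
        } catch (InterruptedException stop) {
            // close() asked the reaper to stop.
        }
    }

    // Set.remove() succeeds for exactly one caller, so each context is disposed once.
    private void disposeOnce(ThreadRef<C> ref) {
        if (live.remove(ref)) {
            disposer.accept(ref.context);
        }
    }

    // Counterpart of terminate(): stop the reaper, then dispose whatever is left.
    @Override
    public void close() {
        reaper.interrupt();
        try {
            reaper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (ThreadRef<C> ref : live) {
            disposeOnce(ref);
        }
    }
}
```

When exactly a context gets disposed depends on when the garbage collector notices the dead thread, so `liveCount()` may lag behind the number of threads that are actually still running. What the sketch does guarantee is that every context is disposed exactly once, either by the reaper or by `close()`.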

Alternatively, the service thread could periodically scan the `contextRefs` array and `remove()` any contexts whose thread has died. This should clean up the context more quickly if anything is still holding a reference to the terminated thread, keeping it from being garbage collected.
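The periodic-scan variant could look roughly like the following sketch. Again this is plain Java with hypothetical names (`SweepingContextRegistry`, `sweep()`, `liveCount()`, `periodMillis`), and it assumes contexts are keyed directly by their `Thread` rather than stored in whatever structure `contextRefs` actually is. The map holds the `Thread` objects strongly, so a dead thread is only released at the next sweep; on the other hand, no GC cycle is needed to detect it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-thread context provider; this is not JRuby code.
final class SweepingContextRegistry<C> implements AutoCloseable {

    private final Map<Thread, C> contexts = new ConcurrentHashMap<>();
    private final Supplier<C> factory;
    private final Consumer<C> disposer;
    private final ScheduledExecutorService sweeper;

    SweepingContextRegistry(Supplier<C> factory, Consumer<C> disposer, long periodMillis) {
        this.factory = factory;
        this.disposer = disposer;
        // Daemon thread so the sweeper never keeps the JVM alive on its own.
        this.sweeper = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "context-sweeper");
            t.setDaemon(true);
            return t;
        });
        this.sweeper.scheduleWithFixedDelay(
                this::sweep, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Returns the calling thread's context, creating it on first use.
    C get() {
        return contexts.computeIfAbsent(Thread.currentThread(), t -> factory.get());
    }

    // Number of contexts that have not been disposed yet.
    int liveCount() {
        return contexts.size();
    }

    // Disposes the contexts of all threads that are no longer alive.
    // Returns how many contexts this call removed.
    int sweep() {
        int removed = 0;
        for (Map.Entry<Thread, C> e : contexts.entrySet()) {
            // remove(key, value) succeeds for one caller only, so no double disposal.
            if (!e.getKey().isAlive() && contexts.remove(e.getKey(), e.getValue())) {
                disposer.accept(e.getValue());
                removed++;
            }
        }
        return removed;
    }

    // Counterpart of terminate(): stop the sweeper, then dispose whatever is left.
    @Override
    public void close() {
        sweeper.shutdownNow();
        try {
            sweeper.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread t : contexts.keySet()) {
            C ctx = contexts.remove(t);
            if (ctx != null) {
                disposer.accept(ctx);
            }
        }
    }
}
```

One caveat with a scheduled executor: if `disposer` throws, later runs of the periodic task are suppressed, so a real implementation would probably want to catch and log exceptions inside `sweep()`.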

Maybe it is also possible to piggy-back scanning onto some other operation, but I'm not sure what the performance impacts of that would be. Or what to do if that operation never happens.
