Small string values backed by HUGE non-shared ByteList leading to large memory usage under 9k #3019

After working through this problem a bit in IRC, we've come up with a fairly simple reproduction script here:

https://gist.github.com/bbrowning/90c2296e048fb806b4a1

On JRuby 9k, that script runs out of memory (OOM) with the default 512MB heap. Under JRuby 1.7.20, I can run it with only a 16MB heap without issues. The script basically parses a string and creates an array of arrays with the parsed data. You end up with something like:

```
[[:MSGID, msgid],[:STRING, "Translation 1"],[:MSGID, msgid],[:STRING, "Translation 2"]...]
```
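For readers who don't want to open the gist, here is a rough sketch of the kind of loop that produces this shape. It is not the code from the gist: the `tokenize` method, the `msgid`/`msgstr` input format, and the regexes are all assumptions made for illustration. The relevant pattern is that `str` is repeatedly reassigned to "the rest of the input" while small captures are pushed into the result.

```ruby
# Hypothetical sketch only; the real reproduction script lives in the gist.
def tokenize(str)
  tokens = []
  until str.empty?
    case str
    when /\A\s+/
      # Skip whitespace and continue with the remainder of the input.
      str = $~.post_match
    when /\Amsgid\s+"([^"]*)"/
      tokens << [:MSGID, $1]
      str = $~.post_match
    when /\Amsgstr\s+"([^"]*)"/
      # $1 is a tiny substring of the (possibly very large) input.
      tokens << [:STRING, $1]
      str = $~.post_match
    else
      raise ArgumentError, "unexpected input: #{str[0, 20].inspect}"
    end
  end
  tokens
end

# Build a small input with three entries (the format is an assumption).
input = (1..3).map { |i| %(msgid "key#{i}"\nmsgstr "Translation #{i}"\n) }.join
tokens = tokenize(input)
```

With a large enough input, every captured value is small, but how much memory each one retains depends on how the runtime backs substrings.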

When looking at a heap dump, those "Translation X" strings are RubyStrings backed by unique ByteLists. Each ByteList has a byte[] array containing the entirety of the remaining `str`, plus an offset and length pointing to just a tiny portion of that array.

So it's as if the ByteLists think they're being shared when they actually are not, and we end up with multiple copies of very large byte arrays. They need to either be truly shared, or be unshared and pruned so they don't carry the excess string data.
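To make the two acceptable outcomes concrete, here is a toy model in plain Ruby. `Slice` is a made-up stand-in and is not JRuby's actual ByteList class. It only illustrates the difference between a small view that drags along its own full-size buffer, views that share one buffer, and a view that has been pruned to just its visible bytes.

```ruby
# Toy model of a view over a backing buffer; NOT JRuby's real ByteList.
Slice = Struct.new(:bytes, :offset, :length) do
  # The small value the program actually sees.
  def to_s
    bytes.byteslice(offset, length)
  end

  # Size of the backing buffer this view keeps alive.
  def retained_bytes
    bytes.bytesize
  end

  # Copy only the visible bytes into a fresh, minimal buffer.
  def compact
    self.class.new(to_s, 0, length)
  end
end

# Stand-in for "the remaining `str`": a short value followed by lots of data.
remaining = "Translation 1" + ("x" * 1_000_000)

# Situation described in this issue: a 13-byte value with its own huge buffer.
bloated = Slice.new(remaining + "", 0, 13)

# Option 1: truly shared, so many views point at the very same buffer object.
shared_a = Slice.new(remaining, 0, 13)
shared_b = Slice.new(remaining, 0, 11)

# Option 2: not shared, but pruned down to the visible bytes.
pruned = bloated.compact

puts bloated.to_s            # prints "Translation 1"
puts bloated.retained_bytes  # prints 1000013
puts pruned.retained_bytes   # prints 13
```

In the shared case the large buffer is paid for once no matter how many views exist; in the pruned case each value costs only its own length. The bloated case pays the full buffer size per value, which matches what the heap dump shows.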

