
Optimize Memory Usage by Interning Frequently Occurring Strings in JSON #1585

Merged
DzmitryFomchyn merged 2 commits into main from df-json-deserialisation-optimisation on Jul 8, 2024

@DzmitryFomchyn DzmitryFomchyn commented Jul 6, 2024

Contributor

https://mapbox.atlassian.net/browse/NAVAND-3254

In the directions response format, some string fields take only a small set of values but occur very frequently in the JSON response. Examples include IntersectionLanes.indications(), IntersectionLanes.validIndication(), StepManeuver.type(), and StepManeuver.modifier(). The GSON parser does not intern strings, so it allocates a new string for every occurrence, and the parsed route object ends up holding many duplicate strings. This PR introduces custom TypeAdapters that intern the selected strings to save memory.
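The effect described above can be illustrated with the JDK alone. This is a minimal sketch, not the PR's TypeAdapter code: the `InternDemo` class, the `parse` helper, and the five sample values are assumptions made for illustration, and `new String(...)` stands in for a parser that allocates a fresh string per value.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class InternDemo {
    // Illustrative values only; the real set of indications comes from the API.
    static final String[] VALUES = {"left", "right", "straight", "slight left", "slight right"};

    // Simulates a parser that allocates a fresh String for every value it reads,
    // optionally interning each value before storing it.
    public static List<String> parse(int count, boolean intern) {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String fresh = new String(VALUES[i % VALUES.length].toCharArray());
            out.add(intern ? fresh.intern() : fresh);
        }
        return out;
    }

    // Counts distinct String objects by identity (==), not by equals().
    public static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // prints 1000 then 5
        System.out.println(distinctInstances(parse(1000, false)));
        System.out.println(distinctInstances(parse(1000, true)));
    }
}
```

Without interning, every parsed value is its own object; with interning, all equal values share one instance, which is where the memory saving comes from.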

Some analysis for IntersectionLanes objects:

  • Long route from Minsk to Oslo, through Munich, Stuttgart, and Paris (4.5K km)
    • 7561 indications, 4773 valid indications, only 6 unique strings.
    • Approximately 567 KB of memory saved.
  • Munich to Stuttgart (232 km)
    • 500 indications, 295 valid indications, only 5 unique strings.
    • Approximately 36 KB of memory saved.
  • Short route inside Stuttgart (6 km)
    • 126 indications, 42 valid indications, only 5 unique strings.
    • Approximately 7 KB of memory saved.
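As a rough sanity check on these figures, the saving per eliminated duplicate can be computed from the numbers above. This sketch assumes that 1 KB means 1024 bytes and that every string beyond the unique ones is eliminated; the PR does not state how the savings were measured, and the `SavingsEstimate` class is hypothetical.

```java
public class SavingsEstimate {
    // Estimated bytes saved per eliminated duplicate string.
    // Assumes 1 KB = 1024 bytes and that only `unique` instances remain after interning.
    public static double bytesPerDuplicate(int savedKb, int indications, int validIndications, int unique) {
        int duplicates = indications + validIndications - unique;
        return savedKb * 1024.0 / duplicates;
    }

    public static void main(String[] args) {
        // Minsk to Oslo, Munich to Stuttgart, and the short Stuttgart route.
        System.out.printf("%.1f%n", bytesPerDuplicate(567, 7561, 4773, 6));
        System.out.printf("%.1f%n", bytesPerDuplicate(36, 500, 295, 5));
        System.out.printf("%.1f%n", bytesPerDuplicate(7, 126, 42, 5));
    }
}
```

Under these assumptions all three routes come out at roughly 44 to 47 bytes per duplicate, so the reported savings scale consistently with the number of duplicated strings.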

@DzmitryFomchyn DzmitryFomchyn requested a review from a team as a code owner July 6, 2024 19:57
@dmitry-novikov


Is it possible to configure GSON to use String.intern() by default?

@DzmitryFomchyn

Contributor Author

Is it possible to configure GSON to use String.intern() by default?

Unfortunately, no, GSON doesn't support it. And we don't actually want every string to be interned: many fields hold arbitrary data (unlike enum-like values), and it doesn't make sense to intern a value that occurs only once, as that would degrade performance.

The ideal solution would be to choose which fields to intern, for example by marking them with an annotation such as @StringIntern:

  @StringIntern
  @SerializedName("valid_indication")
  public abstract String validIndication();

But that's not possible without patching the GSON SDK, and I'm not sure it makes sense to do that now, given our plans to change the parsing mechanism.
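For illustration, a marker annotation of this kind could be declared as below. This is only a sketch under stated assumptions: @StringIntern does not exist in GSON or in the SDK, the `Lanes` class and its `description()` property are hypothetical, and runtime reflection is used here as a stand-in for whatever mechanism would actually consume the annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class StringInternSketch {
    // Hypothetical marker annotation; not part of GSON or the SDK.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface StringIntern {}

    // Hypothetical model class with one property marked for interning.
    public abstract static class Lanes {
        @StringIntern
        public abstract String validIndication();

        public abstract String description();
    }

    // Collects the names of all property methods marked with @StringIntern.
    public static Set<String> internedProperties(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isAnnotationPresent(StringIntern.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [validIndication]
        System.out.println(internedProperties(Lanes.class));
    }
}
```

A deserializer aware of this set could then intern only the marked properties and leave arbitrary-data fields untouched.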

@DzmitryFomchyn DzmitryFomchyn merged commit 05a7bfe into main Jul 8, 2024
@DzmitryFomchyn DzmitryFomchyn deleted the df-json-deserialisation-optimisation branch July 8, 2024 10:53
@vadzim-vys

Contributor

But that's not possible without patching the GSON SDK, and I'm not sure it makes sense to do that now, given our plans to change the parsing mechanism.

It should be possible. The serialization process is controlled by code generated by the Mapbox fork of the auto-value-gson library (https://github.com/mapbox/auto-value-gson), so I think we should be able to add support for @StringIntern there.

4 participants