@Jille Jille commented Aug 10, 2024

Partial decoders can greatly benefit from being able to skip over (large) arrays and maps faster. Currently, skipping a map or array requires recursively iterating through all of its (keys and) values and decoding each one, or at least decoding enough of it to determine its length.
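To illustrate the cost described above, here is a minimal sketch of what skipping a value looks like without any size hint. The function name `skip_value` and the lookup tables are made up for this example and are not taken from any particular msgpack library; the tag values follow the msgpack format specification.

```python
import struct

# Tag -> number of bytes that follow the tag (scalars and fixext).
_FIXED = {0xC0: 0, 0xC2: 0, 0xC3: 0, 0xCA: 4, 0xCB: 8,
          0xCC: 1, 0xCD: 2, 0xCE: 4, 0xCF: 8,
          0xD0: 1, 0xD1: 2, 0xD2: 4, 0xD3: 8,
          0xD4: 2, 0xD5: 3, 0xD6: 5, 0xD7: 9, 0xD8: 17}
# Tag -> (length-prefix format, extra bytes after the prefix) for bin, str, ext.
_LEN_PREFIXED = {0xC4: (">B", 0), 0xC5: (">H", 0), 0xC6: (">I", 0),
                 0xD9: (">B", 0), 0xDA: (">H", 0), 0xDB: (">I", 0),
                 0xC7: (">B", 1), 0xC8: (">H", 1), 0xC9: (">I", 1)}
# Tag -> (count format, values per entry) for array 16/32 and map 16/32.
_CONTAINERS = {0xDC: (">H", 1), 0xDD: (">I", 1),
               0xDE: (">H", 2), 0xDF: (">I", 2)}


def skip_value(buf: bytes, pos: int) -> int:
    """Return the offset just past the msgpack value starting at buf[pos]."""
    tag = buf[pos]
    pos += 1
    if tag <= 0x7F or tag >= 0xE0:        # positive / negative fixint
        return pos
    if 0xA0 <= tag <= 0xBF:               # fixstr
        return pos + (tag & 0x1F)
    if tag in _FIXED:
        return pos + _FIXED[tag]
    if tag in _LEN_PREFIXED:              # size is known up front: one jump
        fmt, extra = _LEN_PREFIXED[tag]
        (n,) = struct.unpack_from(fmt, buf, pos)
        return pos + struct.calcsize(fmt) + extra + n
    if 0x80 <= tag <= 0x8F:               # fixmap: key and value per entry
        count = (tag & 0x0F) * 2
    elif 0x90 <= tag <= 0x9F:             # fixarray
        count = tag & 0x0F
    elif tag in _CONTAINERS:
        fmt, per_entry = _CONTAINERS[tag]
        (n,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        count = n * per_entry
    else:
        raise ValueError(f"unsupported tag 0x{tag:02x}")
    # Containers carry an element count, not a byte size, so every
    # element has to be visited (recursively) to find the end.
    for _ in range(count):
        pos = skip_value(buf, pos)
    return pos
```

Only the container branches need the loop; every other type can be skipped with a single offset computation, which is the asymmetry this proposal targets.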

Extension types neatly include their size, which makes them easy to skip over because their size in bytes is known early.
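As a rough sketch of why extension values are cheap to skip, the header alone gives the payload size. The helper names `read_ext_header` and `skip_ext` are hypothetical; the byte layouts are those of fixext and ext 8/16/32 in the msgpack format specification.

```python
import struct

_FIXEXT_SIZES = {0xD4: 1, 0xD5: 2, 0xD6: 4, 0xD7: 8, 0xD8: 16}


def read_ext_header(buf: bytes, pos: int = 0):
    """Return (header_len, ext_type, payload_len) for the ext value at buf[pos]."""
    tag = buf[pos]
    if tag in _FIXEXT_SIZES:
        # fixext: tag, type, then a payload whose size is implied by the tag.
        (ext_type,) = struct.unpack_from("b", buf, pos + 1)
        return 2, ext_type, _FIXEXT_SIZES[tag]
    if tag == 0xC7:   # ext 8: tag, 1-byte length, type
        size, ext_type = struct.unpack_from(">Bb", buf, pos + 1)
        return 3, ext_type, size
    if tag == 0xC8:   # ext 16: tag, 2-byte length, type
        size, ext_type = struct.unpack_from(">Hb", buf, pos + 1)
        return 4, ext_type, size
    if tag == 0xC9:   # ext 32: tag, 4-byte length, type
        size, ext_type = struct.unpack_from(">Ib", buf, pos + 1)
        return 6, ext_type, size
    raise ValueError(f"not an extension tag: 0x{tag:02x}")


def skip_ext(buf: bytes, pos: int = 0) -> int:
    """Return the offset just past the ext value, without reading its payload."""
    header_len, _, payload_len = read_ext_header(buf, pos)
    return pos + header_len + payload_len
```

The skip is a constant amount of work regardless of how large or deeply nested the payload is.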

Encoders can choose to ignore this extension completely. The implementation for decoders is fairly simple: a decoder that wants the wrapped value just skips the 2-, 3-, 4-, or 6-byte extension header (fixext, ext 8, ext 16, or ext 32 respectively) and continues decoding as normal.
Obviously, encoders using this should not be paired with decoders that don't support it.
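A minimal sketch of the decoder side, under stated assumptions: `SKIP_EXT_TYPE = 42` is a placeholder and not the type number this PR proposes, and `wrapper_bounds` is a hypothetical helper name.

```python
import struct

SKIP_EXT_TYPE = 42  # placeholder value for illustration only

_FIXEXT = {0xD4: 1, 0xD5: 2, 0xD6: 4, 0xD7: 8, 0xD8: 16}
_EXT = {0xC7: (">B", 3), 0xC8: (">H", 4), 0xC9: (">I", 6)}


def wrapper_bounds(buf: bytes, pos: int):
    """If buf[pos] starts a wrapper ext, return (inner_start, inner_end).

    Returns None for any other value, including other extension types.
    """
    tag = buf[pos]
    if tag in _FIXEXT:
        header_len, size = 2, _FIXEXT[tag]
    elif tag in _EXT:
        fmt, header_len = _EXT[tag]
        (size,) = struct.unpack_from(fmt, buf, pos + 1)
    else:
        return None
    # The type byte is always the last byte of the header.
    (ext_type,) = struct.unpack_from("b", buf, pos + header_len - 1)
    if ext_type != SKIP_EXT_TYPE:
        return None
    start = pos + header_len
    return start, start + size
```

A decoder that needs the wrapped array or map continues at `inner_start` (this is the 2/3/4/6-byte skip); a partial decoder that does not need it jumps straight to `inner_end`.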

We've been using this technique at my company for ~6 months and it's been very effective in speeding up partial decoding of large, nested msgpack values.

We have a function that parses and rewrites msgpack to wrap arrays and maps with this extension type in places where we have write-once, read-many situations.
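For illustration only (this is not our actual rewriting function), wrapping a single already-encoded array or map could look roughly like this. `SKIP_EXT_TYPE = 42` is a placeholder type number and `wrap_container` is a hypothetical name.

```python
import struct

SKIP_EXT_TYPE = 42  # placeholder value for illustration only

_FIXEXT_TAGS = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}


def wrap_container(encoded: bytes, ext_type: int = SKIP_EXT_TYPE) -> bytes:
    """Prefix an encoded msgpack array or map with an ext header."""
    if not encoded:
        raise ValueError("empty input")
    first = encoded[0]
    is_array = 0x90 <= first <= 0x9F or first in (0xDC, 0xDD)
    is_map = 0x80 <= first <= 0x8F or first in (0xDE, 0xDF)
    if not (is_array or is_map):
        raise ValueError("only arrays and maps may be wrapped")
    n = len(encoded)
    # Pick the smallest ext format that can hold the payload size.
    if n in _FIXEXT_TAGS:
        header = struct.pack(">Bb", _FIXEXT_TAGS[n], ext_type)
    elif n <= 0xFF:
        header = struct.pack(">BBb", 0xC7, n, ext_type)
    elif n <= 0xFFFF:
        header = struct.pack(">BHb", 0xC8, n, ext_type)
    elif n <= 0xFFFFFFFF:
        header = struct.pack(">BIb", 0xC9, n, ext_type)
    else:
        raise ValueError("payload too large for ext 32")
    return header + encoded
```

A full rewriter would apply this bottom-up to nested containers, so that inner wrappers are already included in the size of the outer one.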

Given the goal of speeding up decoders, I propose making it legal to wrap only arrays and maps, since the length of every other type is already cheap to decode. Decoders could optionally accept other wrapped types, but encoders shouldn't produce them. Future compatibility with this requirement is unlikely to be an issue, because future spec changes would likely be extensions, which can already be skipped over cheaply.
