Introduce Length Encoding extension as -2 #338
Partial decoders can greatly benefit from being able to skip over (large) arrays and maps faster. Currently, skipping requires recursively iterating through all keys and values of the map or array and decoding each one, at least far enough to determine its length.
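To illustrate the cost, here is a minimal Python sketch of what skipping a value looks like without this extension. The function name `skip_value` and the lookup tables are made up for this example and are not taken from any existing msgpack library; the marker bytes follow the msgpack spec.

```python
import struct

# Markers followed by a fixed number of bytes (nil/bool, floats, ints, fixext 1..16).
_FIXED = {0xC0: 0, 0xC2: 0, 0xC3: 0, 0xCA: 4, 0xCB: 8,
          0xCC: 1, 0xCD: 2, 0xCE: 4, 0xCF: 8,
          0xD0: 1, 0xD1: 2, 0xD2: 4, 0xD3: 8,
          0xD4: 2, 0xD5: 3, 0xD6: 5, 0xD7: 9, 0xD8: 17}
# Length-prefixed bin/ext/str: marker -> (length format, extra bytes after the length field).
_LEN_PREFIXED = {0xC4: (">B", 0), 0xC5: (">H", 0), 0xC6: (">I", 0),
                 0xC7: (">B", 1), 0xC8: (">H", 1), 0xC9: (">I", 1),
                 0xD9: (">B", 0), 0xDA: (">H", 0), 0xDB: (">I", 0)}
# array 16/32 and map 16/32: marker -> (count format, values per entry).
_CONTAINERS = {0xDC: (">H", 1), 0xDD: (">I", 1), 0xDE: (">H", 2), 0xDF: (">I", 2)}


def skip_value(buf, pos):
    """Return the position just past the msgpack value starting at buf[pos]."""
    marker = buf[pos]
    pos += 1
    if marker <= 0x7F or marker >= 0xE0:      # positive / negative fixint
        return pos
    if 0xA0 <= marker <= 0xBF:                # fixstr
        return pos + (marker & 0x1F)
    if marker in _FIXED:
        return pos + _FIXED[marker]
    if marker in _LEN_PREFIXED:               # size is known up front: one jump
        fmt, extra = _LEN_PREFIXED[marker]
        (length,) = struct.unpack_from(fmt, buf, pos)
        return pos + struct.calcsize(fmt) + extra + length
    if 0x80 <= marker <= 0x8F:                # fixmap: keys and values
        count = (marker & 0x0F) * 2
    elif 0x90 <= marker <= 0x9F:              # fixarray
        count = marker & 0x0F
    elif marker in _CONTAINERS:
        fmt, per_entry = _CONTAINERS[marker]
        (n,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        count = n * per_entry
    else:
        raise ValueError(f"unsupported marker 0x{marker:02x}")
    # Only an element count is stored, so every nested value must be visited.
    for _ in range(count):
        pos = skip_value(buf, pos)
    return pos
```

Note that strings, binaries and extensions are skipped with a single jump, while arrays and maps fall through to the loop at the end.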
Extension types neatly include their size, which makes them easy to skip over because their size in bytes is known early.
Encoders can choose to ignore this extension completely. The implementation for decoders is fairly simple: it only requires skipping 2, 3, 4, or 6 bytes ahead in the input stream (the size of the extension header) and continuing as normal.
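As a rough sketch of the decoder side, assuming the extension payload is the wrapped array or map encoded verbatim: a full decoder steps over the header and decodes as usual, while a partial decoder jumps past the whole payload. The names `read_ext_header`, `unwrap` and `skip_wrapped` are hypothetical and only serve this example.

```python
import struct

LENGTH_EXT_TYPE = -2  # extension type proposed in this PR

# fixext marker -> payload size in bytes
_FIXEXT_SIZES = {0xD4: 1, 0xD5: 2, 0xD6: 4, 0xD7: 8, 0xD8: 16}
# ext 8/16/32 marker -> struct format of the big-endian length field
_EXT_LEN_FORMATS = {0xC7: ">B", 0xC8: ">H", 0xC9: ">I"}


def read_ext_header(buf, pos):
    """Return (ext_type, header_len, payload_len), or None if buf[pos] is not an ext."""
    marker = buf[pos]
    if marker in _FIXEXT_SIZES:
        (ext_type,) = struct.unpack_from(">b", buf, pos + 1)
        return ext_type, 2, _FIXEXT_SIZES[marker]
    if marker in _EXT_LEN_FORMATS:
        fmt = _EXT_LEN_FORMATS[marker]
        len_size = struct.calcsize(fmt)
        (payload_len,) = struct.unpack_from(fmt, buf, pos + 1)
        (ext_type,) = struct.unpack_from(">b", buf, pos + 1 + len_size)
        return ext_type, 2 + len_size, payload_len
    return None


def unwrap(buf, pos):
    """Full decode: return the position of the wrapped array/map (pos if not wrapped)."""
    header = read_ext_header(buf, pos)
    if header is not None and header[0] == LENGTH_EXT_TYPE:
        return pos + header[1]  # skip 2, 3, 4 or 6 header bytes
    return pos


def skip_wrapped(buf, pos):
    """Partial decode: return the position just past the wrapped value."""
    header = read_ext_header(buf, pos)
    if header is None or header[0] != LENGTH_EXT_TYPE:
        raise ValueError("value is not wrapped in the length extension")
    return pos + header[1] + header[2]
```

For example, the array `[1, 2, 3]` (bytes `93 01 02 03`) wrapped as ext 8 is `c7 04 fe 93 01 02 03`; `unwrap` returns 3 and `skip_wrapped` returns 7 without looking at any element.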
Obviously, encoders using this should not be used with decoders that don't support it.
We've been using this technique in my company for ~6 months and it's been very effective in speeding up partial decoding of large, nested msgpack values.
We have a function that parses and rewrites msgpack to wrap arrays and maps with this extension type in places where we have write-once, read-many situations.
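This is not our actual function, but the wrapping step for a single value could look roughly like the Python sketch below. It assumes the value is already encoded and that the extension payload is those bytes unchanged; `wrap_container` is a hypothetical name, and the check on the first byte reflects the arrays-and-maps-only rule proposed in this PR.

```python
import struct

LENGTH_EXT_TYPE = -2  # extension type proposed in this PR

# payload size -> fixext marker (usable only when the size matches exactly)
_FIXEXT_MARKERS = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}


def _is_container(marker):
    """True for fixmap, fixarray, array 16/32 and map 16/32 markers."""
    return 0x80 <= marker <= 0x9F or 0xDC <= marker <= 0xDF


def wrap_container(encoded):
    """Prefix an already-encoded array or map with the smallest ext header."""
    if not encoded or not _is_container(encoded[0]):
        raise ValueError("only arrays and maps may be wrapped")
    size = len(encoded)
    if size in _FIXEXT_MARKERS:
        header = struct.pack(">Bb", _FIXEXT_MARKERS[size], LENGTH_EXT_TYPE)
    elif size <= 0xFF:
        header = struct.pack(">BBb", 0xC7, size, LENGTH_EXT_TYPE)   # ext 8
    elif size <= 0xFFFF:
        header = struct.pack(">BHb", 0xC8, size, LENGTH_EXT_TYPE)   # ext 16
    elif size <= 0xFFFFFFFF:
        header = struct.pack(">BIb", 0xC9, size, LENGTH_EXT_TYPE)   # ext 32
    else:
        raise ValueError("value too large for an ext 32 wrapper")
    return header + encoded
```

A real rewriter would have to wrap nested containers from the inside out, since wrapping an inner value changes the byte length of every container around it.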
Given the goal of speeding up decoders, I propose making it legal to wrap only arrays and maps, since the length of every other type is already cheap to decode. Decoders could optionally accept other wrapped types, but encoders shouldn't produce them. Future compatibility with this requirement is unlikely to be an issue, because future spec changes would likely be extensions, which can already be skipped over cheaply.