-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
The current entity_key serDe (version 2) is below:
def serialize_entity_key(
entity_key: EntityKeyProto, entity_key_serialization_version=1
) -> bytes:
"""
Serialize entity key to a bytestring so it can be used as a lookup key in a hash table.
We need this encoding to be stable; therefore we cannot just use protobuf serialization
here since it does not guarantee that two proto messages containing the same data will
serialize to the same byte string[1].
[1] https://developers.google.com/protocol-buffers/docs/encoding
"""
sorted_keys, sorted_values = zip(
*sorted(zip(entity_key.join_keys, entity_key.entity_values))
)
output: List[bytes] = []
for k in sorted_keys:
output.append(struct.pack("<I", ValueType.STRING))
output.append(k.encode("utf8"))
for v in sorted_values:
val_bytes, value_type = _serialize_val(
v.WhichOneof("val"),
v,
entity_key_serialization_version=entity_key_serialization_version,
)
output.append(struct.pack("<I", value_type))
output.append(struct.pack("<I", len(val_bytes)))
output.append(val_bytes)
return b"".join(output)
e.g, for sorted_keys = {tuple: 1} item_id and sorted_values = {tuple: 1} int64_val: 1\n will give output:
[b'\x02\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']
This makes deserialization not doable. In order to deserialize we can append the "length" of value to the join_key, such as for the same test key and value we can get the output:
[b'\x02\x00\x00\x00', b'\x07\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']
Then we can deserialize the bytes to proto.
Describe the solution you'd like
A clear and concise description of what you want to happen.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.