Feat: New version of entity_key serDe #4283

@HaoXuAI

Is your feature request related to a problem? Please describe.

The current entity_key serDe (version 2) is below:

def serialize_entity_key(
    entity_key: EntityKeyProto, entity_key_serialization_version=1
) -> bytes:
    """
    Serialize entity key to a bytestring so it can be used as a lookup key in a hash table.

    We need this encoding to be stable; therefore we cannot just use protobuf serialization
    here since it does not guarantee that two proto messages containing the same data will
    serialize to the same byte string[1].

    [1] https://developers.google.com/protocol-buffers/docs/encoding
    """
    sorted_keys, sorted_values = zip(
        *sorted(zip(entity_key.join_keys, entity_key.entity_values))
    )

    output: List[bytes] = []
    for k in sorted_keys:
        output.append(struct.pack("<I", ValueType.STRING))
        output.append(k.encode("utf8"))
    for v in sorted_values:
        val_bytes, value_type = _serialize_val(
            v.WhichOneof("val"),
            v,
            entity_key_serialization_version=entity_key_serialization_version,
        )

        output.append(struct.pack("<I", value_type))

        output.append(struct.pack("<I", len(val_bytes)))
        output.append(val_bytes)

    return b"".join(output)

For example, with sorted_keys = ("item_id",) and sorted_values = (int64_val: 1,), the chunks collected in output are:
[b'\x02\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']
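As a sanity check, the chunks in this example can be reproduced with struct alone. This is a minimal sketch, not the Feast code path: the numeric tags 2 (STRING) and 4 (INT64) are read off the example output rather than imported from Feast, and packing the value as a little-endian 64-bit integer is an assumption based on the 8-byte value shown.

```python
import struct

# Type tags read off the example output (assumed to match ValueType.STRING / INT64).
STRING_TAG = 2
INT64_TAG = 4

key_bytes = "item_id".encode("utf8")
val_bytes = struct.pack("<q", 1)  # assumed encoding: int64 value 1, little-endian

chunks = [
    struct.pack("<I", STRING_TAG),      # type tag for the join key
    key_bytes,                          # join key bytes, with no length prefix
    struct.pack("<I", INT64_TAG),       # type tag for the value
    struct.pack("<I", len(val_bytes)),  # value length
    val_bytes,                          # value bytes
]
serialized = b"".join(chunks)
print(chunks)
```

Note that only the value carries a length; nothing in the byte string records that the key occupies 7 bytes.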

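This layout cannot be deserialized: the join key bytes are written without a length prefix, so a reader cannot tell where the key ends and the value's type tag begins. To make deserialization possible, we can write the byte length of each join key before the key itself. For the same test key and value, the output becomes: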
[b'\x02\x00\x00\x00', b'\x07\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']

With the key length in the stream, the bytes can be deserialized back into the proto.
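A rough round-trip sketch of the length-prefixed layout follows. It is not Feast code: serialize_with_key_length, deserialize_with_key_length, and the (type tag, raw bytes) stand-in for a proto value are hypothetical names used for illustration. It also assumes that the reader can split keys from values by counting records (the first half are keys, the second half are values), since the layout described here has no explicit key count.

```python
import struct
from typing import List, Tuple

STRING_TAG = 2  # tag written before each join key in the example output

# Hypothetical stand-in for a proto value: (type tag, raw value bytes).
RawValue = Tuple[int, bytes]


def serialize_with_key_length(keys: List[str], values: List[RawValue]) -> bytes:
    """Sketch of the proposed layout: each join key gets a length prefix."""
    pairs = sorted(zip(keys, values), key=lambda pair: pair[0])  # sort by join key
    output: List[bytes] = []
    for key, _ in pairs:
        key_bytes = key.encode("utf8")
        output.append(struct.pack("<I", STRING_TAG))
        output.append(struct.pack("<I", len(key_bytes)))  # new: key length
        output.append(key_bytes)
    for _, (value_type, val_bytes) in pairs:
        output.append(struct.pack("<I", value_type))
        output.append(struct.pack("<I", len(val_bytes)))
        output.append(val_bytes)
    return b"".join(output)


def deserialize_with_key_length(data: bytes) -> Tuple[List[str], List[RawValue]]:
    """Parse [tag][length][payload] records, then split them into keys and values."""
    records: List[RawValue] = []
    offset = 0
    while offset < len(data):
        tag, length = struct.unpack_from("<II", data, offset)
        offset += 8
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated record")
        records.append((tag, payload))
        offset += length
    if len(records) % 2 != 0:
        raise ValueError("expected one value record per key record")
    half = len(records) // 2  # assumption: N key records followed by N value records
    keys = [payload.decode("utf8") for _, payload in records[:half]]
    return keys, records[half:]


data = serialize_with_key_length(["item_id"], [(4, struct.pack("<q", 1))])
keys, values = deserialize_with_key_length(data)
print(keys, values)
```

Turning each (type tag, raw bytes) pair back into a proto value would still need the inverse of _serialize_val, which this sketch leaves out.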
