Bug: StreamedResultSet double-encodes merged byte chunks after querying, throwing error #234

@harmon

Description

When a "bytes" field's value is split across multiple chunks in a query response, the result iterator merges the string chunks and then tries to parse the value from str to bytes twice, causing an AttributeError on the second attempt. The fix we found is to skip parsing immediately after merging the chunks, since every merged value is parsed again later anyway.

Here's the problematic line:

return _parse_value(merged, field.type_)
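To see why the second parse fails: for BYTES columns the parser assumes the wire value is a base64 str and encodes it to bytes, so once the merged value is already bytes, a second pass blows up. A minimal sketch, using a hypothetical stand-in for the helper rather than the library's actual code:

```python
# Hypothetical stand-in for _parse_value on a BYTES column:
# it assumes the incoming wire value is a base64-encoded str.
def parse_bytes_value(value):
    return value.encode("utf8")

merged = parse_bytes_value("YWJj")  # first parse: str -> bytes, fine
try:
    parse_bytes_value(merged)       # second parse: bytes has no .encode
except AttributeError as exc:
    print(exc)                      # 'bytes' object has no attribute 'encode'
```

This is exactly the failure mode in the stack trace below: `_merge_chunk` parses the merged value once, then `_merge_values` parses it again.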

Environment details

  • OS type and version: macOS 10.15.6
  • Python version: Python 3.8.6
  • pip version: pip 20.2.1
  • google-cloud-spanner version: 3.0.0

Steps to reproduce

  1. Create a table with the following schema:
CREATE TABLE Test (id STRING(36) NOT NULL, megafield BYTES(MAX)) PRIMARY KEY (id)
  2. Run the code sample below to trigger the exception

Code example

"""
CREATE TABLE Test (id STRING(36) NOT NULL, megafield BYTES(MAX)) PRIMARY KEY (id)
"""

import base64
from google.cloud import spanner
from google.auth.credentials import AnonymousCredentials

###################################
# HOTFIX
###################################
from google.cloud.spanner_v1.streamed import StreamedResultSet, _merge_by_type

def _merge_chunk(self, value):
    """Merge pending chunk with next value.

    :type value: :class:`~google.protobuf.struct_pb2.Value`
    :param value: continuation of chunked value from previous
                  partial result set.

    :rtype: :class:`~google.protobuf.struct_pb2.Value`
    :returns: the merged value
    """
    current_column = len(self._current_row)
    field = self.fields[current_column]
    merged = _merge_by_type(self._pending_chunk, value, field.type_)
    self._pending_chunk = None
    # Bug fix:
    return merged  #_parse_value(merged, field.type_)

# Uncomment this to fix the bug:
# StreamedResultSet._merge_chunk = _merge_chunk
###################################
# END OF HOTFIX
###################################

instance_id = 'test'
database_id = 'test-db'

spanner_client = spanner.Client(
    project='test',
    client_options={"api_endpoint": 'localhost:9010'},
    credentials=AnonymousCredentials()
)

instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

# This must be large enough that the SDK will split the megafield payload across two query chunks
# and try to recombine them, causing the error:
data = base64.standard_b64encode(("a" * 1000000).encode("utf8"))

with database.batch() as batch:
    batch.insert(
        table="Test",
        columns=("id", "megafield"),
        values=[
            ("1", data),  # id column is STRING(36), so pass the id as a str
        ],
    )

with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT * FROM Test"
    )

    for row in results:
        print("Id: ", row[0])
        print("Megafield: ", row[1][:100])

Stack trace

Traceback (most recent call last):
  File "/Users/user1/Code/test.py", line 55, in <module>
    for row in results:
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/streamed.py", line 139, in __iter__
    self._consume_next()
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/streamed.py", line 132, in _consume_next
    self._merge_values(values)
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/streamed.py", line 103, in _merge_values
    self._current_row.append(_parse_value(value, field.type_))
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/_helpers.py", line 170, in _parse_value
    result = value.encode("utf8")
AttributeError: 'bytes' object has no attribute 'encode'

Labels

  • api: spanner (Issues related to the googleapis/python-spanner API)
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
