Bug: StreamedResultSet double-encodes merged byte chunks after querying, throwing error #234

@harmon

Description

When a "bytes" field's value is split across multiple chunks in a query response, the result iterator merges the string chunks and then tries to parse the value from str to bytes twice, causing an AttributeError on the second attempt. The fix we found is to skip parsing immediately after merging the chunks, since every merged value is parsed again later anyway.

Here's the problematic line:

return _parse_value(merged, field.type_)
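To see why the second parse fails: for BYTES columns the parser assumes the wire value is a base64 str and encodes it to bytes, so once the merged value is already bytes, a second pass blows up. A minimal sketch, using a hypothetical stand-in for the helper rather than the library's actual code:

```python
# Hypothetical stand-in for _parse_value on a BYTES column:
# it assumes the incoming wire value is a base64-encoded str.
def parse_bytes_value(value):
    return value.encode("utf8")

merged = parse_bytes_value("YWJj")  # first parse: str -> bytes, fine
try:
    parse_bytes_value(merged)       # second parse: bytes has no .encode
except AttributeError as exc:
    print(exc)                      # 'bytes' object has no attribute 'encode'
```

This is exactly the failure mode in the stack trace below: `_merge_chunk` parses the merged value once, then `_merge_values` parses it again.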

Environment details

  • OS type and version: macOS 10.15.6
  • Python version: Python 3.8.6
  • pip version: pip 20.2.1
  • google-cloud-spanner version: 3.0.0

Steps to reproduce

  1. Create a table with the following schema:
CREATE TABLE Test (id STRING(36) NOT NULL, megafield BYTES(MAX)) PRIMARY KEY (id)
  2. Run the code sample below to trigger the exception

Code example

"""
CREATE TABLE Test (id STRING(36) NOT NULL, megafield BYTES(MAX)) PRIMARY KEY (id)
"""

import base64
from google.cloud import spanner
from google.auth.credentials import AnonymousCredentials

###################################
# HOTFIX
###################################
from google.cloud.spanner_v1.streamed import StreamedResultSet, _merge_by_type

def _merge_chunk(self, value):
    """Merge pending chunk with next value.

    :type value: :class:`~google.protobuf.struct_pb2.Value`
    :param value: continuation of chunked value from previous
                  partial result set.

    :rtype: :class:`~google.protobuf.struct_pb2.Value`
    :returns: the merged value
    """
    current_column = len(self._current_row)
    field = self.fields[current_column]
    merged = _merge_by_type(self._pending_chunk, value, field.type_)
    self._pending_chunk = None
    # Bug fix:
    return merged  #_parse_value(merged, field.type_)

# Uncomment this to fix the bug:
# StreamedResultSet._merge_chunk = _merge_chunk
###################################
# END OF HOTFIX
###################################

instance_id = 'test'
database_id = 'test-db'

spanner_client = spanner.Client(
    project='test',
    client_options={"api_endpoint": 'localhost:9010'},
    credentials=AnonymousCredentials()
)

instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

# This must be large enough that the SDK will split the megafield payload across two query chunks
# and try to recombine them, causing the error:
data = base64.standard_b64encode(("a" * 1000000).encode("utf8"))

with database.batch() as batch:
    batch.insert(
        table="Test",
        columns=("id", "megafield"),
        values=[
            ("1", data),  # id column is STRING(36), so pass the id as a str
        ],
    )

with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT * FROM Test"
    )

    for row in results:
        print("Id: ", row[0])
        print("Megafield: ", row[1][:100])

Stack trace

Traceback (most recent call last):
  File "/Users/user1/Code/test.py", line 55, in <module>
    for row in results:
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/streamed.py", line 139, in __iter__
    self._consume_next()
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/streamed.py", line 132, in _consume_next
    self._merge_values(values)
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/streamed.py", line 103, in _merge_values
    self._current_row.append(_parse_value(value, field.type_))
  File "/Users/user1/.pyenv/versions/project-3.8.6/lib/python3.8/site-packages/google/cloud/spanner_v1/_helpers.py", line 170, in _parse_value
    result = value.encode("utf8")
AttributeError: 'bytes' object has no attribute 'encode'

Labels

  • api: spanner (Issues related to the googleapis/python-spanner API)
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
