Use of Batch adds duplicate, redundant gcloud-python/x.xx.x to User-Agent header in an unbounded fashion #565

@michio-nikaido

Overview

We first became aware of this issue when we began seeing "Error 413 (Request Entity Too Large)" responses to a variety of GCS requests after we started making blob.reload() calls in a Batch context. After capturing the HTTP requests, we discovered that the User-Agent headers of both the "outer" batch request and the contained requests were filled with hundreds of 'gcloud-python/1.40.0' tokens. For example, we saw a single batch request containing 51 reload requests in which each User-Agent header contained over 100 repetitions of 'gcloud-python/1.40.0'. In total, that one request carried 5,300 occurrences of the same 'gcloud-python/1.40.0' string.
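A rough back-of-the-envelope calculation shows why requests like this can trip a 413. The sketch below assumes that each occurrence is the 22-character token ' gcloud-python/1.40.0 ' (including the leading and trailing spaces the library appends) and that the 5,300 occurrences were spread over 52 User-Agent headers (51 contained requests plus the outer batch request). Both are illustrative assumptions, not measured values.

```python
# Back-of-the-envelope estimate of how much User-Agent data one batch carried.
# ASSUMPTIONS: each occurrence is the token " gcloud-python/1.40.0 " (leading and
# trailing space included), spread over 52 headers (51 contained reload
# requests + 1 outer batch request).
TOKEN = " gcloud-python/1.40.0 "
OCCURRENCES = 5300          # total count reported in this issue
HEADERS = 51 + 1            # contained requests + outer batch request


def estimate(token: str, occurrences: int, headers: int) -> dict:
    """Return token length, total bytes, total KiB and average tokens per header."""
    total_bytes = len(token.encode("ascii")) * occurrences
    return {
        "token_len": len(token),
        "total_bytes": total_bytes,
        "total_kib": round(total_bytes / 1024, 1),
        "avg_per_header": round(occurrences / headers, 1),
    }


if __name__ == "__main__":
    for key, value in estimate(TOKEN, OCCURRENCES, HEADERS).items():
        print(f"{key}: {value}")
```

Under these assumptions that is about 116,600 bytes (roughly 114 KiB) of repeated User-Agent text, averaging about 102 tokens per header. This matches the "over 100 repetitions" seen per contained request. The exact size limit at which GCS returns 413 is not stated in this report.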

Analysis

When batch calls are made, ' gcloud-python/x.xx.x ' is unconditionally appended to the underlying connection's _client_info.user_agent string:

https://github.com/googleapis/python-storage/blob/master/google/cloud/storage/_http.py#L59

As a result, each successive batch call appends another duplicate ' gcloud-python/x.xx.x ' token to connection._client_info.user_agent. As long as a Client object keeps making batch requests, the user_agent property grows without limit.
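The accumulation can be reproduced without any GCS access. The sketch below is a simplified model. It assumes, as the repro script's output suggests, that every Batch mutates the same client-info object as the Client's own connection. ClientInfoStub, append_agent_unconditionally, append_agent_once and simulate are hypothetical names, not library API. The sketch contrasts the unconditional append with a membership check that keeps the header bounded.

```python
# Minimal, self-contained model of the user_agent accumulation.
# ClientInfoStub and the helper functions are HYPOTHETICAL stand-ins; they
# mimic, but are not, google.api_core's ClientInfo or the code in
# google/cloud/storage/_http.py.
from dataclasses import dataclass
from typing import Optional

VERSION = "1.42.0"  # assumed library version, matching the environment details


@dataclass
class ClientInfoStub:
    user_agent: Optional[str] = None


def append_agent_unconditionally(info: ClientInfoStub) -> None:
    """Pattern described in this issue: append the token on every construction."""
    if info.user_agent is None:
        info.user_agent = ""
    info.user_agent += f" gcloud-python/{VERSION} "


def append_agent_once(info: ClientInfoStub) -> None:
    """Idempotent variant: append the token only if it is not already present."""
    if info.user_agent is None:
        info.user_agent = ""
    token = f"gcloud-python/{VERSION}"
    if token not in info.user_agent:
        info.user_agent += f" {token} "


def simulate(append, batches: int) -> int:
    """Apply `append` once for the client connection and once per batch, all
    on a single shared ClientInfoStub, then count the tokens in user_agent."""
    shared = ClientInfoStub()
    append(shared)              # the Client's own connection
    for _ in range(batches):
        append(shared)          # each batch reuses the same client info
    return shared.user_agent.count(f"gcloud-python/{VERSION}")


if __name__ == "__main__":
    for n in (1, 5, 100):
        print(n, simulate(append_agent_unconditionally, n), simulate(append_agent_once, n))
```

With the unconditional append, the token count grows by one per batch (2, 6 and 101 tokens after 1, 5 and 100 batches). The guarded version stays at one. A membership check is only one possible fix. Building the header value fresh instead of mutating a shared object would be another option.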

Environment details

  • OS type and version: Ubuntu 20.04
  • Python version: 3.7.11
  • pip version: 21.2.4
  • google-cloud-storage version: 1.42.0

Repro Script

from google.api_core import exceptions as g_exc
from google.cloud.storage.blob import Blob
from google.cloud.storage.client import Client

project = '<GCP PROJECT NAME>'
bucket_name = '<GCS BUCKET NAME>'

# Client.__init__ initializes self._connection, a Connection object defined in
# google/cloud/storage/_http.py, whose constructor automatically appends a
# 'gcloud-python/x.xx.x' token to the user_agent property
c = Client(project=project)
print(c._connection._client_info.user_agent)

bucket = c.get_bucket(bucket_name)
# Fetch a single blob without listing the whole bucket
blobs = list(c.list_blobs(bucket, max_results=1))

# ----- First batch call adds a duplicate user_agent token
print("Before batch: " + c._connection._client_info.user_agent)

try:
    # Creating a Batch object results in another call that appends yet another
    # (duplicate) 'gcloud-python/x.xx.x' token to the user_agent property.
    with c.batch() as batch:
        print("Batch context: " + batch._client_info.user_agent)
        for blob in blobs:
            blob.reload()
except g_exc.NotFound:
    pass
print("After batch: " + c._connection._client_info.user_agent)

# ----- Subsequent batch calls keep adding more duplicates ad nauseam
for i in range(5):
    try:
        with c.batch() as batch:
            print(f"Iteration {i} - Batch context: " + batch._client_info.user_agent)
            for blob in blobs:
                blob.reload()
    except g_exc.NotFound:
        pass

Repro instructions

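  1. Set 'project' and 'bucket_name' in the repro script to valid values.
  2. Run the script and observe that the printed user_agent string gains another 'gcloud-python/x.xx.x' token with each batch.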

Labels

api: storage, priority: p1, priority: p2, type: bug