Description
Overview
We first noticed this issue when "Error 413 (Request Entity Too Large)" responses began appearing for a variety of GCS requests after we added blob.reload() calls inside a Batch context. After capturing the HTTP requests, we found that the User-Agent headers of both the "outer" batch request and each contained request were filled with hundreds of repeated 'gcloud-python/1.40.0' specifications. For example, we saw a single batch request containing 51 reload requests in which each User-Agent header held over 100 repetitions of 'gcloud-python/1.40.0'. Overall, that one request contained 5,300 occurrences of 'gcloud-python/1.40.0'.
Analysis
When batch calls are made, a ' gcloud-python/x.xx.x ' specification is blindly appended to the underlying connection's _client_info.user_agent string:
https://github.com/googleapis/python-storage/blob/master/google/cloud/storage/_http.py#L59
As a result, successive batch calls cause the connection._client_info.user_agent string to accumulate more and more duplicate ' gcloud-python/x.xx.x ' specifications. If a Client object keeps making batch requests, the user_agent property grows without limit.
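The accumulation described above can be reproduced without any GCS access. The following is a minimal sketch, not the library's actual code: `FakeClientInfo`, `append_blindly`, and `append_once` are hypothetical names introduced here for illustration. It assumes only what the analysis states, namely that a single client-info object is shared across batch calls and that each call appends the version spec to it. `append_once` shows one possible guard that makes the append idempotent.

```python
class FakeClientInfo:
    """Hypothetical stand-in for the shared _client_info object."""

    def __init__(self, user_agent=None):
        self.user_agent = user_agent


def append_blindly(info, version):
    """Mimic the reported behavior: always append the version spec."""
    if info.user_agent is None:
        info.user_agent = ""
    info.user_agent += " gcloud-python/{} ".format(version)


def append_once(info, version):
    """One possible guard: append the spec only if it is not already present."""
    token = "gcloud-python/{}".format(version)
    existing = (info.user_agent or "").split()
    if token not in existing:
        existing.append(token)
    info.user_agent = " ".join(existing)


# Simulate one client construction plus five batch calls sharing one object.
shared = FakeClientInfo()
for _ in range(6):
    append_blindly(shared, "1.42.0")
blind_count = shared.user_agent.count("gcloud-python/1.42.0")

guarded = FakeClientInfo()
for _ in range(6):
    append_once(guarded, "1.42.0")
guarded_count = guarded.user_agent.count("gcloud-python/1.42.0")

print(blind_count, guarded_count)
```

With the blind append, the spec count grows by one per call; with the guard it stays at one no matter how many batches are created. Whether a membership check like this is the right fix for the library itself is a design question for the maintainers.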
Environment details
- OS type and version: Ubuntu 20.04
- Python version: 3.7.11
- pip version: 21.2.4
- google-cloud-storage version: 1.42.0
Repro Script
from google.api_core import exceptions as g_exc
from google.cloud.storage.blob import Blob
from google.cloud.storage.client import Client
project = '<GCP PROJECT NAME>'
bucket_name = '<GCS BUCKET NAME>'
# Client.__init__ initializes self._connection, which initializes a Connection object as defined in
# google.cloud.storage._http.py, which automatically appends a 'gcloud-python/x.xx.x' spec to
# the user_agent property
c = Client(project=project)
print(c._connection._client_info.user_agent)
bucket = c.get_bucket(bucket_name)
blobs = list(c.list_blobs(bucket))[:1]
# ----- First batch call adds duplicate user_agent spec
print("Before batch: " + c._connection._client_info.user_agent)
try:
    # Creation of a Batch object results in another call which again appends another (duplicate)
    # 'gcloud-python/x.xx.x' to the user_agent property.
    with c.batch() as batch:
        print("Batch context: " + batch._client_info.user_agent)
        for blob in blobs:
            blob.reload()
except g_exc.NotFound:
    pass
print("After batch: " + c._connection._client_info.user_agent)
# ----- Subsequent batch calls continue to add more duplicates ad nauseam
for i in range(5):
    try:
        with c.batch() as batch:
            print(f"Iteration {i} - Batch context: " + batch._client_info.user_agent)
            for blob in blobs:
                blob.reload()
    except g_exc.NotFound:
        pass
Repro instructions
- Modify repro script to specify valid 'project' and 'bucket_name' values
- Execute script