Skip to content

Possible memory leak when using Google Sheets API #535

@mrc0mmand

Description

@mrc0mmand

There seems to be a memory leak when using the google-api-client with GSheets.

Environment:

$ python --version
Python 3.6.4
$ pip show google-api-python-client
Name: google-api-python-client
Version: 1.7.3

Here's a simple reproducer (without a .client_secret.json):

#!/usr/bin/env python3

import httplib2
import os
from apiclient import discovery
from memory_profiler import profile
from oauth2client import client, tools
from oauth2client.file import Storage
from time import sleep

SCOPES = "https://www.googleapis.com/auth/spreadsheets.readonly"
# See https://cloud.google.com/docs/authentication/getting-started
CLIENT_SECRET_FILE = ".client_secret.json"
APPLICATION_NAME = "ClientDebug"
DISCOVERY_URL = "https://sheets.googleapis.com/$discovery/rest?version=v4"

def get_credentials():
    home_dir = os.path.expanduser("~")
    credential_dir = os.path.join(home_dir, ".credentials")
    flags = None
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                "sheets.googleapis.com-clientdebug.json")

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        credentials = tools.run_flow(flow, store, flags)

    return credentials

@profile(precision=4)
def get_responses(creds):
    """Fetch spreadsheet data."""
    sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"

    http = creds.authorize(httplib2.Http())
    service = discovery.build("sheets", "v4", http=http,
                discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    result = service.spreadsheets().values().get(
            spreadsheetId=sheet_id, range="A1:O").execute()
    values = result.get("values", [])

    print("Got {} rows".format(len(values)))

if __name__ == "__main__":
    creds = get_credentials()
    for i in range(0, 50):
        get_responses(creds)
        sleep(2)

For measurements I used memory_profiler module with following results:

First and second iteration

Got 760 rows
Filename: ./main.py

Line #    Mem usage    Increment   Line Contents
================================================
    35  26.5195 MiB  26.5195 MiB   @profile(precision=4)
    36                             def get_responses(creds):
    37                                 """Fetch spreadsheet data."""
    38  26.5195 MiB   0.0000 MiB       sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"
    39                             
    40  26.5195 MiB   0.0000 MiB       http = creds.authorize(httplib2.Http())
    41  26.5195 MiB   0.0000 MiB       service = discovery.build("sheets", "v4", http=http,
    42  29.2891 MiB   2.7695 MiB                   discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    43  49.5742 MiB  20.2852 MiB       result = service.spreadsheets().values().get(
    44  49.5742 MiB   0.0000 MiB               spreadsheetId=sheet_id, range="A1:O").execute()
    45  49.5742 MiB   0.0000 MiB       values = result.get("values", [])
    46                             
    47  49.5742 MiB   0.0000 MiB       print("Got {} rows".format(len(values)))


Got 760 rows
Filename: ./main.py

Line #    Mem usage    Increment   Line Contents
================================================
    35  49.5742 MiB  49.5742 MiB   @profile(precision=4)
    36                             def get_responses(creds):
    37                                 """Fetch spreadsheet data."""
    38  49.5742 MiB   0.0000 MiB       sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"
    39                             
    40  49.5742 MiB   0.0000 MiB       http = creds.authorize(httplib2.Http())
    41  49.5742 MiB   0.0000 MiB       service = discovery.build("sheets", "v4", http=http,
    42  49.5742 MiB   0.0000 MiB                   discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    43  67.9922 MiB  18.4180 MiB       result = service.spreadsheets().values().get(
    44  67.9922 MiB   0.0000 MiB               spreadsheetId=sheet_id, range="A1:O").execute()
    45  67.9922 MiB   0.0000 MiB       values = result.get("values", [])
    46                             
    47  67.9922 MiB   0.0000 MiB       print("Got {} rows".format(len(values)))

Last iteration

Got 760 rows
Filename: ./main.py

Line #    Mem usage    Increment   Line Contents
================================================
    35 229.6055 MiB 229.6055 MiB   @profile(precision=4)
    36                             def get_responses(creds):
    37                                 """Fetch spreadsheet data."""
    38 229.6055 MiB   0.0000 MiB       sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"
    39                             
    40 229.6055 MiB   0.0000 MiB       http = creds.authorize(httplib2.Http())
    41 229.6055 MiB   0.0000 MiB       service = discovery.build("sheets", "v4", http=http,
    42 229.6055 MiB   0.0000 MiB                   discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    43 229.6055 MiB   0.0000 MiB       result = service.spreadsheets().values().get(
    44 229.6055 MiB   0.0000 MiB               spreadsheetId=sheet_id, range="A1:O").execute()
    45 229.6055 MiB   0.0000 MiB       values = result.get("values", [])
    46                             
    47 229.6055 MiB   0.0000 MiB       print("Got {} rows".format(len(values)))

There's clearly a memory leak, as the reproducer fetches the same data over and over again, yet the memory consumption keeps rising. Full log can be found here.

As a temporary workaround for one of my long-running applications I use an explicit garbage collector call, which mitigates this issue, at least for now:

...
import gc
...
    result = service.spreadsheets().values().get(
            spreadsheetId=sheet_id, range="A1:O").execute()
    values = result.get("values", [])

    gc.collect()
...

I went a little deeper, and the main culprit seems to be in the createMethod function when creating dynamic method batchUpdate:

Method 'batchUpdate, approx. __doc__ size: 2886834                                                                                                                                             
<class 'function'>                                                                                                                                                            
Filename: /home/fsumsal/venv/googleapiclient/lib64/python3.6/site-packages/googleapiclient/discovery.py                                                                       
                                                                                                                                                                              
Line #    Mem usage    Increment   Line Contents                                                                                                                              
================================================                                                                                                                              
  1064     48.7 MiB     48.7 MiB     @profile                                                                                                                                 
  1065                               def _add_basic_methods(self, resourceDesc, rootDesc, schema):                                                                            
  1066                                 # If this is the root Resource, add a new_batch_http_request() method.                                                                 
  1067     48.7 MiB      0.0 MiB       if resourceDesc == rootDesc:                                                                                                           
...                          
  1086                                                                                                                                                                        
  1087                                 # Add basic methods to Resource                                                                                                        
  1088     48.7 MiB      0.0 MiB       if 'methods' in resourceDesc:                                                                                                          
  1089     66.8 MiB      0.0 MiB         for methodName, methodDesc in six.iteritems(resourceDesc['methods']):                                                                
  1090     56.0 MiB      0.0 MiB           fixedMethodName, method = createMethod(                                                                                            
  1091     66.8 MiB     18.1 MiB               methodName, methodDesc, rootDesc, schema)                                                                                      
  1092     66.8 MiB      0.0 MiB           print(type(method))                                                                                                                
  1093     66.8 MiB      0.0 MiB           self._set_dynamic_attr(fixedMethodName,             
  1094     66.8 MiB      0.0 MiB                                  method.__get__(self, self.__class__))                                                                                        
  1095                                     # Add in _media methods. The functionality of the attached method will                                                                              
  1096                                     # change when it sees that the method name ends in _media.                                                                                          
  1097     66.8 MiB      0.0 MiB           if methodDesc.get('supportsMediaDownload', False):  
  1098                                       fixedMethodName, method = createMethod(           
  1099                                           methodName + '_media', methodDesc, rootDesc, schema)                                                                                          
  1100                                       self._set_dynamic_attr(fixedMethodName,           
  1101                                                              method.__get__(self, self.__class__))

(This method has a huge docstring.)

Nevertheless, there is probably a reference loop somewhere, as the gc.collect() call manages to collect all those unreachable objects.

Metadata

Metadata

Assignees

Labels

api: sheetsIssues related to the Sheets API API.status: blockedResolving the issue is dependent on other work.type: feature request‘Nice-to-have’ improvement, new feature or different behavior or design.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions