BigTable: On read_row(), provide default to retrieve only most recent cell values #4468

@zakons

The current signature on the Bigtable Table read_row() method has a default of filter_=None:

def read_row(self, row_key, filter_=None):
    ...

When a cell value has been updated multiple times, this default returns the full time series (every retained version, each with its timestamp), which can slow down reads in a non-obvious way.
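
Until the default changes, a caller can ask for only the newest version explicitly. The following is a minimal sketch, assuming the `CellsColumnLimitFilter` class from `google.cloud.bigtable.row_filters`; the wrapper name `read_row_latest` is hypothetical and not part of the library:

```python
def read_row_latest(table, row_key, filter_=None):
    """Read one row, keeping only the newest cell per column by default.

    Hypothetical helper, not part of google-cloud-bigtable. ``table`` is
    assumed to be a ``google.cloud.bigtable.table.Table`` instance.
    """
    if filter_ is None:
        # Imported lazily so the helper can be defined even when the
        # client library is not importable at module load time.
        from google.cloud.bigtable.row_filters import CellsColumnLimitFilter

        # Limit every column to its single most recent cell.
        filter_ = CellsColumnLimitFilter(1)
    return table.read_row(row_key, filter_=filter_)
```

Passing an explicit `filter_` bypasses the default, so callers that need the full or partial time series can still request it.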

In the current Python API, the cells property on row_data (PartialRowData) returns a deep copy of the cells on every access, which compounds the performance issue.

@property
def cells(self):
    """Property returning all the cells accumulated on this partial row.

    :rtype: dict
    :returns: Dictionary of the :class:`Cell` objects accumulated. This
              dictionary has two-levels of keys (first for column families
              and second for column names/qualifiers within a family). For
              a given column, a list of :class:`Cell` objects is stored.
    """
    return copy.deepcopy(self._cells)
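
To illustrate why the copy matters, the sketch below uses a stand-in class (`FakeRow`, hypothetical) that mirrors only the property above and counts how often the deep copy runs. It is not the library's class; it just shows that each access to `cells` copies the whole structure, so reading the property once into a local variable avoids repeating the work.

```python
import copy


class FakeRow:
    """Stand-in for PartialRowData; mirrors only the ``cells`` property."""

    def __init__(self, cells):
        self._cells = cells
        self.copies = 0  # counts deep copies, for illustration only

    @property
    def cells(self):
        self.copies += 1
        return copy.deepcopy(self._cells)


# One column holding 1,000 versions of a value.
row = FakeRow({"cf1": {b"col": [b"v%d" % i for i in range(1000)]}})

# Anti-pattern: every property access copies all 1,000 versions.
for i in range(3):
    _ = row.cells["cf1"][b"col"][i]
copies_in_loop = row.copies

# Better: access the property once and reuse the result.
row.copies = 0
cells = row.cells
for i in range(3):
    _ = cells["cf1"][b"col"][i]
copies_cached = row.copies
```

Here the loop that touches `row.cells` three times triggers three deep copies, while the cached variant triggers one.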

Consider:

  1. Giving read_row() a default filter that retrieves only the most recent value of each cell, unless the full or partial time series is explicitly requested.
  2. Allowing a ColumnFamily to implicitly or explicitly limit cells to only one value (no time series).
  3. Adding a cell_value(column_family_id, column, index=0) method to row_data (PartialRowData) to allow more efficient retrieval of a single cell value.
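
The third suggestion could look roughly like the sketch below. `RowSketch` and the `Cell` namedtuple are hypothetical stand-ins for the library's classes, and the sketch assumes the cells of a column are stored newest first, so that `index=0` is the most recent value. The point is only that a direct lookup into `_cells` avoids the deep copy made by the `cells` property.

```python
from collections import namedtuple

# Stand-in for the library's Cell class; only the fields used here.
Cell = namedtuple("Cell", ["value", "timestamp_micros"])


class RowSketch:
    """Hypothetical stand-in for PartialRowData with the proposed method."""

    def __init__(self, cells):
        # Two-level dict: column family -> column qualifier -> list of cells.
        self._cells = cells

    def cell_value(self, column_family_id, column, index=0):
        """Return a single cell value without copying the row.

        Raises KeyError for an unknown family or column and IndexError
        when ``index`` is past the last stored version.
        """
        return self._cells[column_family_id][column][index].value


row = RowSketch(
    {"cf1": {b"temp": [Cell(b"21", 2000), Cell(b"19", 1000)]}}
)
latest = row.cell_value("cf1", b"temp")             # newest version
previous = row.cell_value("cf1", b"temp", index=1)  # one version older
```

Returning the value directly, rather than the `Cell`, keeps the common "give me the current value" case to a single call; a caller that needs the timestamp would still go through `cells`.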

Labels

api: bigtable, performance, type: feature request
