Closed
Labels
api: bigtable (issues related to the Bigtable API), performance, type: feature request ("nice-to-have" improvement, new feature or different behavior or design)
Description
The current signature on the Bigtable Table read_row() method has a default of filter_=None:
def read_row(self, row_key, filter_=None):
    ...

When a cell value has been updated multiple times, this default returns the full time series, with a timestamp for each value, which can slow down reads in a non-obvious way.
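Until the default changes, a caller can ask for only the newest cell per column by passing a filter explicitly. The sketch below assumes the `CellsColumnLimitFilter` class from `google.cloud.bigtable.row_filters`. The helper name `read_latest_cells` and its `filter_factory` parameter are hypothetical, added only so the helper can be exercised without a live table.

```python
def read_latest_cells(table, row_key, filter_factory=None):
    """Read one row, keeping only the newest cell in each column.

    `table` is expected to expose read_row(row_key, filter_=...), as in the
    signature quoted above. `filter_factory` is a hypothetical hook: any
    callable that turns a cell count into a row filter.
    """
    if filter_factory is None:
        # Imported lazily so the helper can be tried without the client installed.
        from google.cloud.bigtable.row_filters import CellsColumnLimitFilter

        filter_factory = CellsColumnLimitFilter
    # Limit every column to a single cell (the most recent one).
    return table.read_row(row_key, filter_=filter_factory(1))
```

With a real table this would be called as `read_latest_cells(table, b"row-key")`. The filtering happens server-side, so older versions are never sent to the client.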
In the current Python API, the cells property on row_data (PartialRowData) returns a deep copy of the cells on every access, which compounds the performance issue.
@property
def cells(self):
    """Property returning all the cells accumulated on this partial row.

    :rtype: dict
    :returns: Dictionary of the :class:`Cell` objects accumulated. This
              dictionary has two-levels of keys (first for column families
              and second for column names/qualifiers within a family). For
              a given column, a list of :class:`Cell` objects is stored.
    """
    return copy.deepcopy(self._cells)

Consider:
- Making a default filter on read_row() to retrieve only the most recent value of any cell unless the full or partial time series is requested.
- Allowing a ColumnFamily to implicitly or explicitly limit cells to only one value (no time series).
- Adding a cell_value(column_family_id, column, index=0) method to row_data (PartialRowData) to allow more efficient retrieval of a single cell value.
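As a rough illustration of the third suggestion, here is a minimal sketch of what `cell_value` could look like. `Cell` and `RowDataSketch` are hypothetical stand-ins, not the library classes. The sketch assumes that `_cells` maps family id to column qualifier to a list of cells, and that the list is ordered newest first so that `index=0` is the most recent value.

```python
import copy


class Cell:
    """Hypothetical stand-in for the library's Cell: a value plus a timestamp."""

    def __init__(self, value, timestamp):
        self.value = value
        self.timestamp = timestamp


class RowDataSketch:
    """Hypothetical stand-in for PartialRowData, for illustration only."""

    def __init__(self, row_key):
        self.row_key = row_key
        # family id -> column qualifier -> list of Cell (assumed newest first)
        self._cells = {}

    @property
    def cells(self):
        # Mirrors the behavior quoted in the issue: each access copies every cell.
        return copy.deepcopy(self._cells)

    def cell_value(self, column_family_id, column, index=0):
        """Return one cell's value without copying the rest of the row.

        Raises KeyError for an unknown family or column and IndexError
        when the column holds fewer than index + 1 cells.
        """
        return self._cells[column_family_id][column][index].value


row = RowDataSketch(b"row-1")
row._cells = {"cf1": {b"temp": [Cell(b"21", 2000), Cell(b"20", 1000)]}}

latest = row.cell_value("cf1", b"temp")       # newest value in the column
previous = row.cell_value("cf1", b"temp", 1)  # one version back
```

Because `cell_value` reads `_cells` directly, looking up a single value does not pay for copying the whole row, whereas each `row.cells[...]` lookup builds a fresh deep copy first.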