Skip to content

Conversation

@gkevinzheng
Copy link
Contributor

@gkevinzheng gkevinzheng commented Oct 14, 2025

Additional system tests for testing shim compatibility with the classic client interface.

Also fixes a bug where encountering a retriable error after another retriable error mid-stream raised an exception instead of retrying.

Fixes #1156

@gkevinzheng gkevinzheng requested review from a team as code owners October 14, 2025 18:50
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Oct 14, 2025
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/python-bigtable API. label Oct 14, 2025
@gkevinzheng gkevinzheng changed the base branch from main to v3_staging October 15, 2025 14:40
Copy link
Contributor

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments for now, but I'll look at the tests in more detail soon

env_vars: {
key: "NOX_SESSION"
value: "system-3.12"
} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled in main, so this shouldn't be necessary anymore

self._row_merger = _RowMerger(self._row_merger.last_seen_row_key)
self.response_iterator = self.read_method(retry_request)
self.response_iterator = self.read_method(retry_request, retry=self.retry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this change needed?

Copy link
Contributor Author

@gkevinzheng gkevinzheng Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was needed because the read_rows method that gets passed in gets called with the GAPIC default retry instead of the retry that the user passed in or DEFAULT_READ_RETRY for this instance. If there's an error with this specific instance of self.read_method it doesn't get retried and errors out.

For example, if a retriable InternalError occurred mid-stream, then in this on_error handler the same error occurred, the error in the on_error handler would not be retried because the specific retry was not passed in for that RPC call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit torn, because this does feel like a very real bug. But the code here is also very complex, so I'm hesitant to make changes to the retry policy in case there are side-effects we're not considering

But I guess this is coming in the context of adding more system tests. Do you think the tests we have now are sufficient to catch any errors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on second thought, this is a PR into the v3_staging branch, not main. We will be doing much more extensive changes here as part of the shim, before cutting a release. So I think adding this fix makes sense

class SelectiveMethodsErrorInjector(UnaryStreamClientInterceptor):
def __init__(self):
# As long as there are more items on this list, the items on the list
# are as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume after the list is exhausted, it always behaves as normal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should. I'll put a comment for that.

data_table, rows_to_delete, row_keys, columns, CELL_VAL_READ_ROWS_RETRY
)

yield data_table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this re-populating the table before each test function? Is that necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me refactor the read_rows system tests to reconfigure the error injector instead of re-populating the table before each test function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the organization here. Why are some tests in a class, and others stand-alone? Do we need a separate fixture to populate the table? Do we really want to re-populate the table more than once per run?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe take a look at the metrics tests, and see if you can structure things that way. Instead of building a new data_table_read_rows_with_error_injector for each function, you can build it once and clear its state between functions



def _populate_table(data_table, rows_to_delete, row_keys):
def _add_test_error_handler(retry):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a docstring could be helpful here

It looks like it overrides the on_error function to assert that backoff values are within expected bounds?

self._row_merger = _RowMerger(self._row_merger.last_seen_row_key)
self.response_iterator = self.read_method(retry_request)
self.response_iterator = self.read_method(retry_request, retry=self.retry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit torn, because this does feel like a very real bug. But the code here is also very complex, so I'm hesitant to make changes to the retry policy in case there are side-effects we're not considering

But I guess this is coming in the context of adding more system tests. Do you think the tests we have now are sufficient to catch any errors?

self._row_merger = _RowMerger(self._row_merger.last_seen_row_key)
self.response_iterator = self.read_method(retry_request)
self.response_iterator = self.read_method(retry_request, retry=self.retry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on second thought, this is a PR into the v3_staging branch, not main. We will be doing much more extensive changes here as part of the shim, before cutting a release. So I think adding this fix makes sense

data_table, rows_to_delete, row_keys, columns, CELL_VAL_READ_ROWS_RETRY
)

yield data_table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the organization here. Why are some tests in a class, and others stand-alone? Do we need a separate fixture to populate the table? Do we really want to re-populate the table more than once per run?

for row in rows_to_delete:
row.clear()
row.delete()
row.commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use a try...finally block to make sure resources are fully deleted, anywhere you spin up new resources. (Especially for clusters and tables)

data_table, rows_to_delete, row_keys, columns, CELL_VAL_READ_ROWS_RETRY
)

yield data_table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe take a look at the metrics tests, and see if you can structure things that way. Instead of building a new data_table_read_rows_with_error_injector for each function, you can build it once and clear its state between functions

Copy link
Contributor

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gkevinzheng gkevinzheng merged commit ca71261 into v3_staging Nov 20, 2025
7 of 9 checks passed
@gkevinzheng gkevinzheng deleted the system-tests-legacy branch November 20, 2025 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigtable Issues related to the googleapis/python-bigtable API. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants