Skip to content

Conversation

@tseaver
Copy link
Contributor

@tseaver tseaver commented Aug 3, 2016

Toward #2042.

This PR tries to rationalize the retries in our system tests into three types:

  • Retry methods which raise particular exceptions (the class @daspecster implemented already)
  • Retry methods until the result passes a predicate (generalizing the backoff stuff in logging_).
  • Retry methods until a predicate on the instance's __self__ passes (e.g., until job.state == "done").

The PR adds classes for each of these categories, and attempts to use them where appropriate, replacing manual time.sleep() calls and existing "backoff" functions / loops.

It also tweaks the tox -e system-tests runner to attempt all system test modules, even if one fails, and tries to run them in order from least to most flake.

BTW, cannot figure out how to get the translate tests to run at all, and there is one of the error_reporting tests which blows up at me.

@daspecster
Copy link
Contributor

@tseaver nice! Did you happen to try this on any of the pubsub failures? I tried retrying them last night but they still failed.

@dhermes
Copy link
Contributor

dhermes commented Aug 3, 2016

Lint failure

system_tests/retry.py:163:9: E301 expected 1 blank line, found 0

from run_system_test import run_module_tests


MODULES = (

This comment was marked as spam.

This comment was marked as spam.

@dhermes
Copy link
Contributor

dhermes commented Aug 3, 2016

Really great work!

How did you decide which flaky test cases to target?

@tseaver
Copy link
Contributor Author

tseaver commented Aug 4, 2016

@dhermes Thanks! I was tackling those which fail routinely, first on my box and then on Travis. I imagine we haven't whacked the last mole yet.

:param logger: Logger to use. If None, print.
"""
def __init__(self, instance_predicate,
max_tries=MAX_TRIES, delay=1, backoff=BACKOFF, logger=None):

This comment was marked as spam.

tseaver added 2 commits August 4, 2016 15:10
Better indicates its purpose relative to other retry implementations.

Rename 'tries' -> 'max_tries' for clarity.

Also, rework the exponential backoff bits to make a little clearer.
Allows choosing to retry caught exceptions based on exception attributes.

Use the new predicate feature for Bigtable retry/backoff:  we want to
retry only gRCP 'AbortionErrors' which have a status code of
'UNAVAILABLE'.
tseaver added 14 commits August 4, 2016 15:10
Permits retrying a method call until it returns the expected result,
e.g. until the newly-created 'Widget' shows up in the 'list_widgets'
API call (eventual consistency).
Permits retrying a method call until it its instance is in a given state,
e.g. until a job is 'done'.
The imperative form is cleaner.
…te':

The back end may raise a transient Conflict because the instance created
by the previous testcase is not-dead-yet.
@tseaver
Copy link
Contributor Author

tseaver commented Aug 7, 2016

@dhermes Any remaining issues? I think we disagree about the third RetryInstanceState class: should we ask @jgeewax to be the tie-breaker? I believe @daspecster is mostly offline through the end of next wee,.

@dhermes
Copy link
Contributor

dhermes commented Aug 8, 2016

My argument for a tie-breaker: fewer classes is better 😀

@tseaver
Copy link
Contributor Author

tseaver commented Aug 8, 2016

Mine: the third class makes the intent clearer: I still have to puzzle out exactly how your examples work every time I look at them.

@dhermes
Copy link
Contributor

dhermes commented Aug 9, 2016

LGTM. Let's get this merged before it gets stale

@tseaver tseaver merged commit eab9cd6 into googleapis:master Aug 9, 2016
@tseaver tseaver deleted the consolidate-retry-logic branch August 9, 2016 18:39
@dhermes dhermes mentioned this pull request Sep 19, 2016
parthea pushed a commit that referenced this pull request Nov 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes This human has signed the Contributor License Agreement. testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants