-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Consolidate retry logic #2050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate retry logic #2050
Conversation
|
@tseaver nice! Did you happen to try this on any of the pubsub failures? I tried retrying them last night but they still failed. |
|
Lint failure |
system_tests/attempt_system_tests.py
Outdated
| from run_system_test import run_module_tests | ||
|
|
||
|
|
||
| MODULES = ( |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
|
Really great work! How did you decide which flaky test cases to target? |
|
@dhermes Thanks! I was tackling those which fail routinely, first on my box and then on Travis. I imagine we haven't whacked the last mole yet. |
system_tests/retry.py
Outdated
| :param logger: Logger to use. If None, print. | ||
| """ | ||
| def __init__(self, instance_predicate, | ||
| max_tries=MAX_TRIES, delay=1, backoff=BACKOFF, logger=None): |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
Better indicates its purpose relative to other retry implementations. Rename 'tries' -> 'max_tries' for clarity. Also, rework the exponential backoff bits to make a little clearer.
Allows choosing to retry caught exceptions based on exception attributes. Use the new predicate feature for Bigtable retry/backoff: we want to retry only gRCP 'AbortionErrors' which have a status code of 'UNAVAILABLE'.
Permits retrying a method call until it returns the expected result, e.g. until the newly-created 'Widget' shows up in the 'list_widgets' API call (eventual consistency).
Permits retrying a method call until it its instance is in a given state, e.g. until a job is 'done'.
The imperative form is cleaner.
…te': The back end may raise a transient Conflict because the instance created by the previous testcase is not-dead-yet.
Addresses: #2050 (comment)
|
@dhermes Any remaining issues? I think we disagree about the third |
|
My argument for a tie-breaker: fewer classes is better 😀 |
|
Mine: the third class makes the intent clearer: I still have to puzzle out exactly how your examples work every time I look at them. |
See reports 4) an 7) in: #1100
|
LGTM. Let's get this merged before it gets stale |
Toward #2042.
This PR tries to rationalize the retries in our system tests into three types:
logging_).__self__passes (e.g., untiljob.state == "done").The PR adds classes for each of these categories, and attempts to use them where appropriate, replacing manual
time.sleep()calls and existing "backoff" functions / loops.It also tweaks the
tox -e system-testsrunner to attempt all system test modules, even if one fails, and tries to run them in order from least to most flake.BTW, cannot figure out how to get the
translatetests to run at all, and there is one of theerror_reportingtests which blows up at me.