Fix flaky nuclear_norm() test #21638
Conversation
Naturally this particular CI run is green. :) But the cause is unrelated to the diff; with thousands of tests I've managed to […]
@kostmo can we get a CI scan for the […]
The latest status is unfortunately a "Heisenbug": if I repeat the tests often enough (1000+), print out the input values on failure, and then rerun the test manually with those inputs, the results are the same, i.e. the failure does not reproduce.
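For reference, this is roughly the kind of stress loop described above; it is only a sketch, and the test name `test_nuclear_norm_axes_gpu` and the `from test_torch import ...` location are assumptions, not shown in this thread.

```python
import unittest
from test_torch import TestTorch  # assumes running from pytorch/test

for i in range(1000):
    # Build a single test case by method name (name assumed for illustration).
    case = TestTorch('test_nuclear_norm_axes_gpu')
    result = unittest.TextTestRunner(verbosity=0).run(case)
    if not result.wasSuccessful():
        print('first failure on iteration', i)
        break
```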
This is one example (test_torch.py is modified for printing):
[…]
Equivalent to:
[…]
Which gives:
[…]
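To illustrate the shape of the repro (the actual printed inputs and outputs did not survive in this thread, so the tensor shape and dims below are made up), the check is essentially a comparison of the CUDA nuclear norm against the NumPy computation on the same data:

```python
import torch
import numpy as np

# Hypothetical input; the real failing values were printed by the modified test.
x = torch.randn(5, 3, 4, device='cuda')

# Nuclear norm reduced over two of the three dimensions.
ans = torch.norm(x, p='nuc', dim=(0, 2))

# Equivalent NumPy computation on the same data.
expected = np.linalg.norm(x.cpu().numpy(), ord='nuc', axis=(0, 2))

# On a flaky CI run these disagree; rerunning manually with the same inputs, they match.
print(ans.cpu().numpy())
print(expected)
```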
Can you run the kernel with […]
@ezyang I tried […] So the situation is:
[…]
So 3) is what the latest diff implements. It is not quite satisfying, but may help in suppressing the flakiness until the cause is found.
I haven't tried […]
_TestTorchMixin._test_nuclear_norm_axes(self, device='cuda')

@unittest.skipIf(not TEST_MAGMA, "no MAGMA library detected")
def test_nuclear_norm_exceptions(self):
Could you try adding the @skipCUDANonDefaultStreamIf(True) decorator here and check whether the tests pass? I think this might be an issue.
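Something along these lines, assuming the decorator and the test/helper names match the surrounding test_torch.py (only a fragment of the file is shown above):

```python
@skipCUDANonDefaultStreamIf(True)
@unittest.skipIf(not TEST_MAGMA, "no MAGMA library detected")
def test_nuclear_norm_axes_gpu(self):
    # Runs the shared nuclear-norm axes test on the CUDA device.
    _TestTorchMixin._test_nuclear_norm_axes(self, device='cuda')
```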
I'm just guessing here based on your comments about the tests passing after being moved to a different script.
@vishwakftw Thanks, I think […]
@ezyang I think the latest diff (suggestion by @vishwakftw) resolves #21785.
facebook-github-bot left a comment:
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This tries to fix a sporadic failure on some CIs.
I've run this test hundreds of times on my machine (GeForce 1060, MAGMA), but I cannot reproduce the failure.