-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Clip Binomial results for different endpoints in curand_uniform #42702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💊 CI failures summary and remediationsAs of commit e30b771 (more details on the Dr. CI page):
🕵️ 4 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
|
cc @ngimel for a review since you are in the original issue thread, please feel free to remove the assignment if doesn't fit. |
|
Lint error is due to CMake not finding Protobuf: caffe2_onnx_main_py3_6_clang7_ubuntu16_04_build failure is caused by a CMake version discrepancy" I don't think I caused those ... |
Yes, I did. I picked up some third-party submodule changes by mistake. Fixing. |
|
A few of the builds got this error, should I rebase off master again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work, because val is float, and you are static_casting and not reinterpret-casting it to uint64_t, which means that it'll either become 0 or 1.
| auto val = curand_uniform(&state); | |
| auto val = curand_uniform(&state); |
uint val = curand(&state); //need just bits
constexpr auto MASK = static_cast<uint>((static_cast<uint64_t>(1) << std::numeric_limits<float>::digits) - 1);//MASK should be uint, int64 arithmetic is much slower. In honesty, even static_cast<uint64_t>(1) is not needed because digist for float is 24, so 32 bits is enough, but it does not matter
constexpr auto DIVISOR = static_cast<float>(1) / (static_cast<uint>(1) << std::numeric_limits<float>::digits);//uint64_t is not needed here either
return (val & MASK) * DIVISOR;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. I assumed tests would catch any gross errors :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adopted in 7257949. Also rebased off master to hopefully clear the CI errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IT's actually concerning that the tests don't catch those errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's uint32_t probably to fix windows builds, sorry.
|
The
|
|
You changes are cuda :-) but it does not look like they are related. ROCm has been very flaky lately. Let's try to merge. |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
Oh, btw, sorry to ask, but in the test you are doing, can you also roughly check the number of 0's and 1's, so that at least glaring errors in the distribution are caught? |
Yup. |
|
I added a smoke test, but now there are new CI failures. |
|
Test failures are unrelated. Thank you! |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fixes #42153
As documented (search for
curand_uniformon the page),curand_uniformreturns "from 0.0 to 1.0, where 1.0 is included and 0.0 is excluded." These endpoints are different than the CPU equivalent, and makes the calculation in the PR fail when the value is 1.0.The test from the issue is added, it failed for me consistently before the PR even though I cut the number of samples by 10.