Vsx initial support issue27678 #41541
Conversation
💊 CI failures summary and remediations
As of commit 97c8e17 (more details on the Dr. CI page):
Extra GitHub checks: 2 failed
@smessmer in the future, please don't mark an OSS PR as triaged without also assigning a reviewer for it.

Not too sure who should be the shepherd for this. @VitalyFedyunin, do you have any ideas?
New modifications on Vec256Tests:
For now, AVX2 and VSX both fail on these tests. Additionally, VSX fails on multiplication because of precision.
The failures above are precision related; the tests can be checked within the accepted domain and with lower precision.
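A minimal sketch of what checking with lower precision could look like in a test helper (the function name and tolerances here are illustrative assumptions, not the actual Vec256 test utilities):

```cpp
#include <algorithm>
#include <cmath>

// Compare a scalar reference against a vectorized result with a relative and an
// absolute tolerance instead of exact equality, so small precision differences
// between implementations do not fail the test.
inline bool nearly_equal(double expected, double actual,
                         double rel_tol = 1e-6, double abs_tol = 1e-12) {
    if (std::isnan(expected) && std::isnan(actual)) return true;
    const double diff = std::fabs(expected - actual);
    const double scale = std::max(std::fabs(expected), std::fabs(actual));
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```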
@VitalyFedyunin if possible, could we discuss my PR this week?
Do we know anyone at IBM who could help review this?

@xuhdev / @zasdfgbnm is this the sort of thing either of you would be interested in? :)
I know very little about CPU vectorization :) |
Hello, I have little PowerPC experience, but given time I can potentially review and land this proposal. Several notes so far:
```cpp
#include <c10/macros/Macros.h>
#include <ATen/cpu/vec256/intrinsics.h>

using vbool8 = __attribute__((altivec(vector__))) __attribute__((altivec(bool__))) char;
```
How about replacing `__attribute__((altivec(vector__)))` by `__vector` for readability?
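For context, the two spellings being compared (a sketch only; whether the shorter keyword form is accepted depends on the compiler and flags, which is exactly the GCC issue mentioned in the reply below):

```cpp
// Attribute form used in the PR.
using vbool8_attribute_form =
    __attribute__((altivec(vector__))) __attribute__((altivec(bool__))) char;

// Shorter AltiVec keyword spelling suggested above.
using vbool8_keyword_form = __vector __bool char;
```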
I had to do it this way because of GCC and the default compilation. I opened the issue advance-toolchain/issues/1861.
Ouch, I see. Shame
Having checked that further, my biggest remaining complaint would be the code duplication. The added code is massive, but if I e.g. do a search & replace on the double implementation to replace all "double" by "float" and all "vfloat64" by "vfloat32", there isn't much of a difference left. Likely similar for the other types. Also, this way the differences stick out: e.g. using Sleef in double but some "obscure" code in float. Similar for some magic numbers used (1, 3, 7, ...). This makes it hard to verify correctness, and if there is any issue it has to be fixed/improved in all (by now 10) instances. How about having a common base class template? Also, the member ops can likely be done in the base class. Just an idea to make this easier to develop and maintain. I'm not part of FB/PyTorch obviously, so this is only from a fellow developer.
@Flamefire
Ah I see, the base class approach you suggested is also what I had in mind. Using
Do I read this correctly as: having the base class would not allow implementing sin/cos as members in the base class template because there is no single function to call? I think this still reduces the amount of code a lot. Currently one such class file is about 300-400 lines long and there are 10 of them, so 3000-4000 LOC. I'd expect the base class to cut that at least in half, maybe more. Just having the mask-vector union and the ctors in a base class already helps, as do the ops. You might need some CRTP though, if you want the ops to return the derived type (a sketch follows below).
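A minimal CRTP sketch of the idea being discussed (class and member names here are hypothetical, not the actual Vec256 types): the base template owns the storage, constructors and the element-wise ops shared by all specializations, and the ops return the derived type.

```cpp
// GCC generic vector of 4 floats (128 bits), standing in for a VSX lane.
typedef float vfloat32 __attribute__((vector_size(16)));

// Hypothetical CRTP base: everything that is identical across the
// float/double/int specializations lives here exactly once.
template <typename Derived, typename scalar_t, typename vec_t, int N>
struct VecBase {
  union {
    vec_t _vec;
    scalar_t _arr[N];
  };

  VecBase() : _vec{} {}
  explicit VecBase(vec_t v) : _vec(v) {}

  // Example of a member op implemented once in the base; it returns Derived
  // so callers keep the concrete type.
  Derived operator+(const Derived& other) const {
    Derived out;
    for (int i = 0; i < N; ++i) {
      out._arr[i] = _arr[i] + other._arr[i];
    }
    return out;
  }
};

// The derived type only keeps what genuinely differs per element type,
// e.g. which Sleef function a transcendental member forwards to.
struct VecFloat : VecBase<VecFloat, float, vfloat32, 4> {
  using Base = VecBase<VecFloat, float, vfloat32, 4>;
  using Base::Base;
};
```

Functions like sin/cos that need a per-type Sleef call would stay in the derived classes, which is the caveat raised above.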
#41541 (comment)
Hello @Flamefire, we are looking to rewrite the Vec code (and remove the massive duplication), eventually to support AVX512 and various vector sizes and architectures. So far we are happy with the code as is.
@quickwritereader please scroll up for the CLA information; without it I cannot import your code to run internal tests.
facebook-github-bot left a comment:
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This fixes multiple bugs introduced by the VSX optimized code in #41541
- min/max/clamp now consistently return NaN when any value is NaN, as on other architectures
- The non-complex angle functions now return PI for negative values
- The complex angle functions have been corrected and optimized
- The float32 log function implementation returned a wrong result when inf was passed (and maybe other inputs); it was replaced by the Sleef function, just as for float64

Fixes #59248
Fixes #57537

Pull Request resolved: #59382
Reviewed By: jbschlosser
Differential Revision: D28944626
Pulled By: ezyang
fbshipit-source-id: 1ae2782b9e34e458a19cec90617037654279e0e0
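For illustration, a scalar sketch of the min/max/clamp semantics this summary describes (not the actual vectorized implementation):

```cpp
#include <cmath>
#include <limits>

// NaN-propagating minimum: if either input is NaN the result is NaN,
// matching the behavior on other architectures that the fix restores for VSX.
inline float propagating_min(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return a < b ? a : b;
}
```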
Summary: This fixes the remaining bug introduced by the VSX optimized code in #41541. Followup to #59382.

### Description
The code currently returns wrong results on POWER9LE, making e.g. the `test_binary_ufuncs` test fail.

### Testing
Built and ran the tests on PPC.

Pull Request resolved: #82646
Approved by: https://github.com/ezyang
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/39ffad392c49aafcfeba05e2704bb1b666247471
Reviewed By: kit1980
Differential Revision: D38395263
fbshipit-source-id: c4c56af2d8e3b528b6418a4a32c63de77037e5cf
Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions, similar to what was done in #59382 & #82646 after #41541. This fixes wrong results for e.g. `sin(1e20)`.

Fixes #85978

To fix #85978 I only needed to change the sin/cos functions to make the test pass, but to not run into the same issue again and again (see the previous PRs and issues) I checked the whole file for similar functions where a Sleef function could be used and changed those too. In the diff I noticed the faulty whitespace, so to make this complete I fixed that as well; it should now be done.

Pull Request resolved: #86453
Approved by: https://github.com/malfet
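As an illustration of the kind of change described, a hedged sketch of delegating a vectorized sin to Sleef (the entry-point name is an assumption based on Sleef's `<op><type><lanes>_u<accuracy><isa>` naming scheme; check the installed sleef.h for the exact symbol on the targeted toolchain):

```cpp
#include <sleef.h>

// GCC generic vector of 4 floats, matching the 128-bit VSX lane width.
typedef float vfloat32 __attribute__((vector_size(16)));

// Hand-written range reduction is what made inputs like sin(1e20) go wrong;
// delegating to Sleef (assumed entry point below) avoids that.
inline vfloat32 vsx_sin(vfloat32 x) {
  return Sleef_sinf4_u10vsx(x);
}
```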
PyTorch Vec256 ppc64le support
Implemented types:
Notes:
All basic vector operations are implemented:
There are a few problems:
Besides, I added CPU_CAPABILITY for POWER, but because of quantization errors for DEFAULT I had to undef it and use VSX for DEFAULT too (see the sketch below).
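A rough preprocessor sketch of that idea (illustrative assumption only, not the actual PyTorch dispatch code): when compiling the DEFAULT capability on a VSX machine, treat it as VSX so those code paths use the VSX implementations.

```cpp
// Assumed macro names; the real capability macros and dispatch mechanism in
// ATen may differ.
#if defined(__VSX__) && defined(CPU_CAPABILITY_DEFAULT)
  #undef CPU_CAPABILITY_DEFAULT
  #define CPU_CAPABILITY_VSX
#endif
```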
Supported math functions
(Notes) Example: "-(both)" means it was also missing on the x86 side
g(func_name) means vectorization is done using func_name
sleef - redirected to the Sleef library
unsupported
Vec256 test cases: PR #42685
Current list:
Missing:
Notes on tests and testing framework
- For example, std::round and the vector version of round differ, so it was tested against the local version. For the double type on VSX, vec_round failed for (even)+0.5 values; this was solved by using vec_rint (see the example after this list).
- Complex types are not tested. After enabling complex testing, some of the complex functions failed for VSX and x86 AVX as well due to precision and domain issues. I will either test them against a local implementation or check within the accepted domain.
- Quantizations are not tested.
- Added tests for the quantize, dequantize, requantize_from_int, relu, relu6 and widening_subtract functions.
- For now, -DBUILD_MOBILE_TEST=ON will be used for the Vec256 tests too.
- Vec256 test cases will be built for each CPU_CAPABILITY.
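A small scalar illustration of the halfway-case difference mentioned in the first bullet: std::round rounds ties away from zero, while std::rint honors the current rounding mode (round-to-nearest-even by default), which is the tie behavior the switch to vec_rint targets.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::fesetround(FE_TONEAREST);  // default mode: round to nearest, ties to even
  std::printf("std::round(2.5) = %.1f\n", std::round(2.5));  // 3.0 (ties away from zero)
  std::printf("std::rint(2.5)  = %.1f\n", std::rint(2.5));   // 2.0 (ties to even)
  std::printf("std::round(3.5) = %.1f\n", std::round(3.5));  // 4.0
  std::printf("std::rint(3.5)  = %.1f\n", std::rint(3.5));   // 4.0
  return 0;
}
```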