-
Notifications
You must be signed in to change notification settings - Fork 552
Fix for evaluation containing array and its inverse. #3671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for evaluation containing array and its inverse. #3671
Conversation
A bug was found in the jit calculation tree where under certain circumstances the incorrect result would be found when both an array and a transpose of the same array are used in the same evaluation. When the transpose method is used on an array the treatment depends on the structure of the jit tree of that array. In the particular case that the array is linear (e.g. not a sub-array) and is not simply a buffer (i.e. a combination of buffers and/or scalars with some operators) a moddim node will be added to the root of the tree with the new dimensions. When the tree is evaluated the dimensions of the child buffer(s) of a moddim node are changed and the moddim node is deleted. If an evaluation happens to include both an array and the transpose of that same array then when the moddim node is applied it changes the dimensions of the children for the remaining lifetime of the evaluation including any subsequent use of these nodes. As a result if the transpose of an array is used in an expression followed by the array itself the second instance will also be transposed. In this fix instead of creating a moddims node a deep copy of the calculation tree is made at that point with the transpose applied to the child buffer nodes. These buffer node copies will have the transposed dimensions and strides but will still point to the same memory location for the data so no copy of the underlying data needs to be made.
| } | ||
|
|
||
| template<typename T> | ||
| Array<T> moddimOp(const Array<T> &in, af::dim4 outDim) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like moddims is not a jittable operation anymore with this change. what are the performance implications of this?
|
I tested the effects of this PR on the OpenCL backend on the Intel Arc B580 by comparing the output of the jit kernels from the moddims and the jit test. (Note: I removed the cache kernels to make sure they are regenerated from scratch) These are the results for moddims: These are the results for jit: Things to note:
I also tested these changes on the oneAPI backend and they also pass the tests (no jit kernels generated) |
Yeah, the __noop operations were a result of the moddims nodes added to the JIT tree but those are gone now. |
A bug was found in the jit calculation tree where under certain circumstances the incorrect result would be found when both an array and a transpose of the same array are used in the same evaluation. When the transpose method is used on an array the treatment depends on the structure of the jit tree of that array. In the particular case that the array is linear (e.g. not a sub-array) and is not simply a buffer (i.e. a combination of buffers and/or scalars with some operators) a moddim node will be added to the root of the tree with the new dimensions. When the tree is evaluated the dimensions of the child buffer(s) of a moddim node are changed and the moddim node is deleted. If an evaluation happens to include both an array and the transpose of that same array then when the moddim node is applied it changes the dimensions of the children for the remaining lifetime of the evaluation including any subsequent use of these nodes. As a result if the transpose of an array is used in an expression followed by the array itself the second instance will also be transposed.
In this fix instead of creating a moddims node a deep copy of the calculation tree is made at that point with the transpose applied to the child buffer nodes. These buffer node copies will have the transposed dimensions and strides but will still point to the same memory location for the data so no copy of the underlying data needs to be made.
Checklist