Fix for evaluation containing array and its inverse. #3671

christophe-murphy · 2025-06-23T22:08:43Z

A bug was found in the jit calculation tree where under certain circumstances the incorrect result would be found when both an array and a transpose of the same array are used in the same evaluation. When the transpose method is used on an array the treatment depends on the structure of the jit tree of that array. In the particular case that the array is linear (e.g. not a sub-array) and is not simply a buffer (i.e. a combination of buffers and/or scalars with some operators) a moddim node will be added to the root of the tree with the new dimensions. When the tree is evaluated the dimensions of the child buffer(s) of a moddim node are changed and the moddim node is deleted. If an evaluation happens to include both an array and the transpose of that same array then when the moddim node is applied it changes the dimensions of the children for the remaining lifetime of the evaluation including any subsequent use of these nodes. As a result if the transpose of an array is used in an expression followed by the array itself the second instance will also be transposed.

In this fix instead of creating a moddims node a deep copy of the calculation tree is made at that point with the transpose applied to the child buffer nodes. These buffer node copies will have the transposed dimensions and strides but will still point to the same memory location for the data so no copy of the underlying data needs to be made.

Checklist

Rebased on latest master
Code compiles
Tests pass

A bug was found in the jit calculation tree where under certain circumstances the incorrect result would be found when both an array and a transpose of the same array are used in the same evaluation. When the transpose method is used on an array the treatment depends on the structure of the jit tree of that array. In the particular case that the array is linear (e.g. not a sub-array) and is not simply a buffer (i.e. a combination of buffers and/or scalars with some operators) a moddim node will be added to the root of the tree with the new dimensions. When the tree is evaluated the dimensions of the child buffer(s) of a moddim node are changed and the moddim node is deleted. If an evaluation happens to include both an array and the transpose of that same array then when the moddim node is applied it changes the dimensions of the children for the remaining lifetime of the evaluation including any subsequent use of these nodes. As a result if the transpose of an array is used in an expression followed by the array itself the second instance will also be transposed. In this fix instead of creating a moddims node a deep copy of the calculation tree is made at that point with the transpose applied to the child buffer nodes. These buffer node copies will have the transposed dimensions and strides but will still point to the same memory location for the data so no copy of the underlying data needs to be made.

umar456 · 2025-06-30T18:25:13Z

src/backend/common/moddims.cpp

+}
+
 template<typename T>
 Array<T> moddimOp(const Array<T> &in, af::dim4 outDim) {


It looks like moddims is not a jittable operation anymore with this change. what are the performance implications of this?

edwinsolisf · 2025-07-13T06:32:48Z

I tested the effects of this PR on the OpenCL backend on the Intel Arc B580 by comparing the output of the jit kernels from the moddims and the jit test. (Note: I removed the cache kernels to make sure they are regenerated from scratch)

These are the results for moddims:
test_output_after_changes.txt
test_output_before_changes.txt

These are the results for jit:
before_jit.txt
after_jit.txt

Things to note:

The tests pass in both cases
Some of the jit kernels are different after the changes. The main difference observed in the moddims test is that more __noop operations are added which from the rough test runtime seem to get optimize or have minimal impact in the Release build. (More test to confirm optimization is needed). However, on the jit test, specifically the added test evaluateBothArrayAndItsTranspose, with the changes two more concise kernels are created instead of three.
The size of the compiled kernels are very similar but very slightly larger (which is expected?). While I could not compare individual kernels due to the name hashing, I compared the total space occupied by all the cached kernels generated. In my case I observed that it increased from 31.8 MB (33,386,496 bytes) to 31.9 MB (33,488,896 bytes), an increase of 0.3%.

I also tested these changes on the oneAPI backend and they also pass the tests (no jit kernels generated)

christophe-murphy · 2025-07-17T18:48:02Z

I tested the effects of this PR on the OpenCL backend on the Intel Arc B580 by comparing the output of the jit kernels from the moddims and the jit test. (Note: I removed the cache kernels to make sure they are regenerated from scratch)

These are the results for moddims: test_output_after_changes.txt test_output_before_changes.txt

These are the results for jit: before_jit.txt after_jit.txt

Things to note:
* The tests pass in both cases

* Some of the jit kernels are different after the changes. The main difference observed in the moddims test is that more __noop operations are added which from the rough test runtime seem to get optimize or have minimal impact in the Release build. (More test to confirm optimization is needed). However, on the jit test, specifically the added test evaluateBothArrayAndItsTranspose, with the changes two more concise kernels are created instead of three.

* The size of the compiled kernels are very similar but very slightly larger (which is expected?). While I could not compare individual kernels due to the name hashing, I compared the total space occupied by all the cached kernels generated. In my case I observed that it increased from 31.8 MB (33,386,496 bytes) to 31.9 MB (33,488,896 bytes), an increase of 0.3%.
I also tested these changes on the oneAPI backend and they also pass the tests (no jit kernels generated)

Yeah, the __noop operations were a result of the moddims nodes added to the JIT tree but those are gone now.

christophe-murphy linked an issue Jun 23, 2025 that may be closed by this pull request

[BUG] sum two array result is not correct #3660

Closed

christophe-murphy added this to the 3.10 milestone Jun 24, 2025

christophe-murphy requested a review from umar456 June 25, 2025 17:19

umar456 reviewed Jun 30, 2025

View reviewed changes

umar456 approved these changes Jul 11, 2025

View reviewed changes

christophe-murphy merged commit 3ae9f04 into master Jul 17, 2025
1 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for evaluation containing array and its inverse. #3671

Fix for evaluation containing array and its inverse. #3671

Uh oh!

christophe-murphy commented Jun 23, 2025

Uh oh!

umar456 Jun 30, 2025

Uh oh!

edwinsolisf commented Jul 13, 2025 •

edited

Loading

Uh oh!

christophe-murphy commented Jul 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix for evaluation containing array and its inverse. #3671

Fix for evaluation containing array and its inverse. #3671

Uh oh!

Conversation

christophe-murphy commented Jun 23, 2025

Checklist

Uh oh!

umar456 Jun 30, 2025

Choose a reason for hiding this comment

Uh oh!

edwinsolisf commented Jul 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

christophe-murphy commented Jul 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

edwinsolisf commented Jul 13, 2025 •

edited

Loading