@frasercrmck
Contributor

Description

Without the barrier at the end of barrierOR, work-item 0 can start the next loop iteration and update predicates[0] while other work-items are still inside barrierOR reading predicates, so those work-items read the next loop iteration's exit condition instead of the current one. This results in a divergent loop, where not all work-items reach the same barriers.

A previous fix identified this as a problem only on NVIDIA platforms, but strictly speaking a barrier is required in all cases to avoid a spec violation and undefined behaviour.
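
To illustrate the race, here is a minimal host-side sketch in Python, not the kernel code itself. It assumes each work-item can be modelled as a thread and that `threading.Barrier` stands in for `barrier(CLK_LOCAL_MEM_FENCE)`. The names `run_work_group`, `barrier_or`, and `exit_after` are hypothetical, and the OR is computed with `any()` over a per-item slot rather than with whatever reduction the real barrierOR uses.

```python
import threading


def run_work_group(num_items, exit_after):
    """Simulate one work-group; return the loop iterations each work-item ran."""
    barrier = threading.Barrier(num_items)  # stand-in for barrier(CLK_LOCAL_MEM_FENCE)
    predicates = [0] * num_items            # stand-in for the local-memory array
    iterations = [0] * num_items

    def barrier_or(lid, predicate):
        predicates[lid] = predicate
        barrier.wait()            # every work-item has written its predicate
        result = any(predicates)
        # Trailing barrier: no work-item may return and overwrite
        # predicates for the next iteration until all have read them.
        barrier.wait()
        return result

    def work_item(lid):
        keep_going = True
        while keep_going:
            iterations[lid] += 1
            # Hypothetical exit condition: only work-item 0 still has
            # work, and only for the first exit_after - 1 iterations.
            has_work = 1 if (lid == 0 and iterations[lid] < exit_after) else 0
            keep_going = barrier_or(lid, has_work)

    threads = [threading.Thread(target=work_item, args=(lid,))
               for lid in range(num_items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return iterations
```

With the trailing `barrier.wait()` in place, `run_work_group(8, 5)` returns `[5, 5, 5, 5, 5, 5, 5, 5]`: every simulated work-item leaves the loop on the same iteration. Removing that second wait reintroduces the race described above, where a fast thread can overwrite `predicates` before slower threads have read the current iteration's values.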

Changes to Users

The kernel should produce correct results on more OpenCL implementations.

Locally I tested both Intel(R) FPGA Emulation Device and various oneAPI Construction Kit devices, which all previously failed the confidence_connected_opencl --gtest_filter="SingleSeed/ConfidenceConnectedDataTest.SegmentARegion/_prefix_background_radius_0_multiplier_1_iterations_5_replace_255" unit test.

I'm unable to test other OpenCL implementations, sorry.

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass
  • [ ] Functions added to unified API
  • [ ] Functions documented

@umar456
Member

umar456 commented Feb 21, 2024

Took me a bit to figure out the problem, but I see the issue now. We can ignore the errors in the CI because they are not related. I will test it on a couple of other systems before merging this PR. Thank you for your contribution!

@melonakos melonakos added this to the 3.10 milestone Feb 5, 2025
@christophe-murphy christophe-murphy self-requested a review February 20, 2025 20:17
@christophe-murphy christophe-murphy merged commit 6cea4d3 into arrayfire:master Feb 20, 2025
