@umar456 umar456 commented Apr 20, 2020

This PR improves the performance of the transpose and join kernels in the CPU backend. The previous approach was naive and did not account for data locality. The new version uses a tile-based approach to speed up the operation.
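For readers unfamiliar with tiling, here is a minimal sketch of a tile-based transpose. It is not the kernel from this PR: the function name `tiledTranspose`, the column-major `std::vector` layout, and the default tile size of 32 are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not ArrayFire's kernel). Transposes a column-major
// rows x cols matrix `in` into a column-major cols x rows matrix `out`.
// Working on tile x tile blocks keeps both the reads and the writes of a
// block within a small memory footprint, which is friendlier to the cache
// than striding across the whole output for every input column.
template <typename T>
void tiledTranspose(const std::vector<T>& in, std::vector<T>& out,
                    std::size_t rows, std::size_t cols,
                    std::size_t tile = 32) {
    out.resize(rows * cols);
    for (std::size_t jj = 0; jj < cols; jj += tile) {
        for (std::size_t ii = 0; ii < rows; ii += tile) {
            // Clamp the tile at the matrix edges.
            const std::size_t jEnd = std::min(jj + tile, cols);
            const std::size_t iEnd = std::min(ii + tile, rows);
            for (std::size_t j = jj; j < jEnd; ++j) {
                for (std::size_t i = ii; i < iEnd; ++i) {
                    // in(i, j) lives at i + j * rows;
                    // out(j, i) lives at j + i * cols.
                    out[j + i * cols] = in[i + j * rows];
                }
            }
        }
    }
}
```

The best tile size depends on the element type and the cache hierarchy, so the value above should be treated as a starting point to benchmark rather than a recommendation.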

The join kernel is improved by using a memcpy call instead of a for loop to perform the copy to the output matrix.
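The idea can be sketched as follows, assuming two column-major matrices joined along the first dimension. `joinRows` is a hypothetical name and the layout is an assumption; the actual kernel handles more dimensions and input counts than this.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not ArrayFire's kernel). Joins two column-major
// matrices with the same number of columns along the row dimension.
// Each column of each input is contiguous, so it can be copied with a
// single memcpy instead of an element-by-element loop.
template <typename T>
std::vector<T> joinRows(const std::vector<T>& a, std::size_t aRows,
                        const std::vector<T>& b, std::size_t bRows,
                        std::size_t cols) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    const std::size_t outRows = aRows + bRows;
    std::vector<T> out(outRows * cols);
    for (std::size_t j = 0; j < cols; ++j) {
        T* dst = out.data() + j * outRows;
        // Skip empty inputs so memcpy never receives a null pointer.
        if (aRows > 0)
            std::memcpy(dst, a.data() + j * aRows, aRows * sizeof(T));
        if (bRows > 0)
            std::memcpy(dst + aRows, b.data() + j * bRows, bRows * sizeof(T));
    }
    return out;
}
```

For example, joining a 2x2 matrix stored as `{1, 2, 3, 4}` with a 1x2 matrix stored as `{5, 6}` yields a 3x2 matrix stored as `{1, 2, 5, 3, 4, 6}`.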

Fixed several warnings reported by GCC's -Wall flag and enabled the flag by default in CMake

Fixed a matrix multiplication test where the output matrix was not being tested

Fixed a potential issue with mean where the optimizer could remove some operations, reducing the accuracy of the result.

Switched to double when calculating the mean in the random engine uniform tests to avoid overflow issues with larger arrays
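A minimal sketch of the pattern, assuming the test sums the generated values on the host. `meanInDouble` is a hypothetical name, not the helper used in the tests. Accumulating in the element type can wrap for narrow integer types or lose precision for float as the array grows, while a double accumulator avoids both for realistic array sizes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the test code from this PR). Accumulates in
// double regardless of the element type T, so the running sum neither
// wraps around (narrow integers) nor stalls (float) on large arrays.
template <typename T>
double meanInDouble(const std::vector<T>& values) {
    if (values.empty()) return 0.0;
    double sum = 0.0;
    for (const T& v : values) sum += static_cast<double>(v);
    return sum / static_cast<double>(values.size());
}
```

With 1000 `unsigned char` values of 200, the sum is 200000, far beyond what an `unsigned char` accumulator can hold, yet the double accumulator still gives a mean of exactly 200.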

Fixed a bug in the DefaultMemoryManager introduced in f211253 where I was dereferencing an iterator before checking if the find function returned an actual value.
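The safe pattern looks roughly like the sketch below. `BufferInfo`, `isLocked`, and the map type are hypothetical stand-ins, not the DefaultMemoryManager's actual data structures. The point is only the ordering: compare the iterator against `end()` first, then dereference.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical bookkeeping record (not ArrayFire's type).
struct BufferInfo {
    std::size_t bytes;
    bool locked;
};

// Returns whether `ptr` is tracked and locked. The iterator returned by
// find() is compared against end() before it is dereferenced, because
// dereferencing end() is undefined behavior.
inline bool isLocked(
    const std::unordered_map<const void*, BufferInfo>& table,
    const void* ptr) {
    auto it = table.find(ptr);
    if (it == table.end()) return false;  // unknown pointer: nothing to read
    return it->second.locked;
}
```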

@umar456 umar456 added this to the 3.7.2 milestone Apr 20, 2020
@9prady9 9prady9 left a comment

Finished only the join instantiation related changes. It would be great if this big PR could be split into

  • Join related changes
  • Warning fixes that touch a lot of files
  • Other misc fixes

@9prady9 9prady9 merged commit c7f16cc into arrayfire:master Apr 22, 2020
@umar456 umar456 deleted the cpu_opt branch June 26, 2020 18:40
@umar456 umar456 mentioned this pull request Jun 27, 2020