Improve the performance of CPU join and transpose #2849

umar456 · 2020-04-20T00:41:03Z

This PR improves the performance of the transpose and join kernels in the CPU backend. The previous implementation was naive and did not account for data locality. The new version is a modest improvement that uses a tile-based approach to speed up the operation.
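The tile-based idea can be sketched as follows. This is a minimal, hypothetical illustration, not the kernel from this PR: it assumes a column-major layout (the layout ArrayFire uses), and the function name `transposeTiled` and the tile size `kTile = 32` are assumptions chosen for illustration. Processing one small block at a time keeps both the strided reads and the strided writes within a working set that fits in cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile edge length; 32 is an illustrative choice, not a value
// taken from the PR.
constexpr std::size_t kTile = 32;

// Transposes the column-major `rows x cols` matrix `in` into the
// column-major `cols x rows` matrix `out`, one kTile x kTile block at a time.
template <typename T>
void transposeTiled(const std::vector<T>& in, std::vector<T>& out,
                    std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    out.resize(rows * cols);
    for (std::size_t c0 = 0; c0 < cols; c0 += kTile) {
        const std::size_t cEnd = std::min(c0 + kTile, cols);
        for (std::size_t r0 = 0; r0 < rows; r0 += kTile) {
            const std::size_t rEnd = std::min(r0 + kTile, rows);
            // Inside one tile: in(r, c) is stored at in[r + c * rows] and
            // out(c, r) is stored at out[c + r * cols].
            for (std::size_t c = c0; c < cEnd; ++c) {
                for (std::size_t r = r0; r < rEnd; ++r) {
                    out[c + r * cols] = in[r + c * rows];
                }
            }
        }
    }
}
```

The `std::min` bounds handle matrices whose dimensions are not multiples of the tile size, so edge tiles are simply smaller.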

The join kernel now uses memcpy instead of an element-by-element for loop to copy data into the output matrix.
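Why memcpy applies here can be sketched for a join along the first dimension of two column-major matrices. This is a hypothetical example, not ArrayFire's join kernel: the helper name `joinDim0` and the 2-D restriction are assumptions. Each output column consists of one column of `a` followed by one column of `b`, and both pieces are contiguous in memory, so each can be moved with a single memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: joins column-major `a` (rowsA x cols) and
// `b` (rowsB x cols) along the first dimension into a (rowsA + rowsB) x cols
// matrix, copying each contiguous column segment with one memcpy.
template <typename T>
std::vector<T> joinDim0(const std::vector<T>& a, std::size_t rowsA,
                        const std::vector<T>& b, std::size_t rowsB,
                        std::size_t cols) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable element type");
    assert(a.size() == rowsA * cols && b.size() == rowsB * cols);
    const std::size_t rowsOut = rowsA + rowsB;
    std::vector<T> out(rowsOut * cols);
    for (std::size_t c = 0; c < cols; ++c) {
        T* dst = out.data() + c * rowsOut;
        // Skip zero-length copies so memcpy never sees an empty vector's data().
        if (rowsA > 0) std::memcpy(dst, a.data() + c * rowsA, rowsA * sizeof(T));
        if (rowsB > 0) std::memcpy(dst + rowsA, b.data() + c * rowsB, rowsB * sizeof(T));
    }
    return out;
}
```

For a join along the last dimension of column-major data, the inputs are contiguous as a whole, so the same idea reduces to two large copies.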

Fixed several warnings reported by GCC's -Wall flag and enabled the flag by default in CMake.

Fixed a matrix multiplication test in which the output matrix was not being checked.

Fixed a potential issue with mean where compiler optimizations could eliminate operations and thereby reduce the accuracy of the result.

Used double to calculate the mean in the random engine uniform tests to avoid overflow issues with larger arrays.
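A minimal sketch of the overflow this change avoids, assuming a narrow integer element type such as `uint8_t` (which element types the tests actually cover is not stated here). The function names are hypothetical: one accumulates in the element type itself, the other in double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates in the element type: for uint8_t the running sum wraps
// around modulo 256 once it exceeds 255.
template <typename T>
double meanNarrowAccumulator(const std::vector<T>& values) {
    assert(!values.empty());
    T sum = 0;
    for (T v : values) sum = static_cast<T>(sum + v);
    return static_cast<double>(sum) / static_cast<double>(values.size());
}

// Accumulates in double: integer-valued sums are exact up to 2^53,
// so no wrap-around occurs for realistic array sizes.
template <typename T>
double meanDoubleAccumulator(const std::vector<T>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (T v : values) sum += static_cast<double>(v);
    return sum / static_cast<double>(values.size());
}
```

With 1000 elements that are all 200, the narrow accumulator ends at 200000 mod 256 = 64 and reports a mean of 0.064, while the double accumulator reports 200.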

Fixed a bug in the DefaultMemoryManager, introduced in f211253, where I was dereferencing an iterator before checking whether the find call had actually found an element.
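The safe pattern is to compare the result of `find` against `end()` before dereferencing it. The sketch below is hypothetical: `BufferInfo`, `BufferMap`, and `lookupBytes` are stand-ins, since the DefaultMemoryManager's actual data structures are not shown in this PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for per-allocation bookkeeping.
struct BufferInfo {
    std::size_t bytes;
    bool locked;
};

using BufferMap = std::unordered_map<void*, BufferInfo>;

// Returns the size recorded for `ptr`, or 0 if `ptr` is not tracked.
// The iterator is checked against end() first; dereferencing end()
// is undefined behavior.
std::size_t lookupBytes(const BufferMap& buffers, void* ptr) {
    auto it = buffers.find(ptr);
    if (it == buffers.end()) {
        return 0;
    }
    return it->second.bytes;
}
```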

9prady9

Finished reviewing only the join instantiation-related changes. It would be great if this big PR could be split into:

- Join-related changes
- Warning fixes that touch a lot of files
- Other miscellaneous fixes

src/backend/cpu/join.cpp

src/backend/opencl/surface.cpp

umar456 added 5 commits April 19, 2020 06:31

Fix dereference of memory_info iterator before check

62f9a3a

Use double to calculate mean in random engine uniform tests if available

10d1177

Prevent the optimizations in the MeanOp on cpu.

47e8d1f

Fix the MatrixMultiplyBatch test so that we are testing the result

678c5db

Remove unnecessary tile from var. Use arith output parameter instead

29aff9a

umar456 added this to the 3.7.2 milestone Apr 20, 2020

9prady9 requested changes Apr 20, 2020


src/backend/cpu/join.cpp (outdated)

src/backend/opencl/surface.cpp (outdated)

umar456 force-pushed the cpu_opt branch from f84ee99 to c5986ec April 20, 2020 22:50

umar456 added 4 commits April 20, 2020 18:57

Address all warnings with -Wall flags in GCC 9.3

f595b68

Enable the -Wall flags if the compiler supports it

01db684

Speed up CPU transpose

330a703

Optimize join using memcpy

5feead0

umar456 force-pushed the cpu_opt branch from c5986ec to da0c1ae April 20, 2020 23:14

Remove unnecessary instantiations of join in all backends

e1cf3d1

umar456 force-pushed the cpu_opt branch from da0c1ae to e1cf3d1 April 20, 2020 23:24

9prady9 approved these changes Apr 22, 2020


9prady9 merged commit c7f16cc into arrayfire:master Apr 22, 2020

umar456 deleted the cpu_opt branch June 26, 2020 18:40

umar456 mentioned this pull request Jun 27, 2020

Backport changes to 3.7 #2949

Merged

