@umar456 umar456 commented Jun 27, 2020

This PR backports bug fixes and some minor features to the 3.7 branch for the 3.7.2 release.

Description

Improvements

Fixes

Checklist

  • [ ] Rebased on latest master
  • Code compiles
  • Tests pass
  • [ ] Functions added to unified API
  • [ ] Functions documented

9prady9 and others added 30 commits June 26, 2020 15:15
Instead of creating a static library out of all the separate instantiations
of thrust_sort_by_key sources, we now directly embed the generated sources
(produced with CMake's configure_file command) into the afcuda target.

This also fixed separable compilation.
Prior to this change, separable compilation failed with undefined references
during CUDA device linking. I tried to fix that problem, but couldn't make a
breakthrough. However, I realized that using the generated sources directly
with the afcuda target does the job without any additional static library.
thrust::stable_sort_by_key has a known issue with device linking. The code
crashes with cudaInvalidValueError. Otherwise it works as expected without any
changes, with or without separable compilation.

https://github.com/thrust/thrust/wiki/Debugging#known-issues
https://github.com/thrust/thrust/blob/master/doc/changelog.md#known-issues-2

The above documents mention a known issue with device linking and thrust.
Although the documents say it happens in debug mode (with the -G flag), I noticed
similar crashes in the release configuration of ArrayFire too.

Due to the above issue, I have separated the relevant source files
(fft, blas, sparse and solver), which require device linking, into a separate
static library. With those files in their own static library, sort_by_key
and all the other unit tests that use it run as expected without
any crashes.
Removed a special neighborhood iterator which isn't necessary
Added mouse manipulations
The pinverse_cpu test is excluded because the lapacke dependency is not
handled yet
* Fix constant mem declaration in CUDA morph kernel

The global constant holding the maximum filter length was not updated
when filter support was originally increased from 17 to 19.
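
One way to avoid this kind of mismatch is to derive both the storage size and the argument check from a single constant. The following is a minimal host-side C++ sketch, not the ArrayFire code: the names `kMaxMorphFilterLen`, `morphFilter` and `filterFits` are hypothetical, a square filter is assumed, and the real kernel keeps the filter in CUDA constant memory rather than a plain array.

```cpp
#include <cstddef>

// Hypothetical name; 19 is the filter support mentioned in the commit message.
constexpr std::size_t kMaxMorphFilterLen = 19;

// Storage sized from the same constant, so it cannot lag behind the limit.
// In the real kernel this would be a __constant__ array (assumption).
static float morphFilter[kMaxMorphFilterLen * kMaxMorphFilterLen];

// Returns true when a square filter with the given side length fits.
constexpr bool filterFits(std::size_t sideLen) {
    return sideLen > 0 && sideLen <= kMaxMorphFilterLen;
}

// Compile-time check that the buffer and the limit agree.
static_assert(sizeof(morphFilter) / sizeof(morphFilter[0]) ==
                  kMaxMorphFilterLen * kMaxMorphFilterLen,
              "filter storage must match the maximum filter length");
```

With this layout, raising the supported length means changing one value, and the `static_assert` fails the build if the buffer is ever declared with a different size.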
* adds fallback for convolveNN functions

* adds cudnn option, runtime fallback

* Mark many Dependency module functions noexcept and const

* Refactor cuDNN code in CMake

* Fix fallback logic. refactor cuDNN util functions. Fix f16 wrap

Co-authored-by: Umar Arshad <umar@arrayfire.com>
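
The runtime fallback mentioned in the commits above can be pictured roughly as follows. This is only a sketch under stated assumptions: `ConvResult`, `convolveWithLibrary`, `convolveFallback` and the `libraryAvailable` flag are hypothetical names used for illustration, and the actual ArrayFire logic for detecting cuDNN is not shown in this PR text.

```cpp
#include <string>

// Hypothetical result type: records which path produced the output.
struct ConvResult {
    std::string path;
};

// Hypothetical stand-ins for the accelerated and built-in implementations.
inline ConvResult convolveWithLibrary() { return {"library"}; }
inline ConvResult convolveFallback() { return {"fallback"}; }

// Uses the accelerated path only when the optional library was found at
// runtime; otherwise falls back to the built-in implementation.
inline ConvResult convolve(bool libraryAvailable) {
    return libraryAvailable ? convolveWithLibrary() : convolveFallback();
}
```

The point of the pattern is that callers always go through `convolve`, so a missing optional dependency changes the path taken instead of producing an error.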
* Add clang-tidy configuration file

* Cleanup some exception code

* Add additional upstream directories to .gitignore

* Remove unused parameters from wrap and transform implementations

* Fix warnings and removed unused calls
* Removed constexpr not supported by VS2015

* Fixed formatting
9prady9 and others added 26 commits June 26, 2020 18:10
* enqueueWriteBuffer asynchronously in vision kernels

In a few locations, flags or buffers were initialized with a synchronous
copy to GPU memory. That is not needed because kernel execution is
in-order, so these copies are now asynchronous.

* Fix formatting

* Correct the scope of h_desc_lvl on orb
* Improve documentation of the alloc and free function
* Add tests for memory operations
* Created snippets for examples in the document
* The rdc and dlink flags are not required because they are added
  by CMake for separable compilation and static linking respectively
* Add guards around libs that are not included in the CUDA 9.0
  Toolkit
* Only link with OpenMP when linking with cuSOLVER dynamically
* Fix error message when CUDNN is not found
* Address casts from double to __half which are missing in 9.0
* Thrust return_temporary_buffer function can accept void* pointers
  in older versions of Thrust. Use raw_pointer_cast to pass the
  pointer to memFree
* cublasGemmEx doesn't exist in CUDA 9.0. Add ifdefs to guard
  against older builds
* __float2half is not a host function so it needs to be removed
  from mean
* Add template instantiation for memFree to accept void* pointers
The CUSOLVER_CHECK error message printed "CUBLAS Error" instead of
"CUSOLVER Error"
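
A copy-paste slip like this is less likely when the library name is passed in once per wrapper rather than typed into each message. A minimal sketch, assuming integer status codes where 0 means success; `throwIfFailed`, `EXAMPLE_CUBLAS_CHECK` and `EXAMPLE_CUSOLVER_CHECK` are hypothetical names and do not reproduce the ArrayFire macros:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds the message from the library name passed in,
// so each wrapper macro reports its own library.
inline void throwIfFailed(int status, const char* library, const char* expr) {
    if (status != 0) {
        std::ostringstream msg;
        msg << library << " Error (" << status << ") in " << expr;
        throw std::runtime_error(msg.str());
    }
}

// Each wrapper fixes the library name in exactly one place.
#define EXAMPLE_CUBLAS_CHECK(call) throwIfFailed((call), "CUBLAS", #call)
#define EXAMPLE_CUSOLVER_CHECK(call) throwIfFailed((call), "CUSOLVER", #call)
```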
Prior to this change, I had added bash-specific syntax, which won't work
with /bin/sh or dash shells.

/usr/sh is available on most systems that use init.d scripts. So, it is
safe to assume its availability on the majority of Linux distributions.
* Adds PR template 

**Short description of change**  
Adds a GitHub PR template for the ArrayFire project. Developers will now face a short suggested checklist when creating a new PR on GitHub.

**Motivation**  
Adding a PR template will make it easier to reference old issues when generating reports and to link future issues in historical context.

**Future considerations**  
Wiki might need to be updated with additional development guidelines. The current guidelines could be more comprehensive.

* Updated pull request template

* Added additional detail.
* Use comments instead of text to communicate with the reader.
* Create a simple checklist

* Grammar + Future changes in the description section

Co-authored-by: Umar Arshad <umar@arrayfire.com>
* AF_CONSTEXPR expands to nothing if constexpr support is not available.

* Replace CONSTEXPR_DH with AF_CONSTEXPR and __DH__ in
  `src/backend/common/half.hpp`

* Removed AF_CONSTEXPR where it is invalid in half.hpp
* Adds the Zc:__cplusplus flag to CUDA builds for MSVC if the flag is available.
The cuda_fp16 header does not define the default constructor for __half as
"= default", which prevents the __half struct from being used in a constexpr
expression

* For older versions of MSVC we define the __cplusplus macro before and
after the inclusion of the cuda_fp16.h header.

* Define the AF_CONSTEXPR macro for NVRTC compilation
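
A macro like AF_CONSTEXPR can be sketched as below. The detection condition is an assumption (it uses the standard `__cpp_constexpr` feature-test macro, which the actual ArrayFire definition may not rely on), and `ExampleHalf` is a hypothetical stand-in type rather than the class in `half.hpp`.

```cpp
// Assumption: constexpr support is detected through the __cpp_constexpr
// feature-test macro; the real ArrayFire condition may differ.
#if defined(__cpp_constexpr) && __cpp_constexpr >= 200704L
#define AF_CONSTEXPR constexpr
#else
#define AF_CONSTEXPR  // expands to nothing when constexpr is unavailable
#endif

// Hypothetical type standing in for a half-precision wrapper.
struct ExampleHalf {
    unsigned short bits;

    // constexpr where supported, an ordinary constructor otherwise.
    AF_CONSTEXPR explicit ExampleHalf(unsigned short b) : bits(b) {}

    // Returns the stored bit pattern unchanged.
    AF_CONSTEXPR unsigned short raw() const { return bits; }
};
```

Because the macro degrades to an empty token, the same declarations compile on toolchains without constexpr support; only the compile-time evaluation is lost.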
Adds several classes of issues with proposed additional information that would be helpful when debugging.

Co-authored-by: pradeep <pradeep@arrayfire.com>
Co-authored-by: Umar Arshad <umar@arrayfire.com>
The cusparseSpMV/cusparseSpMM functions take sparse and dense matrix/vector
descriptor objects as arguments. This API was introduced in CUDA 10.1, and the
old API has been deprecated and is removed in CUDA 11.
Also updates the CUB version from 1.8.0 to 1.9.10
@umar456 umar456 added this to the 3.7.2 milestone Jun 27, 2020
@9prady9 9prady9 merged commit 2b929a8 into arrayfire:v3.7 Jun 27, 2020