Backport changes to 3.7 #2949
Merged
Instead of creating a static library from the separate instantiations of the thrust_sort_by_key sources, we now embed the sources generated with CMake's configure_file command directly into the afcuda target. This also fixes separable compilation. Prior to this change, separable compilation failed with undefined references during CUDA device linking. I tried to fix that problem but could not make a breakthrough. However, I realized that using the generated sources directly with the afcuda target does the job without any additional static library.
thrust::stable_sort_by_key has a known issue with device linking: the code crashes with cudaInvalidValueError. Otherwise it works as expected without any changes, with or without separable compilation.
https://github.com/thrust/thrust/wiki/Debugging#known-issues
https://github.com/thrust/thrust/blob/master/doc/changelog.md#known-issues-2
The above documents mention a known issue with device linking and Thrust. Although the documents say it happens in debug mode (with the -G flag), I noticed similar crashes in the release configuration of ArrayFire too. Because of this issue, I have moved the source files that require device linking (fft, blas, sparse and solver) into a separate static library. With that separation in place, sort_by_key and all the other unit tests that use it run as expected without any crashes.
Removed a special neighborhood iterator which isn't necessary
Added mouse manipulations
Change ninja to 1.10.0
pinverse_cpu test is excluded as lapacke dependency is not taken care of yet
* Fix constant mem declaration in CUDA morph kernel
The global constant holding the maximum filter length was not updated when the filter support was increased from 17 to 19.
* adds fallback for convolveNN functions
* adds cudnn option, runtime fallback
* Noexcept and const many Dependency module functions
* Refactor cuDNN code in CMake
* Fix fallback logic. refactor cuDNN util functions. Fix f16 wrap
Co-authored-by: Umar Arshad <umar@arrayfire.com>
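The runtime fallback mentioned above can be pictured with a small dispatcher. This is only a sketch of the general pattern, not ArrayFire's implementation: `Signal`, `ConvFn`, `nativeConvolve` and `convolveWithFallback` are hypothetical names, and a null function pointer stands in for "the optional cuDNN library could not be loaded at runtime".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Signal = std::vector<float>;
// Hypothetical signature for an optional accelerated implementation.
using ConvFn = Signal (*)(const Signal&, const Signal&);

// Always-available path: a plain "valid" 1-D sliding dot product.
inline Signal nativeConvolve(const Signal& in, const Signal& filt) {
    assert(!filt.empty() && "filter must not be empty");
    if (in.size() < filt.size()) return {};
    Signal out(in.size() - filt.size() + 1, 0.0f);
    for (std::size_t i = 0; i < out.size(); ++i) {
        for (std::size_t k = 0; k < filt.size(); ++k) {
            out[i] += in[i + k] * filt[k];
        }
    }
    return out;
}

// 'accelerated' is nullptr when the optional library is unavailable;
// in that case the call silently falls back to the native path.
inline Signal convolveWithFallback(const Signal& in, const Signal& filt,
                                   ConvFn accelerated) {
    return accelerated ? accelerated(in, filt) : nativeConvolve(in, filt);
}
```

Deciding per call, rather than at build time, is what lets one binary run on machines with and without the optional library installed.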
* Add clang-tidy configuration file
* Cleanup some exception code
* Add additional upstream directories to .gitignore
* Remove unused parameters from wrap and transform implementations
* Fix warnings and removed unused calls
* Removed constexpr not supported by VS2015
* Fixed formatting
* enqueueWriteBuffer asynchronously in vision kernels
There were a few locations where flags or buffers were initialized with a synchronous copy to GPU memory. That is not needed because kernel execution is in-order, so these are now asynchronous copies.
* Fix formatting
* Correct the scope of h_desc_lvl on orb
* Improve documentation of the alloc and free function
* Add tests for memory operations
* Created snippets for examples in the document
* The rdc and dlink flags are not required because they are added by CMake for separable compilation and static linking respectively
* Add guards around libs that are not included in the CUDA 9.0 Toolkit
* Only link with OpenMP when linking with cuSOLVER dynamically
* Fix error message when CUDNN is not found
* Address casts from double to __half which are missing in 9.0
* Thrust return_temporary_buffer function can accept void* pointers in older versions of Thrust. Use raw_pointer_cast to pass the pointer to memFree
* cublasGemmEx doesn't exist in CUDA 9.0. Add ifdefs to guard against older builds
* __float2half is not a host function so it needs to be removed from mean
* Add template instantiation for memFree to accept void* pointers
The CUSOLVER_CHECK error message printed "CUBLAS Error" instead of "CUSOLVER Error"
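To illustrate the kind of bug this fixes, here is a minimal sketch of a status-check macro whose message names the library that actually failed. It does not reproduce ArrayFire's CUSOLVER_CHECK: `FakeSolverStatus`, `solverErrorMessage` and `FAKE_CUSOLVER_CHECK` are hypothetical stand-ins so the example builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for cusolverStatus_t (assumption: only two states are modelled).
enum class FakeSolverStatus { Success = 0, NotInitialized = 1 };

// Builds the message in one place so the library name cannot drift
// from the macro that reports it.
inline std::string solverErrorMessage(FakeSolverStatus status,
                                      const char* file, int line) {
    std::ostringstream os;
    os << "CUSOLVER Error (" << static_cast<int>(status) << ") at "
       << file << ":" << line;
    return os.str();
}

#define FAKE_CUSOLVER_CHECK(call)                                   \
    do {                                                            \
        const FakeSolverStatus status_ = (call);                    \
        if (status_ != FakeSolverStatus::Success) {                 \
            throw std::runtime_error(                               \
                solverErrorMessage(status_, __FILE__, __LINE__));   \
        }                                                           \
    } while (0)

// Usage: a failing call must surface a message that starts with the
// correct library name.
inline void demoSolverCheck() {
    try {
        FAKE_CUSOLVER_CHECK(FakeSolverStatus::NotInitialized);
        assert(false && "expected an exception");
    } catch (const std::runtime_error& e) {
        assert(std::string(e.what()).rfind("CUSOLVER Error", 0) == 0);
    }
}
```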
Prior to this change, I added bash-based syntax, which won't work with /bin/sh or dash shells. /usr/sh is available on most systems that use init.d scripts, so it is safe to assume its availability on the majority of Linux distributions.
* Adds PR template

**Short description of change**
Adds a GitHub PR template for the ArrayFire project. Developers will now see a short suggested checklist when creating a new PR on GitHub.

**Motivation**
Adding a PR template will make it easier to reference old issues when generating reports and to link future issues in historical context.

**Future considerations**
The wiki might need to be updated with additional development guidelines. The current guidelines could be more comprehensive.

* Updated pull request template
* Added additional detail.
* Use comments instead of text to communicate with the reader.
* Create a simple checklist
* Grammar + Future changes in the description section

Co-authored-by: Umar Arshad <umar@arrayfire.com>
* AF_CONSTEXPR expands to nothing if constexpr support is not available.
* Replace CONSTEXPR_DH with AF_CONSTEXPR and __DH__ in `src/backend/common/half.hpp`
* Removed AF_CONSTEXPR where it is invalid in half.hpp
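A macro of this shape can be sketched as follows. The feature test is an assumption made for illustration (C++14 or newer counts as "constexpr available"); the actual condition ArrayFire uses is not shown in this PR, and `clamp_to_byte` is a hypothetical function, not part of half.hpp.

```cpp
#include <cassert>

// Assumed feature check: treat C++14 or newer as having usable constexpr.
// On compilers that fail the check, the macro expands to nothing and the
// function below becomes an ordinary inline function.
#if defined(__cplusplus) && __cplusplus >= 201402L
#define AF_CONSTEXPR constexpr
#else
#define AF_CONSTEXPR
#endif

// Hypothetical example function decorated with the macro.
AF_CONSTEXPR inline int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Usage: the call sites are identical whichever way the macro expands.
inline void demoClamp() {
    assert(clamp_to_byte(-5) == 0);
    assert(clamp_to_byte(300) == 255);
}
```

Because the macro can expand to nothing, code using it should not rely on compile-time evaluation (for example inside `static_assert`) unless it is guarded by the same condition.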
* Adds the Zc:__cplusplus flag to CUDA builds for MSVC if the flag is available. The cuda_fp16 header does not define the default constructor for __half as "= default", and that prevents the __half struct from being used in a constexpr expression
* For older versions of MSVC we define the __cplusplus macro before and after the inclusion of the cuda_fp16.h header.
* Define the AF_CONSTEXPR macro for NVRTC compilation
Adds several classes of issues with proposed additional information that would be helpful when debugging. Co-authored-by: pradeep <pradeep@arrayfire.com> Co-authored-by: Umar Arshad <umar@arrayfire.com>
The cusparseSpMv/cusparseSpMM functions take sparse and dense matrix/vector descriptor objects as arguments. This API was introduced in CUDA 10.1, and the old API has been deprecated and is removed in CUDA 11.
Also, updates CUB version from 1.8.0 to 1.9.10
9prady9
approved these changes
Jun 27, 2020
This PR backports bug fixes and some minor features to the 3.7 branch for the 3.7.2 release.
Description
Improvements
Fixes
Checklist
[ ] Rebased on latest master
[ ] Functions added to unified API
[ ] Functions documented