When using Dynamic Parallelism in CUDA, you can implement recursive algorithms such as merge sort. I have implemented it, but my program doesn't work for inputs greater than blah.
My question is: how deep can the recursion tree go in such an implementation? Is there a limit? (My program works fine for smaller inputs.)
stackoverflow.com/questions/14301903/… – void_ptr, Jan 3, 2015 at 17:06
1 Answer
From Professional CUDA C Programming:
The maximum nesting depth of dynamic parallelism is limited to 24, but in reality most kernels will be limited by the amount of memory required by the device runtime system at each new level . . .
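As a rough illustration of the memory-related limits the quote refers to, here is a minimal host-side sketch that queries and raises the device runtime limits through `cudaDeviceGetLimit` and `cudaDeviceSetLimit`. The requested depth of 8 is an arbitrary value chosen for illustration, not a recommendation. `cudaLimitDevRuntimeSyncDepth` only matters if the kernels call `cudaDeviceSynchronize()` from device code, which belongs to the older dynamic parallelism model and is deprecated in recent CUDA releases, so treat this as a sketch under those assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print a message if a runtime call failed; returns true on success.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::printf("%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    size_t syncDepth = 0;
    size_t pendingLaunches = 0;

    // Query the current device runtime limits.
    if (check(cudaDeviceGetLimit(&syncDepth, cudaLimitDevRuntimeSyncDepth),
              "get sync depth")) {
        std::printf("sync depth limit: %zu\n", syncDepth);
    }
    if (check(cudaDeviceGetLimit(&pendingLaunches,
                                 cudaLimitDevRuntimePendingLaunchCount),
              "get pending launch count")) {
        std::printf("pending launch count limit: %zu\n", pendingLaunches);
    }

    // Request a deeper synchronization depth before launching the parent
    // kernel. The value 8 is an assumption for illustration; the runtime
    // reserves device memory per level, so large requests can fail.
    const size_t requestedDepth = 8;
    if (check(cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, requestedDepth),
              "set sync depth")) {
        if (check(cudaDeviceGetLimit(&syncDepth, cudaLimitDevRuntimeSyncDepth),
                  "re-read sync depth")) {
            std::printf("sync depth limit after request: %zu\n", syncDepth);
        }
    }
    return 0;
}
```

If the set call fails or the recursion still breaks on larger inputs, checking the error returned after each kernel launch (for example with `cudaGetLastError()`) should help tell whether a launch or memory limit is being hit rather than a bug in the sort itself.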
2 Comments
Robert Crovella
This is documented in the programming guide as well.
user14717
Seems like something that will need to go in the cudaDeviceProp struct eventually.