With Dynamic Parallelism in CUDA you can implement recursive algorithms such as merge sort. I have implemented one, and my program doesn't work for inputs larger than a certain size.
My question is: how deep can the recursion tree go? Is there a limit? (My program works fine for smaller inputs.)

1 Answer

From Professional CUDA C Programming:

The maximum nesting depth of dynamic parallelism is limited to 24, but in reality most kernels will be limited by the amount of memory required by the device runtime system at each new level . . .
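
To make the quoted limit concrete, here is a minimal host-side sketch of how one might reserve device-runtime state for deeper nesting before launching the top-level kernel. It assumes the recursion synchronizes on child grids, so that `cudaLimitDevRuntimeSyncDepth` is the relevant limit, and that the toolkit in use still honours that limit (newer CUDA versions changed device-side synchronization). The helper names `clampNestingDepth` and `requestSyncDepth` are hypothetical, not part of CUDA; only `cudaDeviceSetLimit`, `cudaDeviceGetLimit`, and `cudaGetErrorString` are runtime API calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ceiling on nesting depth quoted in the answer above.
constexpr int kMaxNestingDepth = 24;

// Hypothetical helper: clamp a requested depth into [1, kMaxNestingDepth].
constexpr int clampNestingDepth(int requested) {
    return requested < 1 ? 1
         : (requested > kMaxNestingDepth ? kMaxNestingDepth : requested);
}

// Hypothetical helper: ask the device runtime to reserve state for
// synchronizing at up to `requested` nesting levels. Must be called from
// the host before the top-level kernel is launched.
// Returns the depth reported back by the runtime, or -1 on error
// (for example, if the reservation does not fit in device memory).
inline int requestSyncDepth(int requested) {
    const size_t wanted = static_cast<size_t>(clampNestingDepth(requested));

    cudaError_t err = cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, wanted);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceSetLimit failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }

    // Read the limit back to see what the runtime actually accepted.
    size_t actual = 0;
    err = cudaDeviceGetLimit(&actual, cudaLimitDevRuntimeSyncDepth);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetLimit failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    return static_cast<int>(actual);
}
```

Calling something like `requestSyncDepth(8)` before the first launch, and checking for `-1`, is one way to tell whether a failure comes from the depth limit itself or from the memory the runtime has to set aside for each extra level.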

2 Comments

This is documented in the programming guide as well.
Seems like something that will need to go in the cudaDeviceProp struct eventually.
