
Is there an officially sanctioned way to reuse shared memory data between __global__ functions?

Consider the following code https://cuda.godbolt.org/z/KMj9EKKbf:

#include <cuda.h>
#include <stdio.h>

__global__ void kernel_A(int** s) {
  __shared__ int sdata;
  sdata = clock();   // write a block-local value into shared memory
  *s = &sdata;       // publish the shared-memory address via global memory
  printf("sdata: %p = %i\n", &sdata, sdata);
}

//somewhat surprisingly, inserting a device function with __shared__ data
//does not move kernel_B's shared data. 
//Maybe this is just luck?
__device__ int& make_shared() {
  __shared__ int bdata;
  bdata = 1;
  auto& result = bdata;
  return result;
}

__global__ void kernel_B(int** ptr_sdata) {
  auto& bdata = make_shared();
  printf("bdata: %p = %i\n", &bdata, bdata);
  __shared__ int sdata;
  // *ptr_sdata[0] parses as **ptr_sdata: read through the address kernel_A left behind
  printf("A.sdata: %p = %i, B.sdata: %p = %i\n", *ptr_sdata, *ptr_sdata[0], &sdata, sdata);
}

int main() {
  int** ptr_sdata;
  cudaMalloc(&ptr_sdata, sizeof(int*));
  kernel_A<<<1,1>>>(ptr_sdata);
  kernel_B<<<1,1>>>(ptr_sdata);
  cudaDeviceSynchronize();
}

This prints:

sdata: 0x7d52c5000000 = -594894858
bdata: 0x7d52c5000004 = 1
A.sdata: 0x7d52c5000000 = -594894858, B.sdata: 0x7d52c5000000 = -594894858

I understand that kernel_B might not get assigned to the same SM as kernel_A, but let's assume that I spawn as many kernels as there are SMs, so that every SM runs a single block.
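One way to check where blocks actually land is to read the %smid PTX special register, which holds the id of the SM executing the current thread. A minimal sketch (the helper name current_sm is mine, and the register's value is documented as potentially changing during execution, so treat it purely as a diagnostic):

__device__ unsigned current_sm() {
  unsigned id;
  asm("mov.u32 %0, %%smid;" : "=r"(id)); // PTX special register: id of the executing SM
  return id;
}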

Is there a way to reuse the shared memory contents from kernel_A in kernel_B reliably?

  • Having thought about it a little, from an optimization point of view, the overhead between two __global__ functions is about 25 microseconds (if I remember correctly), so I guess that leaves plenty of time to shuttle the shared buffers to global memory in A and back again in B (see the sketch after these comments); but still, curious. Commented Jun 3 at 9:52
  • Completely undefined behaviour. The one-word answer is no. Commented Jun 3 at 9:56
  • In between your kernels, the driver could schedule another workload from another stream, or even from another user/application depending on context. That other kernel could overwrite the data in shared memory. Commented Jun 3 at 12:45
  • Shared memory (SHM) lives on the SM. Even if reusing it were possible, it would only be beneficial in very restricted cases, and the performance improvement would be small anyway, because SHM data can be stored in the pretty-fast L2 cache and possibly reloaded from the cache in the second kernel (not guaranteed, though). Even a reload from VRAM is not that expensive considering the tiny amount of SHM: for example, an Nv 1660S has only 1.4 MiB of SHM in total, so it should take about 4 µs to write/read that data to/from VRAM (1.4 MiB at the card's ~336 GB/s), and certainly under 2 µs for the L2 cache. Starting a kernel takes more than that. Commented Jun 3 at 21:26
  • Also, if you have far more blocks than can run concurrently on the available SMs, the blocks of the first kernel would overwrite data of other blocks of the same kernel, and since the order of execution of blocks is unspecified/undocumented, reusing SHM makes no sense in this case. It only makes sense if you can guarantee that all blocks run concurrently, that the second kernel executes just after, and that it gets the same mapping (so with fewer concurrent blocks than the maximum possible). Commented Jun 3 at 21:30
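A minimal sketch of the global-memory round trip the first comment describes (the names producer, consumer, stage, and N are illustrative, not from the question): the first kernel spills its shared buffer to a global staging buffer before exiting, and the second kernel reloads it. Because kernels launched on the same stream execute in order, this works regardless of which SM each block lands on.

#include <cuda.h>
#include <stdio.h>

constexpr int N = 256;

__global__ void producer(int* stage) {
  __shared__ int sdata[N];
  sdata[threadIdx.x] = threadIdx.x * 2;    // build the block's shared data
  __syncthreads();
  stage[threadIdx.x] = sdata[threadIdx.x]; // spill to global memory before exit
}

__global__ void consumer(const int* stage) {
  __shared__ int sdata[N];
  sdata[threadIdx.x] = stage[threadIdx.x]; // reload from global memory
  __syncthreads();
  if (threadIdx.x == 0) printf("sdata[5] = %i\n", sdata[5]);
}

int main() {
  int* stage;
  cudaMalloc(&stage, N * sizeof(int));
  producer<<<1, N>>>(stage);  // same stream, so consumer sees producer's writes
  consumer<<<1, N>>>(stage);
  cudaDeviceSynchronize();
  cudaFree(stage);
}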
