If I am generating 0-12 triangles in a compute shader, is there a way I can stream them to a buffer that will then be used for rendering to screen?
My current strategy is:
- create a buffer of
float3of sizethreads * 12, so can store the maximum possible number of triangles; - write to the buffer using an index that depends on the thread position in the grid, so there are no race conditions.
If I want to render from this though, I would need to skip the empty memory. It sounds ugly, but probably there is no other way currently. I know CUDA geometry shaders can have variable length output, but I wonder if/how games on iOS can generate variable-length data on GPU.
UPDATE 1:
As soon as I wrote the question, I thought about the possibility of using a second buffer that would point out how many triangles are available for each block. The vertex shader would then process all vertices of all triangles of that block.
This will not solve the problem of the unused memory though and as I have a big number of threads, the total memory wasted would be considerable.