Latest CUDA Programming and Performance topics

Topic	Replies	Views	Activity
How to report a bug	2	19425	May 27, 2024
Reproducible GPU Validation: 95%+ Utilization on H100 with Ecosystem Compatibility cuda , cutensor , cuquantum	0	13	December 19, 2025
GPUDirect RDMA Bandwidth Bottleneck (~38Gbps) on ASUS WS X299 SAGE/10G with Tesla T4 + BlueField-2	3	20	December 19, 2025
Specifying L2 cache partition for SM	2	136	December 19, 2025
Looking for advice for CUDA performance tracking in CI/CD pipelines cuda	3	40	December 17, 2025
Pinned memory throughput significantly lower on Ubuntu than on Windows	23	226	December 17, 2025
CUDA Green Context API | Memory Footprint cuda , driver	2	61	December 17, 2025
Double4 is deprecated, but the preferred double4_32a is unrecognized?	6	38	December 16, 2025
How to sync Cuda and Vulkan?	2	29	December 16, 2025
Nvcc, syntax error in cuda.h(7451): error: expected a ")" gtc	3	53	December 16, 2025
Wmma vs Wgmma On H100 GPU cublas	4	40	December 15, 2025
Thrust device allocator vs std allocator	3	41	December 15, 2025
Architectural insights needed: Why is the MIG 3g.71gb instance consistently the "Efficiency Sweet Spot" on H200? llama	4	74	December 15, 2025
Weekend project: Very accurate double-precision sincos() implementation for a restricted domain	0	27	December 14, 2025
Pixel Shader vs NPP - Which is faster for batch processing NV12 to RGB conversions and display directly to screen? npp	5	68	December 14, 2025
Register usage spike in SASS with division slow/full path cuda	13	205	December 12, 2025
Question about the cacheConfig value in nsight systems nsight	6	55	December 12, 2025
Is the CUDA tile kernel submitted to GPU still using the cuLaunchKernel?	2	54	December 12, 2025
Unexpected results on cub::DeviceRadixSort::SortKeys and SortPairs with 128 bit keys	5	22	December 12, 2025
How many tensor cores to execute the wmma.mma.sync.aligned.{alayout}.{blayout}.m16n16k16 instruction? cuda	23	163	December 12, 2025
__frsqrt_rn is not accurate 0.5ulp? I found a number cuda , gpu-computing	4	44	December 10, 2025
FFMA with Uniform register	3	75	December 9, 2025
Is it possible having compressible memory & memory pools over the same array on device? cuda	0	29	December 9, 2025
cudaMemcpyBatchAsync cannot aggregate D2D copy operations	13	118	December 9, 2025
Training YOLO in the background cuda , yolo , python	1	48	December 8, 2025
Deadlock when using cuStreamWaitValue32/cuStreamWriteValue32 for async cross-stream ordering cuda	8	51	December 8, 2025
Implementing clang-tidy checks for CUDA C++ Guidelines for Safety Critical Programming	3	46	December 8, 2025
Question about CTA/warp lifecycle	4	49	December 8, 2025
Help needed to execute tcgen05.mma_cta_group::2 instructions cuda , kernel	0	38	December 7, 2025
Which offers lower latency for NV12 to RGB conversion, NPP or CV-CUDA? npp	1	36	December 5, 2025