| Topic | Replies | Views | Activity |
|---|---|---|---|
| How to report a bug | 2 | 19425 | May 27, 2024 |
| Reproducible GPU Validation: 95%+ Utilization on H100 with Ecosystem Compatibility | 0 | 13 | December 19, 2025 |
| GPUDirect RDMA Bandwidth Bottleneck (~38Gbps) on ASUS WS X299 SAGE/10G with Tesla T4 + BlueField-2 | 3 | 20 | December 19, 2025 |
| Specifying L2 cache partition for SM | 2 | 136 | December 19, 2025 |
| Looking for advice for CUDA performance tracking in CI/CD pipelines | 3 | 40 | December 17, 2025 |
| Pinned memory throughput significantly lower on Ubuntu than on Windows | 23 | 226 | December 17, 2025 |
| CUDA Green Context API \| Memory Footprint | 2 | 61 | December 17, 2025 |
| Double4 is deprecated, but the preferred double4_32a is unrecognized? | 6 | 38 | December 16, 2025 |
| How to sync Cuda and Vulkan? | 2 | 29 | December 16, 2025 |
| Nvcc, syntax error in cuda.h(7451): error: expected a ")" | 3 | 53 | December 16, 2025 |
| Wmma vs Wgmma On H100 GPU | 4 | 40 | December 15, 2025 |
| Thrust device allocator vs std allocator | 3 | 41 | December 15, 2025 |
| Architectural insights needed: Why is the MIG 3g.71gb instance consistently the "Efficiency Sweet Spot" on H200? | 4 | 74 | December 15, 2025 |
| Weekend project: Very accurate double-precision sincos() implementation for a restricted domain | 0 | 27 | December 14, 2025 |
| Pixel Shader vs NPP - Which is faster for batch processing NV12 to RGB conversions and display directly to screen? | 5 | 68 | December 14, 2025 |
| Register usage spike in SASS with division slow/full path | 13 | 205 | December 12, 2025 |
| Question about the cacheConfig value in nsight systems | 6 | 55 | December 12, 2025 |
| Is the CUDA tile kernel submitted to GPU still using the cuLaunchKernel? | 2 | 54 | December 12, 2025 |
| Unexpected results on cub::DeviceRadixSort::SortKeys and SortPairs with 128 bit keys | 5 | 22 | December 12, 2025 |
| How many tensor cores to execute the wmma.mma.sync.aligned.{alayout}.{blayout}.m16n16k16 instruction? | 23 | 163 | December 12, 2025 |
| __frsqrt_rn is not accurate 0.5ulp? I found a number | 4 | 44 | December 10, 2025 |
| FFMA with Uniform register | 3 | 75 | December 9, 2025 |
| Is it possible having compressible memory & memory pools over the same array on device? | 0 | 29 | December 9, 2025 |
| cudaMemcpyBatchAsync cannot aggregate D2D copy operations | 13 | 118 | December 9, 2025 |
| Training YOLO in the background | 1 | 48 | December 8, 2025 |
| Deadlock when using cuStreamWaitValue32/cuStreamWriteValue32 for async cross-stream ordering | 8 | 51 | December 8, 2025 |
| Implementing clang-tidy checks for CUDA C++ Guidelines for Safety Critical Programming | 3 | 46 | December 8, 2025 |
| Question about CTA/warp lifecycle | 4 | 49 | December 8, 2025 |
| Help needed to execute tcgen05.mma_cta_group::2 instructions | 0 | 38 | December 7, 2025 |
| Which offers lower latency for NV12 to RGB conversion, NPP or CV-CUDA? | 1 | 36 | December 5, 2025 |