-5 votes
0 answers
33 views

I have some recurring issues with Kali (latest kernel) and my NVIDIA graphics card. For example, the title bar of the windows (of all applications) is not displayed. I can't close the windows (only with the ...
James's user avatar
  • 1,471
0 votes
0 answers
17 views

I am currently implementing a deepstream pipeline in python with a tee split as follows: streammux -> tee -> queue1 -> detector1 -> tracker1 -> queue2 -> detector2 -> tracker2 The ...
user32306963's user avatar
1 vote
1 answer
211 views

I am trying to set up a docker container using the nvidia container toolkit on a remote server, so that I can run cuda programs developed with the Futhark Programming Language - however, the issue ...
Artemijo5's user avatar
0 votes
2 answers
62 views

I have a problem with PyCUDA. I use it in a Python script I am developing. I know this script works because I use it on another server. But on one specific server I get a problem: >>> import pycuda....
Julien's user avatar
  • 1
-5 votes
1 answer
73 views

I tested the performance of LAMMPS with DeepMD-kit for MD simulations on an HPC cluster. The job was allocated 8 CPUs, 64 GB of RAM, and one A100 GPU. I observed that when running with mpirun -np 1 ...
link89's user avatar
  • 2,017
Tooling
0 votes
1 reply
84 views

I’m trying to simulate multiple camera streams feeding into a YOLOv8l model on a single GPU and monitor real-time hardware utilization. My setup: Single GPU (48GB VRAM, CUDA-enabled) YOLOv8l model ...
Madesh Prasad's user avatar
2 votes
1 answer
874 views

I am trying to install JAX with GPU support on a powerful, dedicated Linux server, but I am stuck in what feels like a Catch-22 where every official installation method fails in a different way, ...
PowerPoint Trenton's user avatar
4 votes
1 answer
191 views

I am trying to run a basic CUDA program in Google Colab, but it's not giving kernel output. Below are the steps I tried: Changed the runtime type to T4 GPU. !pip install nvcc4jupyter %load_ext ...
Digvijay Singh Thakur's user avatar
1 vote
1 answer
75 views

I installed NVIDIA Nsight Visual Studio Edition 2025.01 in Visual Studio 2022. I want to debug code, but I can't use step over (F10). The debugger always stops at a location without a breakpoint....
Imagination Youth's user avatar
1 vote
1 answer
601 views

I’ve been trying to get TensorFlow to use my GPU on Windows, and even though everything seems installed correctly, it shows 0 available GPUs. System setup Windows 11 RTX 3050 Laptop GPU NVIDIA driver ...
Houssem Eddine's user avatar
1 vote
0 answers
310 views

I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show “Command Buffer Full”. This causes the cudaLaunchKernel time to become very long, as shown ...
plznobug's user avatar
  • 123
2 votes
0 answers
420 views

I am using WSL2 on Windows 10. I have an NVIDIA graphics card. I recently installed JAX with GPU support using the command pip install -U "jax[cuda12]". This completed successfully, but when I run any jax ...
DrMittal's user avatar
2 votes
1 answer
228 views

I’m trying to launch a captured CUDA Graph from inside a regular CUDA kernel (i.e., device-side graph launch). From the NVIDIA blog on device graph launch, it seems this should be supported on newer ...
Mohammad Siavashi's user avatar
0 votes
1 answer
110 views

I am trying to implement the producer-consumer problem between GPU and CPU, which is required for another project. The GPU requests some data from the CPU via unified memory. The CPU copies that data to a specific location in global ...
Chinmaya Bhat K K's user avatar
3 votes
1 answer
127 views

Problem Statement My iSLAM system works correctly with the original PyTorch PWC-Net but produces catastrophic trajectory errors (2.4km ATE RMSE) when I replace it with a TensorRT-converted version. ...
Unknown's user avatar
  • 705
0 votes
0 answers
155 views

I'm converting a PWC-Net optical flow model to run on Jetson NX DLA using the iSLAM framework, but the TensorRT engine build fails during DLA optimization. Environment Hardware: NVIDIA Jetson NX ...
Unknown's user avatar
  • 705
0 votes
0 answers
76 views

I am attempting to write my own holoscan::Operator for creating some images that should be displayed as a short video using a holoscan::ops::HolovizOp. So I compose()-d an application flow: add_flow(...
Markus-Hermann's user avatar
0 votes
1 answer
216 views

I want to quantitatively measure the memory bandwidth utilization and SM utilization of a CUDA program for performance analysis and regression testing. My approach so far: Compute the theoretical ...
plznobug's user avatar
  • 123
2 votes
1 answer
165 views

I am trying to get the Holoscan example "bring your own model" https://docs.nvidia.com/holoscan/sdk-user-guide/examples/byom.html to run, translating it from Python into C++. One necessary ...
Markus-Hermann's user avatar
1 vote
1 answer
485 views

I am writing PTX assembly code in CUDA C++ for research. This is my setup: I downloaded the latest CUDA C++ toolkit (13.0) yesterday on WSL Linux. The local compilation environment does not ...
Junhao Liu's user avatar
1 vote
1 answer
115 views

Without getting into too much detail, the project I'm working on needs three different phases, each corresponding to a different kernel. I only know the number of threads needed in the second phase ...
StefanoTrv's user avatar
2 votes
1 answer
77 views

I am trying to pass a float4 as argument to my cuda kernel (by value) using PyCUDA’s make_float4(). But there seems to be some misalignment when the data is transferred to the kernel. If I read the ...
Dodilei's user avatar
  • 308
1 vote
0 answers
49 views

I'm using the OpenCL clBuildProgram() API function on a program created from a source string. The source is: kernel void foo(int val, write_only pipe int outPipe) { write_pipe(outPipe, &val); }...
einpoklum's user avatar
  • 138k
2 votes
0 answers
41 views

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-shfl-...
Tom Huntington's user avatar
1 vote
1 answer
121 views

I am integrating the PeopleNet ONNX model into my C++ application. The following are the preprocessing and postprocessing functions. void preprocessGpuBatch(const std::vector<cv::cuda::GpuMat>& ...
batuman's user avatar
  • 7,346
1 vote
1 answer
235 views

My code: from transformers import AutoTokenizer, AutoModel model_name = "NVIDIA/nv-embed-v2" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModel.from_pretrained(...
6zL's user avatar
  • 21
0 votes
1 answer
476 views

I'm confused about what exactly is handled by CuTe and what by CUTLASS. From my understanding, CUTLASS handles the following: GEMM computation of CuTe Tensors Communication between CPU and GPU Abstract memory ...
jonithani123's user avatar
0 votes
0 answers
51 views

I have a single-GPU machine with this configuration: This is my slurm.conf: NodeName=TechdivAISLURM CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=32096 Gres=gpu:1 State=...
RobM's user avatar
  • 835
-4 votes
1 answer
53 views

I am analyzing a large project that has many kernel files. I want to find a specific kernel in the report file produced by nsys. How should I go about this?
rongtao zhou's user avatar
0 votes
1 answer
287 views

I want to use nsys to profile a server. How can I use nsys while the server is running? Should I use nsys launch or nsys profile and restart the server? Is there any way to avoid restarting the service?
rongtao zhou's user avatar
2 votes
0 answers
87 views

Is there an officially sanctioned way to reuse shared data between global functions? Consider the following code https://cuda.godbolt.org/z/KMj9EKKbf: #include <cuda.h> #include <stdio.h> ...
Johan's user avatar
  • 77.5k
1 vote
0 answers
44 views

Environment: Hardware: BlueField-2, model MBF2H516A-CEEOT OS: Linux version 5.15.0-1060-bluefield (buildd@bos03-arm64-114) DOCA SDK: 2.10.0087 Description: I'm trying to run the doca_switch sample ...
user24906747's user avatar
-4 votes
1 answer
347 views

In a talk on The C++ Execution Model, from the cppunderthesea 2024 conference, at around 44:50, NVIDIA's Bryce Adelstein Lelbach claims that non-NVIDIA GPUs give no guarantee of threads progressing ("...
einpoklum's user avatar
  • 138k
1 vote
1 answer
125 views

With the following test example, the output matrix doesn't give the desired output or maybe I'm misunderstanding certain parameters: #include <cstdio> #include <cublas_v2.h> #include <...
A. K.'s user avatar
  • 39.7k
0 votes
0 answers
319 views

I am trying to run a docker compose and failing. I have here a minimal reproducible example. First with this docker-compose.yml services: hello-app: image: python:3.10-slim command: python ...
KansaiRobot's user avatar
  • 10.6k
0 votes
0 answers
68 views

I'm working on a GPU-accelerated 2D convolution in OpenCL for a 2048x2048 image using a 3x3 Sobel filter. I implemented two versions of the kernel: A naive version that uses only global memory. An ...
Mxneeb's user avatar
  • 19
0 votes
0 answers
110 views

I'm encountering an issue while developing a DPDK-based program using a dual-port ConnectX-6 NIC on Ubuntu 24.04. Despite following the setup instructions, my program fails to detect the NIC ports. ...
Mohammad P's user avatar
1 vote
0 answers
47 views

We currently have a trained ResnetV250 image classification model that performs as expected on CPU and with GPU support on Turing based cards with CUDA 10.1 and cudnn 7.6.4. When transferring this to ...
Ross Halliday's user avatar
2 votes
2 answers
99 views

I am writing some code in C in which I want to add the optional ability to have certain sections of the code accelerated using OpenMP, and with an additional optional ability to have them accelerated ...
Matthew G.'s user avatar
0 votes
0 answers
16 views

I'm working on the NVIDIA VSS project as detailed in the official documentation: NVIDIA VSS Run Guide. I'm deploying the service on a MicroK8s cluster. However, one of the pods—named similar to vss-...
Cody Hubman's user avatar
0 votes
1 answer
80 views

I'm trying to use TensorFlow with my YOLOv8 project, but for some reason it is not recognizing my GPU. I had originally installed it using pip, but I was told I should use conda instead, so I switched ...
James Pelham-Burn's user avatar
2 votes
0 answers
47 views

In the last example of Mark Harris' webinar I don't understand the indexing before the parallel reduction part. In "Reduction #6" the gridSize/number of dispatches was ceil[N (the size of ...
Michael Bay's user avatar
0 votes
0 answers
149 views

Environment: OS: Windows Operating System TensorRT Version: TensorRT-10.3.0.26 NVIDIA CUDA Version: 12.6 cuDNN Version: 9.8 GPU: RTX 3050 Ti Laptop GPU Issue Description: I am encountering an "...
B.Uluer's user avatar
  • 11
0 votes
0 answers
121 views

I used the following commands to convert an ONNX model to a TRT engine, where the input.onnx file is the original model: polygraphy surgeon sanitize --fold-constants ./input.onnx -o output.onnx ...
simonzgx's user avatar
1 vote
0 answers
71 views

I have a laptop with an integrated Intel graphics card and an NVIDIA T1000 graphics card. I set the NVIDIA card as the preferred graphics processor under Manage 3D Settings in the NVIDIA Control Panel. However, ...
Martin121233's user avatar
2 votes
1 answer
356 views

I've been trying to train a language model (text classification). Our lab has two GPUs, a 4090 and a 3090. However, I encountered a perplexing phenomenon during training: the model's performance ...
tong's user avatar
  • 21
2 votes
0 answers
353 views

I am using Ubuntu 22.04. I have the nvidia-570 driver installed along with CUDA 12.4 on my host machine. However, I am not able to access the GPU in my container. This is my docker-compose file: version: '3.8' ...
prarthana sigedar's user avatar
0 votes
1 answer
112 views

Hello, even though I am inside the TAO Toolkit TensorFlow container, I am still having an issue decrypting a model with tao: docker run -it --rm -v /home/models:/workspace/model_dir -p 8888:8888 --runtime=...
Novice's user avatar
  • 84
1 vote
0 answers
154 views

I'm trying to build a Docker image for realtime-whiper. The build process finishes successfully, but at the end it gives this error: Error response from daemon: could not select device driver "nvidia"...
Ali Zekai Deveci's user avatar
2 votes
1 answer
122 views

I have a couple of pre-trained and tested TensorFlow LSTM models, which have been trained on Google Colab. I want to deploy these models with AWS as our entire application is deployed there. I've ...
Manu Sisko's user avatar
