Newest 'pytorch' Questions

Advice

0 votes

1 reply

69 views

Book Recommendation in PyTorch

I am looking for a book on PyTorch that is suitable for beginners. I've used sklearn in the past for ML; it's a simple workflow for me: prepare the X and Y data, fit/train a model, and make ...

Kev

1

asked yesterday

0 votes

0 answers

15 views

DLL initialization routine failed on package import

I have tried loading PyTorch in IPython but get a DLL initialization error. In a normal Python console it works fine, as below. Windows 10, Miniconda installation. Can anyone advise how I need to ...

PGajjar

1

asked Apr 10 at 14:46

0 votes

1 answer

54 views

How to make a 2-label confusion matrix and export it to a JSON file?

I have to train a convolutional neural network on a dataset. The NN itself works and does what it's supposed to, but now I want to make a confusion matrix and export it to a JSON file for further ...

traq

1

asked Apr 8 at 15:55
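
For the confusion-matrix question above, a minimal sketch in plain Python may help. It assumes the true and predicted labels have already been collected from the evaluation loop as lists of 0/1 integers; the names `y_true`, `y_pred`, `confusion_matrix_2x2`, and the output path are hypothetical and do not come from the question.

```python
import json
import os
import tempfile


def confusion_matrix_2x2(y_true, y_pred):
    """Return counts as [[TN, FP], [FN, TP]] for labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix


# Hypothetical labels; in practice these come from the evaluation loop.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix_2x2(y_true, y_pred)

# Plain ints and lists serialise directly; tensors would need .tolist() first.
out_path = os.path.join(tempfile.gettempdir(), "confusion_matrix.json")
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump({"labels": [0, 1], "matrix": cm}, fh, indent=2)
```

Keeping the matrix as nested lists of Python ints is a deliberate choice here, since `json.dump` cannot serialise tensor objects directly.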

0 votes

0 answers

46 views

Sign Language with PyTorch GRU [closed]

I'm currently training a GRU model on American Sign Language (ASL) using a Kaggle dataset. While tweaking the parameters, I achieved a peak accuracy of 44.7% on training and 28.2% on testing, that is ...

Bakr MARHFOUL

33

asked Apr 5 at 6:18

Advice

1 vote

1 reply

34 views

Reproducibility of Hugging Face Transformer models

If I'm using any transformer model loaded from the Hugging Face Hub with Python, is it somehow possible to reproduce all the seeds that were used for the model training/fine-tuning? Seeds/...

jhn

5

asked Apr 2 at 10:24

Advice

0 votes

1 reply

31 views

Is the torch.fx traced graph topologically sorted?

Dependency layer: The layers whose outputs are passed to the current layer. Basically, the current layer is dependent on the outputs of the dependency layers. For a project, I need to know if the ...

Mentor_sensei

1

asked Mar 30 at 5:41

0 votes

1 answer

48 views

How to use NeuralForecast and PyTorch Lightning on Intel GPU (XPU / torch.xpu)?

PyTorch supports Intel GPU through torch.xpu, but PyTorch Lightning does not currently have built-in XPU accelerator support. Because NeuralForecast uses Lightning under the hood, that also blocks ...

Marek Ozana

196

asked Mar 29 at 16:13

Best practices

1 vote

0 replies

26 views

How do you effectively reuse helper functions and training pipelines across multiple PyTorch projects?

I’ve been working on multiple machine learning projects using PyTorch, and I keep running into the same issue: a lot of code ends up being repeated across projects. This includes things like: ...

AIMLSE support

1

asked Mar 28 at 16:55

0 votes

0 answers

35 views

Torch C++ binding core dump with torch C++ extension binding

I found that torch-text is archived, but I still want to use it because there is a course that uses it. Since it was archived, I keep running into a missing-signature problem, so I want to fork and fix it for ...

SteinGate

31

asked Mar 26 at 7:12

0 votes

0 answers

58 views

XTTS v2 produces hallucinations when running multiple inferences sequentially, but works fine individually

I'm using XTTS v2 fine-tuned for Vietnamese (vnTTS). Problem: - Running inference on a single sentence → perfect output - Running inference on multiple sentences in a loop → weird sounds/...

Duy Đỗ Đình

1

asked Mar 19 at 10:42

-1 votes

0 answers

98 views

Converting .h5 model weights (no architecture) to .pth

I have an .h5 file that contains only model weights, not the model architecture. I want to use these weights in a PyTorch model and convert them into a .pth file. Some context: The .h5 file does not ...

Saim Mahmood

1

asked Mar 19 at 6:01

3 votes

0 answers

36 views

How to convert the MLP in MoE to 4 bit quantization?

I'm doing some research on information encoding with LLMs and need a way to quantize the weights of the MLP layers (MoE) to 4 bits, or even to a customized mixed precision. Consider from ...

ShoutOutAndCalculate

623

asked Mar 18 at 14:38

-2 votes

0 answers

133 views

ONNXRuntimeError: CUDA error: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device

I'm trying to run model on GPU: clf2 = PunctCapSegModelONNX.from_pretrained( "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase", ort_provider=["CUDAExecutionProvider&...

xæliudzyh

1

asked Mar 18 at 10:36

Advice

0 votes

4 replies

86 views

Converting XLSX File to a FASTA format in Python

Need to extract data (Peptide_Sequence) from XLSX output file to a FASTA file. I'm using pandas. A FASTA file is a standard plain-text format in bioinformatics, used to store nucleotide or amino acid ...

MomoLtx

1

asked Mar 16 at 8:36
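
For the XLSX-to-FASTA question above, the FASTA side can be sketched with the standard library alone. This assumes the `Peptide_Sequence` column has already been read into a list of strings (for example with pandas); the function name `to_fasta`, the `peptide_N` header scheme, and the sample sequences are hypothetical.

```python
def to_fasta(sequences, prefix="peptide", width=60):
    """Format sequences as FASTA text: a '>' header line, then wrapped residues."""
    lines = []
    for i, seq in enumerate(sequences, start=1):
        seq = str(seq).strip()
        if not seq:
            continue  # skip empty cells
        lines.append(f">{prefix}_{i}")
        # Wrap long sequences to a fixed line width, as is conventional for FASTA.
        lines.extend(seq[j:j + width] for j in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical values standing in for the Peptide_Sequence column, e.g.
# pandas.read_excel("output.xlsx")["Peptide_Sequence"].dropna().tolist()
peptides = ["ACDEFGHIK", "LMNPQRSTVWY"]
fasta_text = to_fasta(peptides)
```

The resulting string can be written to a `.fasta` file with an ordinary `open(..., "w")` call.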

0 votes

1 answer

60 views

Is there a way to prioritise --index-url but still look at other places in pip install / download?

I want a CPU-only setup (which saves space, for example). For a single package this is described in, e.g., "How to install torch without nvidia?": pip install p --index-url https://download.pytorch.org/whl/cpu. Now I ...

Martian2020

492

asked Mar 16 at 3:44

3 votes

1 answer

82 views

How to robustly intercept PyTorch GPU OOM in a Python subprocess and dynamically adjust batch_size within an autonomous AI Agent loop?

I am building an autonomous AI Agent (managing training workflows) that automatically generates PyTorch/OpenMMLab training scripts and executes them in a background subprocess. One of the common ...

user32496676

31

asked Mar 14 at 16:54
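
For the OOM-interception question above, one possible shape of the retry loop is sketched below. This is an illustration under stated assumptions, not the asker's agent: the child process is a stand-in script that only imitates an out-of-memory message on stderr, and the names `FAKE_TRAIN_SCRIPT` and `run_with_backoff` are hypothetical. The sketch assumes the parent can recognise an OOM by the text "out of memory" in the child's stderr.

```python
import subprocess
import sys

# Stand-in for a generated training script: it "fails" with an OOM-like
# message when the batch size is above 16. A real script would train a model.
FAKE_TRAIN_SCRIPT = """
import sys
batch_size = int(sys.argv[1])
if batch_size > 16:
    sys.stderr.write("torch.OutOfMemoryError: CUDA out of memory.\\n")
    sys.exit(1)
print(f"trained with batch_size={batch_size}")
"""


def run_with_backoff(batch_size, min_batch_size=1):
    """Run the script, halving batch_size whenever stderr reports an OOM."""
    while batch_size >= min_batch_size:
        proc = subprocess.run(
            [sys.executable, "-c", FAKE_TRAIN_SCRIPT, str(batch_size)],
            capture_output=True,
            text=True,
        )
        if proc.returncode == 0:
            return batch_size, proc.stdout.strip()
        if "out of memory" not in proc.stderr.lower():
            # Not an OOM: surface the failure instead of retrying blindly.
            raise RuntimeError(proc.stderr)
        batch_size //= 2
    raise RuntimeError("OOM persisted down to the minimum batch size")


final_batch_size, output = run_with_backoff(64)
```

Running each attempt in a fresh subprocess means GPU memory held by a failed attempt is released when that process exits, which is one reason to detect the failure from the parent rather than inside the training script.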

Best practices

0 votes

2 replies

50 views

Can hyperparameters tuned for fp32 be transferred directly to half-precision fp16?

I use Google Colab to train my model. I trained the model in fp32 and used a random grid search for the hyperparameters. The training phase is slow; it runs at around 3.24 it/s. I want to ask: can I use ...

Suebpong Pruttipattanapong

1

asked Mar 13 at 6:18

Best practices

5 votes

5 replies

132 views

Optimizing a Gaussian penalty function for HSL color compatibility in PyTorch/NumPy

I am currently developing an AI-driven fashion recommendation system, specifically focusing on a "Multi-modal Context-Aware Decision Model." A critical component of my recommendation engine ...

王振宇

1

asked Mar 11 at 16:11

Advice

0 votes

1 reply

58 views

Can V-JEPA be used to detect audience engagement during a seminar from live video?

I am experimenting with the V-JEPA model developed by Meta for video understanding. My goal is to analyze a live video stream of people attending a seminar and determine their engagement level (for ...

Harshitha Gangu

1

asked Mar 6 at 7:28

0 votes

0 answers

65 views

Sentence Transformer Stuck at Loading (Google Cloud Instance)

I use this code to load a sentence transformer in a GCP VM instance (no GPU). This is a Dask plugin used on a Dask worker: class NLPSetup(WorkerPlugin): def __init__(self, bucket_uri): self....

cuneyttyler

1,414

asked Feb 28 at 11:04

Advice

2 votes

0 replies

141 views

Is clothing-invariant person recognition possible using still images only?

I am working on a person recognition system for learning purposes. My goal is: maintain a small gallery of known people (multiple images per person); given a new query image, return the most similar ...

Shanthini M

315

asked Feb 26 at 11:10

3 votes

0 answers

98 views

Implemented PPO algorithm fails to train

I wrote a PPO-based reinforcement learning code for the Gymnasium CarRacing-v3 environment. (The code was generated with the help of Gemini) However, even after 200,000 frames, the training does not ...

Rai Madu

39

asked Feb 22 at 3:31

0 votes

0 answers

58 views

Metrics freeze after the first epoch

I encountered a problem with metrics freezing after the first training epoch. During the first epoch, the model training proceeds normally. The loss metrics decrease, and the accuracy increases. The ...

CumMunist

9

asked Feb 21 at 0:33

4 votes

2 answers

153 views

ModelCheckpoint not saving last validating checkpoint when save_last=True

I am using PyTorch Lightning to train my model. I use the Lightning callback ModelCheckpoint with the following settings: ModelCheckpoint( dirpath="path/to/dir", monitor="...

JacobM

41

asked Feb 19 at 14:50

5 votes

0 answers

164 views

CUDA error: CUBLAS_STATUS_INVALID_VALUE in cublasGemmEx() with PyTorch, fp16=False

I am using an RTX 3060 (12GB VRAM) and implementing a RAG pipeline with the BGE-M3 embedding model. Initially, I installed PyTorch with the CUDA 12.8 wheel (my NVIDIA driver supports CUDA 12.9). ...

Sujith A

51

asked Feb 17 at 17:10

2 votes

1 answer

67 views

PyTorch ValueError: optimizer got an empty parameter list when building a Logistic Regression Model

I tried making a logistic regression model using nn.Module class LogisticRegressionModel(nn.Module): def __init__(self, input_dim= None) -> None: super().__init__() if input_dim ...

Bakr MARHFOUL

33

asked Feb 16 at 3:46

Advice

1 vote

3 replies

90 views

What does it mean that PyTorch's torch.mul is "unbound"?

Running help(torch.Tensor.mul) gives: Help on method_descriptor: mul(...) unbound torch._C.TensorBase method mul(value) -> Tensor See :func:`torch.mul`. What does unbound mean in this ...

escapecharacter

985

asked Feb 15 at 18:44
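
For the "unbound" question above, the same wording can be reproduced with a built-in type, with no PyTorch involved. The sketch below uses `str.upper` as an analogy; it assumes `torch.Tensor.mul` behaves like any other method descriptor, which is what the quoted `help` output ("Help on method_descriptor") suggests.

```python
# str.upper looked up on the class is an unbound method descriptor:
# it is not yet attached to any particular string.
unbound = str.upper
kind_unbound = type(unbound).__name__

# Looked up on an instance, the same method is bound to that instance.
bound = "abc".upper
kind_bound = type(bound).__name__

# An unbound method needs the instance passed explicitly as the first argument.
via_unbound = unbound("abc")
via_bound = bound()
```

By the same reading, `torch.Tensor.mul(a, b)` and `a.mul(b)` would be two spellings of one call: the first passes the tensor explicitly, the second has it bound already.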

3 votes

2 answers

124 views

Why can't I pass input and target tensors directly to nn.CrossEntropyLoss?

On Python 3.13, torch 2.10.0+cu130: import torch loss = nn.CrossEntropyLoss() loss(torch.tensor((.1, .2)), torch.tensor((.3, .4))) returns tensor(0.4811), but why does nn.CrossEntropyLoss(torch....

user2309803

723

asked Feb 13 at 11:26
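
The value `tensor(0.4811)` quoted in the question above can be checked by hand. The sketch below redoes the arithmetic in plain Python, assuming the float target of the same shape as the input is interpreted as per-class probabilities, so the loss is the negative sum of target times log-softmax of the input. The variable names are illustrative only.

```python
import math

logits = [0.1, 0.2]  # the "input" from the question
target = [0.3, 0.4]  # the "target", read here as per-class probabilities

# log-softmax: subtract log(sum(exp(z))) from every logit
log_sum_exp = math.log(sum(math.exp(z) for z in logits))
log_probs = [z - log_sum_exp for z in logits]

# Cross-entropy with soft targets: -sum(target_c * log_prob_c)
loss = -sum(t * lp for t, lp in zip(target, log_probs))
rounded = round(loss, 4)
```

Both values here are data for one call of the loss. That is separate from the constructor, whose arguments configure the loss rather than supply inputs.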

1 vote

0 answers

70 views

Why does BatchNorm1d fail with batch size 1 in training mode?

I am training a small PyTorch model and want to use nn.BatchNorm1d. When the batch size is 1 and the model is in training mode, I get the error below: ValueError: Expected more than 1 value per ...

Linda

37

asked Feb 11 at 22:05
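
A likely reason for the error in the BatchNorm1d question above is that training-mode batch normalisation uses the statistics of the current batch, and those are degenerate when each feature has a single value. The sketch below shows only the arithmetic in plain Python. It is not PyTorch's implementation, and `batch_norm_train` is a hypothetical helper.

```python
import math


def batch_norm_train(column, eps=1e-5):
    """Normalise one feature column using the batch's own mean and variance."""
    n = len(column)
    mean = sum(column) / n
    # Biased variance over the batch, as used for normalisation in training.
    var = sum((x - mean) ** 2 for x in column) / n
    return [(x - mean) / math.sqrt(var + eps) for x in column]


batch_of_four = batch_norm_train([1.0, 2.0, 3.0, 4.0])
batch_of_one = batch_norm_train([7.5])  # mean == value, variance == 0
```

With one sample the normalised output is zero whatever the input was, so no information passes through the layer. In evaluation mode the layer uses its running statistics instead, so a batch of one is not a problem there.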

0 votes

0 answers

45 views

Trained and loaded CycleGAN model is giving distorted output images

I trained a CycleGAN model on Google Colab using this repository: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix. The model should enhance dark images. I tested the model on my test dataset ...

rupertpurple

11

asked Feb 9 at 20:34

1 vote

1 answer

69 views

GluonTS DeepAREstimator fails to load checkpoint in PyTorch 2.6

I am currently working on a project where I have to use GluonTS (the DeepAREstimator and DLinearEstimator). At the beginning it worked well. But now, even when I use the example code from the GluonTS ...

peter mafai

121

asked Feb 7 at 21:08

2 votes

2 answers

60 views

"from torch_geometric.data import Data" throwing an error

If I run a py module with only these imports (no additional code) it works fine and the output is Process finished with exit code 0: import torch.utils.data from torch.utils.data.dataloader import ...

Takewood

53

asked Feb 7 at 20:25

0 votes

0 answers

32 views

Post-training quantized model gets the error "Copying from quantized Tensor to non-quantized Tensor is not allowed" even though I'm not copying a tensor

I got a pretrained ResNet-18 model from this lane detection repo in order to use it as an ADAS (advanced driver assistance systems) function for an electric car building competition. My current goal is ...

Ekim

3

asked Feb 6 at 14:02

0 votes

0 answers

37 views

How to properly handle LSTM states during training with SliceSampler in TorchRL?

I am implementing a Reinforcement Learning environment using torchrl where the agent uses an LSTM-based policy. My goal is to train the agent on sequences sampled from a replay buffer. While I have ...

wittn

318

asked Feb 5 at 18:43

0 votes

0 answers

46 views

Why does .view() fail after permuting dimensions for a GRU?

I'm trying to train a character-level GRU on Linux kernel source but the training loop keeps crashing with this error: RuntimeError: view size is not compatible with input tensor's size and stride (...

user32309338

asked Feb 3 at 21:03

1 vote

0 answers

72 views

PyTorch and NVIDIA FLARE are taking all computing resources in machine learning experiments

I am using PyTorch for federated experiments. As my experiments involve 50 datasets with models, I have to run multiple ML model experiments in parallel. The code for training ML model is ...

coderoid

2,400

asked Feb 3 at 6:32

0 votes

0 answers

40 views

torch dataloader next-method when using multiple workers

I have a Dataset that is based on IterableDataset, which looks like this: class MyDataSet(torch.utils.data.IterableDataset): def __init__(self): # doing init stuff here def __iter__(self): ...

RaJa

1,597

asked Jan 29 at 6:06

Advice

2 votes

1 reply

55 views

Why do we use requires_grad=True in the input here?

# Example of target with class indices loss = nn.CrossEntropyLoss() input = torch.randn(3, 5, requires_grad=True) <=============== WHY ? target = torch.empty(3, dtype=torch.long).random_(5) output =...

VISHMA PRATIM DAS

21

asked Jan 28 at 19:03

6 votes

0 answers

136 views

Docker load fails with wrong diff id calculated on extraction for large CUDA/PyTorch image (Ubuntu 22.04 + CUDA 12.8 + PyTorch 2.8)

About I am trying to create a Docker image with the same Dockerfile with Python 3.10, CUDA 12.8, and PyTorch 2.8 that is portable between two machines: Local Machine: NVIDIA RTX 5070 (Blackwell ...

requiemman

311

asked Jan 26 at 1:59

0 votes

0 answers

244 views

Memory access fault by GPU node-1 (Agent handle: 0x26f5dbf0) on address 0x7749d0333000. Reason: Write access to a read-only page

I am currently working on a project to segment 3D-LSM images using a self-supervised model, and I have been trying to perform a dry run (testing pre-training) on an AMD GPU droplet on DigitalOcean. The configs of ...

Manav Patel

1

asked Jan 25 at 7:31

1 vote

1 answer

57 views

Why does PyTorch GPU matmul give correct results without torch.cuda.synchronize()?

I'm learning GPU programming with PyTorch and I'm confused about when torch.cuda.synchronize() is actually necessary. I have this code that compares CPU and GPU matrix multiplication: import torch ...

nz_21

7,841

asked Jan 24 at 17:34

1 vote

1 answer

101 views

torch.matmul(S, v) where S is symmetric and v is a vector: how to speed up computations?

Let S be an n×n symmetric matrix and v an n×1 vector. We need to compute inside a PyTorch loss function the vector (S x v) in an efficient manner. Do you know if there is a way to keep ...

Filippo Portera

85

asked Jan 20 at 17:25

0 votes

0 answers

45 views

Proper utilization of sliding window inferer

I am training an Encoder-Decoder network to reconstruct brain CT images. Due to OOM (Out of Memory) errors with full-sized images, I implemented a sliding window approach for training and inference. ...

kKodorna

3

asked Jan 15 at 23:35

2 votes

1 answer

76 views

PyTorch: trying to create a joint dataset with different transforms results in both datasets having same transform

I'm very new to PyTorch and am attempting to create a dataset for which a given sample has both unmasked and masked data associated with it, or in other words, the first piece of data is just the ...

user1799323

701

asked Jan 15 at 3:17

3 votes

1 answer

68 views

How can a tensor's backward function influence the matrix in the model?

class SoftmaxRegission(torch.nn.Module): linear: torch.nn.Linear def __init__(self, num_features: int, num_classes: int): super(SoftmaxRegission, self).__init__() self.linear =...

SteinGate

31

asked Jan 12 at 12:30

0 votes

0 answers

46 views

Parameter count difference between UNETR paper and MONAI implementation

I am comparing many deep learning models to each other, including UNETR, on the BTCV dataset and noticed a discrepancy in the reported number of parameters. In their paper titled "UNETR: ...

Ahmed

163

asked Jan 12 at 10:27

0 votes

0 answers

71 views

In PyTorch how do I perform bf16 or f16 matmul with accumulation in f32 explicitly?

I need fast compute but the resulting sums need higher precision for downstream tasks just for one specific op out of many in the model. torch.bmm has an out_dtype parameter but the documentation does ...

LOST

3,384

asked Jan 9 at 18:00

0 votes

2 answers

137 views

Difference between torch.nn.Module and torch.Tensor?

I’m learning PyTorch and I see two common classes: torch.Tensor and torch.nn.Module. I’m a bit confused about their differences and when to use each. Here’s what I understand so far: torch.Tensor ...

Kokala Sai Teja

1

asked Jan 8 at 20:54

0 votes

0 answers

91 views

ONNX cannot be read with Microsoft.ML on Windows 10 (19045.5854)

Everything I describe here works perfectly fine on my computer that has Windows 11 (Version 10.0.26200). However, on a computer that has Windows 10 (10.0.19045), it does not work. This is a client's ...

FluidMechanics Potential Flows

1,124

asked Jan 8 at 10:46

0 votes

0 answers

57 views

Issue with converting mobile_sam.pt to ONNX format (decode part)

I want to use the mobile_sam.pt model in a web browser, so I need it in ONNX format. I tried these methods but always get the same error: segment-anything, samexporter. The error above ...

Erfan Riahi

45

asked Jan 5 at 11:51
