bugs in EmbeddingBag cuda codes #11847

@marmorb

Issue description

This code runs normally on the CPU. On the GPU (model and tensors moved with cuda()), the call to loss.backward() fails with:

RuntimeError: cuda runtime error (9) : invalid configuration argument at /data/users/mabing/pytorch/aten/src/ATen/native/cuda/EmbeddingBag.cu:257
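For context on the error text: CUDA reports "invalid configuration argument" when a kernel is launched with a grid or block dimension of zero (or one that exceeds the device limits). The sketch below is only an illustration of how that can happen when a grid is sized from an element count. It is an assumption, not the code at EmbeddingBag.cu:257, and the names `blocks_needed` and `is_valid_launch` are hypothetical.

```python
def blocks_needed(num_elements, threads_per_block=256):
    """Ceiling division, a common way to size a 1-D CUDA grid."""
    return (num_elements + threads_per_block - 1) // threads_per_block


def is_valid_launch(grid_dim, block_dim):
    """CUDA requires every grid and block dimension to be at least 1."""
    return grid_dim >= 1 and block_dim >= 1


# Hypothetical element counts: a non-empty input and an empty one.
for num_elements in (3, 0):
    grid = blocks_needed(num_elements)
    print(num_elements, grid, is_valid_launch(grid, 256))
```

If the backward kernel sizes its grid this way, an input with zero indices would request zero blocks, which would match the reported error. Whether that is the actual cause here is unconfirmed.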

Code example

import torch.optim as optim
import torch
import torch.nn as nn
import numpy as np
from scipy.special import expit
import os
import time

class SkipGramModel(nn.Module):
    def __init__(self, component_size, word_size, dim):
        super(SkipGramModel, self).__init__()
        self.emb_size = dim
        self.component_size = component_size
        self.word_size = word_size
        self.atten_layers = nn.Embedding(word_size,1)
        self.u_embeddings = nn.EmbeddingBag(component_size,dim)
        self.word_embeddings = nn.Embedding(word_size,dim,sparse=True)
        self.v_embeddings = nn.Embedding(word_size,dim,sparse=True)
        self.m = nn.Sigmoid()
        self.init_emb()

    def init_emb(self):
        initrange = 0.5 / self.emb_size
        self.word_embeddings.weight.data.uniform_(-initrange,initrange)
        self.u_embeddings.weight.data.uniform_(-initrange, initrange)
        self.v_embeddings.weight.data.uniform_(-0, 0)
        atten = torch.zeros([self.word_size, 5])
        atten[:, 0] += torch.log(torch.FloatTensor([4]))
        self.atten_layers.weight.data = atten


    def forward(self, word_in,component_in, word_out, offset):
        char_in = torch.cuda.LongTensor(component_in[0])
        redical_in = torch.cuda.LongTensor(component_in[1])
        com1_in = torch.cuda.LongTensor(component_in[2])
        com2_in = torch.cuda.LongTensor(component_in[3])
        offset1 = torch.cuda.LongTensor(offset[0])
        offset2 = torch.cuda.LongTensor(offset[1])
        offset3 = torch.cuda.LongTensor(offset[2])
        offset4 = torch.cuda.LongTensor(offset[3])
        attention = torch.softmax(self.atten_layers(word_in),dim=-1).unsqueeze(1)
        emb_uword = self.word_embeddings(word_in)
        emb_char = self.u_embeddings(char_in,offset1)
        emb_redical = self.u_embeddings(redical_in,offset2)
        emb_com1 = self.u_embeddings(com1_in,offset3)
        emb_com2 = self.u_embeddings(com2_in,offset4)
        emb_all = torch.stack((emb_uword,emb_char,emb_redical,emb_com1,emb_com2),1)
        emb_vword = self.v_embeddings(word_out)
        emb_mixin = torch.bmm(attention,emb_all).squeeze(1)
        score = torch.mul(emb_mixin, emb_vword)
        score = torch.sum(score, dim=-1)
        score = self.m(score)
        return score

if __name__ == '__main__':

    model = SkipGramModel(364, 180, 100).cuda()
    optimizer = optim.SGD(model.parameters(), lr=0.025)
    Lossfunc = nn.BCELoss(reduction='sum')
    for _ in range(100):
        word_in = torch.cuda.LongTensor([2]*128)
        word_out = torch.cuda.LongTensor([2]*128)
        label = torch.cuda.FloatTensor([1]*128)
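        # Index lists for the four EmbeddingBag calls; the fourth list is
        # empty, so that call receives no indices at all.
        component_in = [[3,5],[2,4,5],[2,3,4],[]]
        # One offset per bag (128 bags per call); repeated offsets mark empty bags.
        offset = [[0]*127+[1],[0]*127+[1],[0]*128,[0]*128]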
        outs = model.forward(word_in, component_in, word_out, offset)
        loss = Lossfunc(outs, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
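To make the inputs in the script easier to inspect, here is a small pure-Python sketch that counts how many indices fall into each bag. It assumes the documented EmbeddingBag offsets convention (bag i spans indices[offsets[i]:offsets[i+1]], and the last bag runs to the end). The helper name `bag_lengths` is hypothetical and is not part of PyTorch.

```python
def bag_lengths(num_indices, offsets):
    """Return the number of indices in each bag.

    Bag i covers indices[offsets[i]:offsets[i + 1]]; the last bag
    covers everything from its offset to the end of the index list.
    """
    ends = list(offsets[1:]) + [num_indices]
    return [end - start for start, end in zip(offsets, ends)]


# Inputs copied from the reproduction script.
component_in = [[3, 5], [2, 4, 5], [2, 3, 4], []]
offset = [[0] * 127 + [1], [0] * 127 + [1], [0] * 128, [0] * 128]

names = ["char", "redical", "com1", "com2"]
for name, indices, offsets in zip(names, component_in, offset):
    lengths = bag_lengths(len(indices), offsets)
    empty = sum(1 for n in lengths if n == 0)
    print(f"{name}: {len(indices)} indices, {empty} of {len(lengths)} bags empty")
```

Most bags in every call are empty, and the fourth call (com2) has no indices at all, so all 128 of its bags are empty. That empty input is a natural first candidate when trying to narrow the failure down.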

System Info

Collecting environment information...
PyTorch version: 1.0.0a0+7f0dd24
Is debug build: No
CUDA used to build PyTorch: 8.0.61

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 4.9.3-13ubuntu2) 4.9.3
CMake version: version 3.12.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla M40 24GB
GPU 1: Tesla M40 24GB
GPU 2: Tesla M40 24GB
GPU 3: Tesla M40 24GB

Nvidia driver version: 384.130
cuDNN version: Probably one of the following:
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6.0.21
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.7
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.7.0.5
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn_static.a
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7.0.4
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7.0.5
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn_static.a

Versions of relevant libraries:
[pip] Could not collect
[conda] cuda80 1.0 0 soumith
[conda] cuda91 1.0 h4c16780_0 pytorch
[conda] pytorch 0.4.1 py36_cuda8.0.61_cudnn7.1.2_1 [cuda80] soumith
[conda] torch 1.0.0a0+7f0dd24
[conda] torch 0.4.1
[conda] torchvision 0.1.9 py36h7584368_1 soumith
[conda] torchvision 0.2.1

  • PyTorch or Caffe2: PyTorch
  • How you installed PyTorch (conda, pip, source): source
  • Build command you used (if compiling from source): python setup.py install
  • OS: Ubuntu 16.04.5 LTS (Xenial Xerus)
  • PyTorch version: 0.4.1
  • Python version: 3.6.6
  • CUDA/cuDNN version: 8.0.61
  • GPU models and configuration: 4 x Tesla M40 24GB (see the collected environment information above)
  • GCC version (if compiling from source): 4.9.3
  • CMake version: 3.12.2
  • Versions of any other relevant libraries:
