
CUDNN_STATUS_EXECUTION_FAILED with RNN on GPU #5213


Description

@diegoantognini

Hi everyone,

I'm using the new torch.split function, passing it a list of chunk sizes, together with an LSTM/GRU network (both types of RNN trigger the bug).

On CPU, the code works perfectly.
On GPU, it crashes only when I run an RNN forward pass inside the loop over the torch.split chunks; doing anything else in the loop body works fine.
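
Roughly, the failing pattern looks like this (a minimal sketch with made-up sizes and the pre-0.4 Variable API, not my exact code):

```python
# Minimal sketch of the failing pattern (made-up sizes; pre-0.4 Variable API)
import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.GRU(8, 8, num_layers=1, batch_first=True).cuda()
x = Variable(torch.randn(1, 10, 8).cuda())       # [batch=1, seq_len=10, features=8]

# torch.split with a list of chunk sizes along the time axis (dim=1)
for chunk in torch.split(x, [3, 3, 4], dim=1):
    h0 = Variable(torch.zeros(1, 1, 8).cuda())   # fresh hidden state per chunk
    out, h = rnn(chunk, h0)                      # RNN forward in the loop -> crashes on GPU
```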

Stack trace

```
...
  File "/home/diego/Github/DocAgg/pygcn_modified/models.py", line 80, in forward
    output, hidden = self.document_rnn(sentence_embeddings_per_doc, self.document_rnn_hidden)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/rnn.py", line 181, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 315, in forward
    return func(input, *fargs, **fkwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/rnn.py", line 284, in forward
    Variable(dropout_desc.state) if dropout_desc.state is not None else None)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
```

Code
sentence_hidden_embeddings is a FloatTensor of shape [657, 700].
nb_sentences_per_doc is a Python list: [26, 13, 12, 20, 25, 26, 535].

```python
# Split [1, 657, 700] into per-document chunks along the time axis; [:-1] drops the last chunk
all_sentence_embeddings_per_doc = torch.split(sentence_hidden_embeddings.unsqueeze(0), nb_sentences_per_doc, dim=1)[:-1]

document_embeddings = []
for sentence_embeddings_per_doc in all_sentence_embeddings_per_doc:
    self.document_rnn_hidden = self.init_hidden()
    output, hidden = self.document_rnn(sentence_embeddings_per_doc, self.document_rnn_hidden)

    # output[-1][-1] == hidden[-1][-1] (GRU) and output[-1][-1] == hidden[0][-1][-1] (LSTM)
    doc_emb = hidden[-1] if self.mode == 'GRU' else (hidden[0][-1] if self.mode == 'LSTM' else None)
    document_embeddings.append(doc_emb)

    # TODO Remove. Doing only this works perfectly on GPU
    #doc_emb = torch.mean(sentence_embeddings_per_doc, dim=1)
    #document_embeddings.append(doc_emb)

cluster_embedding = torch.mean(torch.cat(document_embeddings), dim=0)
```
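
For reference, these are the shapes I expect at each step (derived from batch_first=True and the sizes above):

```python
# Shape walk-through per loop iteration (embedding_size = 700, unidirectional):
# sentence_embeddings_per_doc:     [1, n_sentences, 700]  -> one document as a batch of one
# hidden (GRU) / hidden[0] (LSTM): [nb_layers, 1, 700]
# doc_emb = hidden[-1]:            [1, 700]               -> final state of the last layer
# torch.cat(document_embeddings):  [n_docs, 700]; mean over dim=0 -> [700]
```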

RNN
```python
if self.mode == 'GRU':
    self.document_rnn = nn.GRU(embedding_size, embedding_size, num_layers=self.nb_layers, bias=True, dropout=self.dropout, bidirectional=False, batch_first=True)
elif self.mode == 'LSTM':
    self.document_rnn = nn.LSTM(embedding_size, embedding_size, num_layers=self.nb_layers, bias=True, dropout=self.dropout, bidirectional=False, batch_first=True)
self.document_rnn_hidden = self.init_hidden()
```

init_hidden
```python
def init_hidden(self):
    document_rnn_init_h = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(self.nb_layers, self.batch_size, self.embedding_size).type(torch.FloatTensor)), requires_grad=True)
    if self.mode == 'GRU':
        return document_rnn_init_h
    elif self.mode == 'LSTM':
        document_rnn_init_c = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(self.nb_layers, self.batch_size, self.embedding_size).type(torch.FloatTensor)), requires_grad=True)
        return (document_rnn_init_h, document_rnn_init_c)
```
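
One thing I notice while writing this up: init_hidden builds the initial states as CPU FloatTensors, while the model and inputs live on the GPU. I don't know whether that matters here, but for reference a device-aware variant would look roughly like this (untested sketch; self.use_cuda is a hypothetical flag, not in my code):

```python
# Hypothetical device-aware init_hidden (untested sketch; self.use_cuda is assumed)
def init_hidden(self):
    def make_state():
        t = nn.init.xavier_uniform(torch.Tensor(self.nb_layers, self.batch_size, self.embedding_size))
        if self.use_cuda:  # assumed flag mirroring where the model lives
            t = t.cuda()
        return nn.Parameter(t, requires_grad=True)
    if self.mode == 'GRU':
        return make_state()
    elif self.mode == 'LSTM':
        return (make_state(), make_state())
```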

  • OS: Linux Mint 18.2 Sonya
  • PyTorch version: From source (2b2d56d)
  • How you installed PyTorch (conda, pip, source): pip
  • Python version: 3.5
  • CUDA/cuDNN version: 9.1/7.0.5 (latest versions)
  • GPU models and configuration: Titan Xp 12 GB (Driver 390.12)
  • GCC version (if compiling from source): (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
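
For completeness, the version numbers above can be read back from the install itself:

```python
import torch
print(torch.__version__)                 # built from source at 2b2d56d
print(torch.version.cuda)                # 9.1
print(torch.backends.cudnn.version())    # 7005 -> cuDNN 7.0.5
```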

Labels: module: cudnn (Related to torch.backends.cudnn, and CuDNN support)