Hi, some time ago Adam mentioned that you were considering adding the adaptive softmax from [1] to pytorch.
I wanted to ask if you would be interested in having an nn.Module implementation of that paper.
From my benchmarks on the text8 dataset (44371 tokens), with a batch size of 128 and a BPTT length of 20, I get the following per-batch timings.
|  | standard | adaptive |
|---|---|---|
| forward [ms] | 58.1 | 13.5 |
| backward [ms] | 90.74 | 24.03 |
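For reference, the kind of measurement I used looks roughly like the sketch below. The names `criterion`, `hidden_states`, and `targets` are placeholders for this example (not part of the proposal), and it assumes a CUDA device so that synchronization gives meaningful timings.

```python
import time
import torch

def time_batch(criterion, hidden_states, targets):
    # Hypothetical helper: measures one forward and one backward pass of a
    # log-softmax-style criterion that returns per-example log probabilities.
    # Assumes `criterion` is an nn.Module with trainable parameters on a GPU.
    torch.cuda.synchronize()
    start = time.time()
    log_probs = criterion(hidden_states, targets)
    loss = -log_probs.mean()
    torch.cuda.synchronize()
    forward_ms = (time.time() - start) * 1000

    torch.cuda.synchronize()
    start = time.time()
    loss.backward()
    torch.cuda.synchronize()
    backward_ms = (time.time() - start) * 1000
    return forward_ms, backward_ms
```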
I think implementing the backward pass by hand would require moving this to C++, since the computation requires calling softmax_backward?
Please find below the API that I was thinking about.
[1] Efficient softmax approximation for GPUs
```python
import torch.nn as nn


class AdaptiveLogSoftmax(nn.Module):

    def __init__(self, in_features, n_classes, cutoffs):
        """
        Parameters
        ----------
        in_features : int
            dimensionality of input data
        n_classes : int
            number of classes
        cutoffs : List[int]
            cutoff values for the clusters.
            e.g. cutoffs = [100, 1000] means that the 100 most frequent labels
            will end up in the main cluster, labels between 101 and 1000
            will end up in the first sub-cluster, and labels
            between 1001 and n_classes will end up in the second sub-cluster
        """
        pass

    def reset_parameters(self):
        pass

    def forward(self, input, target):
        """
        Parameters
        ----------
        input : torch.FloatTensor of size (batch_size x self.in_features)
        target : torch.LongTensor of size (batch_size)

        Returns
        -------
        output : torch.FloatTensor of size (batch_size)
            each entry is the log probability of the corresponding `target` value
        """
        pass

    def get_log_proba(self, input):
        """ Computes log probabilities for all `self.n_classes` classes """
        pass
```
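To make the intended usage concrete, here is a hypothetical sketch of how the proposed module would be used, assuming it were implemented as above; the hidden size, vocabulary size, and cutoff values are made up for illustration.

```python
import torch

# Illustrative sizes only; not taken from the benchmark above.
in_features, n_classes = 512, 44371
crit = AdaptiveLogSoftmax(in_features, n_classes, cutoffs=[2000, 10000])

hidden = torch.randn(128, in_features)        # (batch_size x in_features)
target = torch.randint(0, n_classes, (128,))  # (batch_size)

log_probs = crit(hidden, target)              # (batch_size), log p(target | hidden)
loss = -log_probs.mean()                      # negative log-likelihood for training

full = crit.get_log_proba(hidden)             # (batch_size x n_classes), full distribution
```

The point of the two-method API is that training only needs the log probability of the observed target (which is where the adaptive clustering saves time), while `get_log_proba` recovers the full distribution when it is actually needed, e.g. for evaluation or sampling.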