🚀 The feature, motivation and pitch
Adam is implemented in https://github.com/pytorch/pytorch/blob/master/torch/optim/adam.py#L259 using the following effective formula (terminology from the paper and the code with light abbreviations):
theta -= lr / bias1 * m / (sqrt(v) / bias2 + eps)
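For concreteness, here is a simplified sketch of a single-tensor Adam step with this structure (not a verbatim copy of torch/optim/adam.py; the formula's bias2 corresponds to sqrt(1 - beta2 ** step) below):

```python
import math
import torch

def adam_step_current(param, grad, exp_avg, exp_avg_sq, step,
                      lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # All tensor arguments are torch.Tensors of the same shape.
    # Update biased first and second moment estimates in place.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step

    # The extra full-tensor op: sqrt(v) is divided by the scalar
    # sqrt(bias_correction2) before eps is added.
    denom = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(eps)
    param.addcdiv_(exp_avg, denom, value=-lr / bias_correction1)
```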
This formulation requires an additional division of the full-sized tensor sqrt(v) by bias2. Presumably it was done this way so that eps is not implicitly scaled by bias2. However, we can still keep eps independent of bias2 without the costly sqrt(v) / bias2 op: multiply and divide eps by bias2, then factor 1/bias2 out of the denominator, and you get:
theta -= lr * bias2 / bias1 * m / (sqrt(v) + bias2 * eps)
which should be incrementally faster to compute.
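A sketch of the reformulated step, plus a quick numerical check that it matches the current one (reusing adam_step_current from the sketch above; shapes and tolerance are arbitrary):

```python
def adam_step_proposed(param, grad, exp_avg, exp_avg_sq, step,
                       lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Same moment updates as in adam_step_current.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias1 = 1 - beta1 ** step
    bias2 = math.sqrt(1 - beta2 ** step)

    # eps is pre-scaled by the scalar bias2, so no full-tensor division is needed.
    denom = exp_avg_sq.sqrt().add_(bias2 * eps)
    param.addcdiv_(exp_avg, denom, value=-lr * bias2 / bias1)

# Both variants should produce the same parameters up to floating-point noise.
torch.manual_seed(0)
p1 = torch.randn(1000); p2 = p1.clone()
g = torch.randn(1000)
m1, v1, m2, v2 = (torch.zeros(1000) for _ in range(4))
adam_step_current(p1, g, m1, v1, step=1)
adam_step_proposed(p2, g, m2, v2, step=1)
print(torch.allclose(p1, p2, atol=1e-7))  # True
```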
Note: the current formulation permeates the Adam and Adam-esque implementations, so any update should presumably be propagated across them.
Alternatives
N/a
Additional context
I've confirmed that the current formulation does indeed induce an extra div op in the Adam step. This check was done on PyTorch v1.10, and the code has been reorganized since, so the precise file and line reference is no longer accurate, but the relevant piece of code hasn't actually changed: it's still there in https://github.com/pytorch/pytorch/blob/master/torch/optim/adam.py#L259 as of this writing.
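One way to reproduce this kind of check on a current build is to count div kernels issued during a profiled optimizer.step(); a minimal sketch (model size and shapes are arbitrary, and exact kernel names vary by version and backend):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(1024, 1024)
opt = torch.optim.Adam(model.parameters())
model(torch.randn(64, 1024)).sum().backward()
opt.step()  # warm-up so the optimizer state (exp_avg, exp_avg_sq) is initialized

with profile(activities=[ProfilerActivity.CPU]) as prof:
    opt.step()

# Look for the div calls issued during the step.
for evt in prof.key_averages():
    if "div" in evt.key:
        print(evt.key, evt.count)
```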
