Conversation

@ngimel
Collaborator

@ngimel ngimel commented Sep 14, 2018

Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training, but converting the output of the previous layer to fp32 and then reading it back as fp32 in softmax is expensive in both memory and performance; this PR makes it possible to avoid that.
For most input data/dtype combinations, the input is converted to dtype and softmax is then computed in that type. If the input is half and dtype is fp32, kernels with the corresponding template arguments are called instead.
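
A minimal usage sketch of the proposed argument (assuming the keyword lands on the functional interface as described here; the exact signature is whatever this PR settles on):

import torch
import torch.nn.functional as F

# fp16 activations coming out of the previous layer (requires a CUDA device)
x = torch.randn(8, 1000, dtype=torch.half, device='cuda')

# Ask softmax/log_softmax to do their math in fp32 without first materializing
# an fp32 copy of the input; the result comes back as fp32.
probs = F.softmax(x, dim=-1, dtype=torch.float32)
logp = F.log_softmax(x, dim=-1, dtype=torch.float32)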

Collaborator

@ssnl ssnl left a comment


Didn't review the kernels. But how about also adding the option to cross entropy loss? :)

@ngimel
Collaborator Author

ngimel commented Sep 14, 2018

cross_entropy calls log_softmax (https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py#L1645), so it would only require a couple-line Python change.
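
For illustration, a hedged sketch of what that couple-line change could look like (the wrapper below is simplified and hypothetical; the real cross_entropy in functional.py takes more arguments):

from torch.nn.functional import log_softmax, nll_loss

def cross_entropy_with_dtype(input, target, dtype=None):
    # Hypothetical wrapper: forward the new dtype keyword down to log_softmax
    # so the softmax math runs in the requested precision (e.g. torch.float32).
    return nll_loss(log_softmax(input, dim=1, dtype=dtype), target)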

@ssnl
Collaborator

ssnl commented Sep 14, 2018

Yep, but you are already changing log_softmax, right?

@ngimel
Collaborator Author

ngimel commented Sep 14, 2018

@ssnl, yes, I can do that.
The test failure is legit:

 ======================================================================
22:32:33 FAIL: test_passing_one_positional_but_not_the_second (__main__.TestCustomOperators)
22:32:33 ----------------------------------------------------------------------
22:32:33 RuntimeError: Found 2 overloads for operator aten::log_softmax! Overloads are not supported from Python.
22:32:33 
22:32:33 During handling of the above exception, another exception occurred:
22:32:33 
22:32:33 Traceback (most recent call last):
22:32:33   File "test_jit.py", line 7796, in test_passing_one_positional_but_not_the_second
22:32:33     torch.ops.aten.log_softmax(torch.ones(5))
22:32:33 AssertionError: "aten::log_softmax\(\) is missing value for argument 'dim'." does not match "Found 2 overloads for operator aten::log_softmax! Overloads are not supported from Python."

but I'm not sure what the preferred fix should be. FWIW, some operators already have overloads that are not supported from Python, e.g.:

In [3]: torch.ops.aten.sum(torch.ones(5))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-acd80e61fcec> in <module>()
----> 1 torch.ops.aten.sum(torch.ones(5))

/workspace/ngimel/pytorch_upstream/torch/_ops.py in __getattr__(self, op_name)
     56         # for overloads and raise an exception if there are more than one.
     57         qualified_op_name = '{}::{}'.format(self.name, op_name)
---> 58         op = torch._C._jit_get_operation(qualified_op_name)
     59         # let the script frontend know that op is identical to the builtin op
     60         # with qualified_op_name

RuntimeError: Found 5 overloads for operator aten::sum! Overloads are not supported from Python.

so log_softmax erroring out with a similar message is not necessarily a big problem(?).

@ezyang
Contributor

ezyang commented Sep 18, 2018

cc @apaszke @jamesr66a on JIT test


@ezyang
Contributor

ezyang commented Sep 18, 2018

@ngimel Looking at this more closely I would advise updating the error message here.

@ngimel
Collaborator Author

ngimel commented Sep 18, 2018

@ngimel Looking at this more closely I would advise updating the error message here.

In the JIT tests or in the softmax_cpu assert? softmax_cpu should never be called with upconvert = True, and upconvert is not user-exposed; if that happened, there would be something wrong in the core that the user can't fix, hence AT_ASSERTM and not AT_CHECK.

@ngimel
Collaborator Author

ngimel commented Sep 20, 2018

Anything I can do to help move this forward? @apaszke @ezyang

Contributor

@apaszke apaszke left a comment


I'm not super happy with the upconvert flag. It doesn't really specify the destination type. Should it be float? Should it be double? The context is probably dependent on the device, and this seems to overfit the CUDA context. Can't we apply a simple modification to our kernels, or simply have a _log_softmax_half_to_float implemented only for CUDA, and dispatch to _log_softmax(...).to(dtype) otherwise?
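
A Python-level sketch of the dispatch being suggested; the helper names below (_log_softmax_half_to_float, _log_softmax_reference) are placeholders for illustration, not actual ATen entry points:

import torch

def _log_softmax_reference(x, dim):
    # Numerically stable log-softmax in whatever dtype x already has.
    shifted = x - x.max(dim=dim, keepdim=True).values
    return shifted - shifted.exp().sum(dim=dim, keepdim=True).log()

def log_softmax_with_dtype(x, dim, dtype=None):
    if dtype == torch.float32 and x.dtype == torch.half and x.is_cuda:
        # Fast path: a dedicated half-in/float-out CUDA kernel, e.g. a
        # hypothetical aten::_log_softmax_half_to_float. Stand-in below:
        return _log_softmax_reference(x.float(), dim)
    # Generic path: convert first, then run the ordinary kernel.
    x = x if dtype is None else x.to(dtype)
    return _log_softmax_reference(x, dim)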


@ngimel
Collaborator Author

ngimel commented Sep 24, 2018

I'm not super happy with the upconvert flag. It doesn't really specify the destination type. Should it be float? Should it be double? The context is probably dependent on the device, and this seems to overfit the CUDA context. Can't we apply a simple modification to our kernels, or simply have a _log_softmax_half_to_float implemented only for CUDA, and dispatch to _log_softmax(...).to(dtype) otherwise?

upconvert is true only for CUDA half inputs with an fp32 dtype argument. I could dispatch to _log_softmax_half_to_float in this case, but it would require its own entries for forward and backward in native_functions and in derivatives, and overall I don't think it would be any prettier.
Modifying the kernels to support more input type/dtype combinations with a fast path can be done (in fact, for the CUDA kernels the output type can be anything; it's a separate template parameter), but then the dispatch would have to get really tricky (right now the dispatch defines scalar_t, from which I can derive acc_type, but any other combination of input/output types would require changes to the types defined in the dispatch, and instantiating a cross-product of kernels with different input/output types, which no one wants).

@apaszke
Contributor

apaszke commented Sep 25, 2018

upconvert is true only for CUDA half inputs with an fp32 dtype argument

That's precisely the problem. It's a very specific flag, with a very specific meaning, which is not at all implied by its name/function name/function signature.

I don't understand why the dispatch would be a problem. Can't you just declare the derivatives for the top-level native function log_softmax, and have it take full responsibility for providing the derivative no matter which implementation it chooses?

@ngimel
Collaborator Author

ngimel commented Sep 25, 2018

If derivatives are declared for log_softmax, then backward will have to take care of the type conversion that is currently delegated to autograd. That's not the end of the world, but it will make backward more error-prone, especially if implicit conversions for other types are added at some point.
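
To make concrete what "type conversion delegated to autograd" means here, a small hedged example (the exact casting behavior is an assumption based on how the feature is described in this thread):

import torch
import torch.nn.functional as F

x = torch.randn(4, 10, dtype=torch.half, device='cuda', requires_grad=True)
out = F.log_softmax(x, dim=-1, dtype=torch.float32)  # fp32 output from fp16 input
out.sum().backward()
# out is fp32, while x.grad is expected to come back as fp16: the cast back to
# the input dtype is handled outside the kernel's own backward formula.
print(out.dtype, x.grad.dtype)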

Contributor

@facebook-github-bot facebook-github-bot left a comment


apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot facebook-github-bot left a comment


zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Collaborator Author

ngimel commented Oct 12, 2018

Test failures seem unrelated.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 14, 2018
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training, but converting the output of the previous layer to fp32 and then reading it back as fp32 in softmax is expensive in both memory and performance; this PR makes it possible to avoid that.
For most input data/dtype combinations, the input is converted to dtype and softmax is then computed in that type. If the input is half and dtype is fp32, kernels with the corresponding template arguments are called instead.
Pull Request resolved: pytorch/pytorch#11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
@ngimel ngimel deleted the mixed_softmax branch January 16, 2019 19:51