
Conversation

@anderspapitto (Contributor) commented Jan 16, 2018

PackedSequence: store batch_sizes as a tensor rather than
converting it to a list of Python integers. This maintains
the invariant that a module's inputs/outputs are collections of
Variables.

In particular, this means the JIT no longer chokes when flattening
and unflattening arguments.
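
A minimal sketch of the invariant, using pack_padded_sequence as it exists today (shapes and values are illustrative, not taken from this PR):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.randn(5, 3, 10)   # (seq_len, batch, features)
lengths = [5, 4, 2]              # per-sequence lengths, sorted descending
packed = pack_padded_sequence(padded, lengths)

# With this change, batch_sizes is a tensor rather than a list of
# Python ints, so every field of the PackedSequence is a tensor that
# the JIT can flatten and unflatten uniformly.
print(type(packed.batch_sizes))  # <class 'torch.Tensor'>
```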


  • when uniform sequence lengths are provided, correctly omit the
    argument when constructing the ONNX graph, so that the graph is
    not fixed to a particular batch size.

  • handle PackedSequences by floating them through the graph and
    eliminating them in an optimization pass. ONNX has no packed
    sequences; it operates on a representation equivalent to
    PaddedSequence, so we hide the representation switch from ONNX.

  • as a preliminary step towards handling PackedSequences, not directly
    tied to ONNX export, change batch_sizes from being an argument to
    the RNN operators into being an argument to the forward() function
    of those RNN operators. This more closely models the reality that
    batch_sizes are effectively part of the input sequences.

@pytorchbot (Collaborator)
@anderspapitto, thanks for your PR! We identified @zdevito to be a potential reviewer.

@houseroad (Member) left a comment

We should add some docs about the OnnxElidedInput node type. We should also have a better story for multiple optional parameters.


@ezyang (Contributor) commented Jan 18, 2018

@anderspapitto and I discussed using Undefined instead of introducing a new ElidedOnnxNode.

@anderspapitto force-pushed the no-batch-size branch 8 times, most recently from a6cb557 to fd3989d on January 23, 2018 at 22:42


@anderspapitto force-pushed the no-batch-size branch 2 times, most recently from a265f8a to c115e76 on January 24, 2018 at 19:25


@anderspapitto force-pushed the no-batch-size branch 3 times, most recently from f4696a4 to d57e226 on January 26, 2018 at 22:22
@anderspapitto changed the title from "Elide sequence_lens argument when converting RNNs to ONNX" to "[WIP] Elide sequence_lens argument when converting RNNs to ONNX" Jan 26, 2018
@anderspapitto (Contributor, Author)
[WIP], by the way; there are one or two other components I still have to finish before this exports correctly.

@anderspapitto force-pushed the no-batch-size branch 2 times, most recently from e39ae15 to aa29c97 on January 27, 2018 at 00:17

@anderspapitto (Contributor, Author)
@dzhulgakov yeah, we need to wrap the list of integers with Variable().

Although, actually, I expect that in many cases people already have the lengths stored in a Variable and were having to convert them to a list of ints explicitly (because semantically, lengths/batch sizes are just as much a feature of the dynamic input as the actual float values), so they will simply be able to delete any such conversion code.
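
A hedged before/after sketch of what that means for callers (assuming the modern pack_padded_sequence signature):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

inp = torch.randn(5, 3, 8)          # (seq_len, batch, features)
lengths = torch.tensor([5, 4, 2])   # lengths already held in a tensor

# Before: convert the tensor back into a list of Python ints.
packed = pack_padded_sequence(inp, [int(l) for l in lengths])

# After: pass the tensor directly; the conversion above can be deleted.
packed = pack_padded_sequence(inp, lengths)
```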

@apaszke (Contributor) commented Feb 6, 2018

@anderspapitto I don't think that's the case. PackedSequence objects are pretty much only produced by functions that we provide, so I doubt anyone has those lengths stored as Variables upfront.

@jekbradbury (Contributor) commented Feb 6, 2018

The batch_sizes attribute is internal to PackedSequence objects, but lengths needs to be provided by the user (and providing it as a Variable is more natural).

@anderspapitto (Contributor, Author)
@apaszke I actually don't have many examples, but it is true in the one case I have looked at (unfortunately internal, so I can't link it).

Anyway, I don't think I said that as clearly as I could have, so let me give it another go:

Whenever a user calls PackPadded(input, lengths), I would assume that input and lengths almost always come from the same source. It should therefore be very natural to turn lengths into a Variable at the same time that input is turned into a Variable (which already happens).
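
To illustrate (the preprocessing here is hypothetical): input and lengths typically come off the same pipeline, so converting both at once is natural.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical preprocessing output: padded raw data plus raw lengths.
raw_batch = [[1.0, 2.0, 3.0, 4.0, 5.0],
             [6.0, 7.0, 0.0, 0.0, 0.0]]
raw_lengths = [5, 2]

# Convert both from the same source at the same time.
inp = torch.tensor(raw_batch).t().unsqueeze(-1)  # (seq_len=5, batch=2, 1)
lengths = torch.tensor(raw_lengths)
packed = pack_padded_sequence(inp, lengths)
```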


@anderspapitto force-pushed the no-batch-size branch 2 times, most recently from 841b98c to 3e2cd64 on February 6, 2018 at 22:43

@anderspapitto (Contributor, Author)
addressed all comments

@anderspapitto (Contributor, Author)
23:37:49 + python ../compare_with_baseline.py test_gpu_speed_word_language_model 115.893 116.011 115.957 116.066 116.069
23:37:49 baseline mean:  115.5332
23:37:49 baseline sigma:  0.10897
23:37:49 test mean:  115.9992
23:37:49 test sigma:  0.06713091687144009
23:37:49 z-value:  4.2764063503717376
23:37:49 Traceback (most recent call last):
23:37:49   File "../compare_with_baseline.py", line 46, in <module>
23:37:49     raise Exception("z-value >= 3, there is 99.7% chance of perf regression.")
23:37:49 Exception: z-value >= 3, there is 99.7% chance of perf regression.

How can I tell whether this is spurious or something I need to address? 116.0 vs. 115.5 seems small, but maybe it isn't?
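
Working through the numbers from the log (the script's internals are my guess, not quoted from compare_with_baseline.py):

```python
baseline_mean, baseline_sigma = 115.5332, 0.10897
test_mean = 115.9992

# A plain z-score of the new mean against the baseline's noise.
z = (test_mean - baseline_mean) / baseline_sigma
print(z)  # ~4.28, above the z >= 3 threshold the script enforces
```

So the absolute gap (~0.4%) is small, but it is large relative to the baseline's historical variance, which is why the check trips.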

@soumith merged commit b2cfd96 into pytorch:master on Feb 7, 2018
@anderspapitto deleted the no-batch-size branch on February 7, 2018 at 17:43
anderspapitto added a commit to anderspapitto/pytorch that referenced this pull request Feb 7, 2018
This was accidentally lost while addressing review comments on
pytorch#4695

pack_padded_sequence may be called with either a list or a
Variable. If called with a list, we convert it to a Variable internally.

I added a case to test_nn covering the new codepath. The bug was also
caught by the onnx-fb-universe tests (which rely on passing in a Variable).
ezyang pushed a commit that referenced this pull request Feb 7, 2018
