Add max pooling support to EmbeddingBag #5725
Conversation
@pytorchbot test this please
@ezyang Oops. It looks like there were some compilation issues on some of the platforms (unfortunately, my local Linux CUDA 9 test environment doesn't catch everything). I believe I just fixed them. Would it be possible for you to trigger another test run? (I guess I'll try to trigger a test here, but I don't think I have permission. @pytorchbot test this please)
@pytorchbot test this please
@pytorchbot add to whitelist
It looks like this pull request passes all the tests. Let me know if you want me to make any changes!
@pytorchbot retest this please
@pytorchbot retest this please
@pytorchbot retest this please
@goldsborough Can you trigger another test run of this? I believe the latest CI changes caused a bunch of failures, and it would be nice to double check that this code works.
@lalaland can you rebase your commits on top of current master? That might help with the failures.
Force-pushed 284382a to 43df74f.
@apaszke Done.
@pytorchbot test this please
@pytorchbot retest this please
Force-pushed 43df74f to 48b0d07.
There was a merge conflict due to another pull request being merged, so I rebased the code and fixed the conflict.
Looks like the CI had some spurious failures. @pytorchbot retest this please
Another spurious failure? @pytorchbot retest this please
@pytorchbot retest this please
?? @pytorchbot retest this please
That's well within the margin of error of our floating point operations. That testing code should probably switch over to using a maximum relative error. Also, it's failing on the sparse operations, which I did not touch. @pytorchbot retest this please
@pytorchbot retest this please
@lalaland You can adjust the desired precision on the failing test. However, it is kind of weird that only the CUDA 8 test is failing, and not the CUDA 9 tests.
@ezyang It seems to be failing somewhat randomly. One issue here is that there is a lot of non-determinism in the order in which things get summed up. Note that it's failing in the sparse operations, which is not even code that I changed. I think the core issue here is that static epsilon values are a bad idea: larger floating point numbers need larger epsilons, so the epsilon should be measured relative to the magnitude of the values being compared. There are three main options here as I see them:
Which do you want me to do?
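To make the suggestion concrete, here is a minimal sketch of a relative-error comparison (the helper name and tolerance values are made up for illustration; this mirrors the standard `abs_tol + rel_tol * |expected|` bound, not code from this PR):

```python
import torch

def assert_close_rel(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Check |actual - expected| <= abs_tol + rel_tol * |expected|.

    The absolute term keeps the bound meaningful near zero, where a
    purely relative bound would demand near-exact equality.
    """
    diff = (actual - expected).abs()
    bound = abs_tol + rel_tol * expected.abs()
    assert bool((diff <= bound).all()), "tensors differ beyond tolerance"

# A fixed epsilon like 1e-5 rejects this pair even though the relative
# error is only one part in a million; a relative bound accepts it.
expected = torch.tensor([1e6])
actual = expected + 1.0
assert_close_rel(actual, expected)  # passes: relative error ~1e-6
```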
@ezyang I increased the epsilon for the embedding bag tests, and all the tests are now passing. Do let me know if you want me to change anything.
I just merged the branch and resolved some merge conflicts.
thanks for patiently waiting @lalaland. Now that the 0.4 release is done, we have more bandwidth. I'll review the PR tomorrow.
soumith left a comment:
The PR looks great, pretty well done. We really apologize for the late review.
I'm requesting minor changes in the naming of the CUDA kernels, and some other minor API usage changes; once they are pushed, this is good to merge.
@soumith Thanks for the review. I implemented your changes and updated this PR. (Well, all of them except the deterministic backward GPU pass; I'll do that in a separate PR.)
will merge once tests pass
@soumith Just wanted to give you a heads up that the tests have passed.
thanks @lalaland!
Commits:
* Add max mode support to EmbeddingBag
* Lint fix
* Fix compilation issue on other platforms
* Rebase + don't waste memory when not in max mode
* Oops, missed a spot
* Fix whitespace from merge
* Less precision
* Lower precision to avoid spurious failures
* Minor typo
* Switch to size()
This pull request adds max pooling support to EmbeddingBag. Max pooling is a very common way of aggregating embeddings, and it is quite useful to have it built into EmbeddingBag for both performance and ergonomic reasons.
This particular implementation of EmbeddingBag max pooling does not support sparse matrices or the scale_grad_by_freq feature. Those can be added in follow-up pull requests if necessary.
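For illustration, a small usage sketch of the new mode (the tensor values here are made up; `nn.EmbeddingBag` and its `mode` argument are the API this PR extends):

```python
import torch
import torch.nn as nn

# 10 embeddings of dimension 3, aggregated per bag with an
# element-wise max instead of the default sum/mean.
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='max')

indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])  # flat list of indices
offsets = torch.tensor([0, 4])  # bag 0 = indices[0:4], bag 1 = indices[4:8]

out = bag(indices, offsets)
print(out.shape)  # torch.Size([2, 3]): one max-pooled vector per bag
```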
This code has been tested using the test_embedding_bag and test_embedding_bag_cuda unit tests in test_nn.py.
This closes #4762.