Conversation

@EthanSteinberg (Contributor) commented Mar 13, 2018

This pull request adds max-pooling support to the EmbeddingBag feature. Max pooling is a very common way of aggregating embeddings, and it is quite useful to have it built into EmbeddingBag for both performance and ergonomic reasons.

This particular implementation of EmbeddingBag max pooling does not support sparse matrices or the scale_grad_by_freq feature. Those can be added in follow-up pull requests if necessary.

This code has been tested by using the test_embedding_bag and test_embedding_bag_cuda unit tests within test_nn.py.

This closes #4762.
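
For context, a minimal sketch of how the new mode is used once merged (assuming the post-merge `nn.EmbeddingBag` API with `mode='max'`):

```python
import torch
import torch.nn as nn

# Vocabulary of 10 embeddings, each 3-dimensional, aggregated by max pooling.
bag = nn.EmbeddingBag(10, 3, mode='max')

# Two bags packed into one index tensor: [1, 2, 4, 5] and [4, 3, 2, 9];
# offsets mark where each bag starts.
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])

# Each output row is the element-wise max over that bag's embeddings.
out = bag(indices, offsets)
print(out.shape)  # torch.Size([2, 3])
```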

@ezyang (Contributor) commented Mar 13, 2018

@pytorchbot test this please

@EthanSteinberg (Contributor Author)

@ezyang Oops. It looks like there were compilation issues on some of the platforms (unfortunately, my local Linux CUDA 9 test environment doesn't catch everything). I believe I just fixed them. Would it be possible for you to trigger another test run?

(I guess I'll try to trigger a test here, but I don't think I have permission. @pytorchbot test this please)

@apaszke (Contributor) commented Mar 13, 2018

@pytorchbot test this please

@soumith (Contributor) commented Mar 13, 2018

@pytorchbot add to whitelist

@EthanSteinberg (Contributor Author)

It looks like this pull request passes all the tests. Let me know if you want me to make any changes!

@goldsborough (Contributor)

@pytorchbot retest this please

@goldsborough (Contributor)

@pytorchbot retest this please

@EthanSteinberg (Contributor Author)

@pytorchbot retest this please

@EthanSteinberg (Contributor Author)

@goldsborough Can you trigger another test run of this? I believe the latest CI changes caused a bunch of failures, and it would be nice to double-check that this code works.

@apaszke (Contributor) commented Mar 14, 2018

@lalaland can you rebase your commits on top of current master? That might help with the failures

EthanSteinberg force-pushed the add_max_to_embedding_bag branch from 284382a to 43df74f on March 14, 2018 15:02

@EthanSteinberg (Contributor Author)

@apaszke Done.

@apaszke (Contributor) commented Mar 14, 2018

@pytorchbot test this please

@ezyang (Contributor) commented Mar 15, 2018

@pytorchbot retest this please

EthanSteinberg force-pushed the add_max_to_embedding_bag branch from 43df74f to 48b0d07 on March 15, 2018 21:14

@EthanSteinberg (Contributor Author)

There was a merge conflict due to another pull request being merged, so I rebased the code and fixed the conflict.

@EthanSteinberg (Contributor Author)

Looks like the CI had some spurious failures.

@pytorchbot retest this please

@EthanSteinberg (Contributor Author)

Another spurious failure?

@pytorchbot retest this please

@ezyang (Contributor) commented Mar 24, 2018

@pytorchbot retest this please

@EthanSteinberg (Contributor Author)

remote file operation failed: C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-build at hudson.remoting.Channel@4a696dd5:JNLP4-connect connection from ip-172-31-59-220.ec2.internal/172.31.59.220:49232

??

@pytorchbot retest this please

@EthanSteinberg (Contributor Author)

01:30:53 AssertionError:
01:30:53 1.00000e-05 *
01:30:53 1.0490
01:30:53 [torch.cuda.FloatTensor of size () (GPU 0)]
01:30:53 not less than or equal to 1e-05 :

That's well within the margin of error of our floating-point operations. That testing code should probably switch to using a maximum relative error. Also, it's failing in the sparse operations, which I did not touch.

@pytorchbot retest this please

@ezyang (Contributor) commented Mar 26, 2018

@pytorchbot retest this please

@ezyang (Contributor) commented Mar 26, 2018

@lalaland You can adjust the desired precision on assertEqual. If you believe this applies here, you should change the call site to use a different precision.

However, it is kind of weird that only the CUDA8 test is failing, and not the CUDA9 tests.
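
For illustration, a sketch of such a call-site change (the tensor names are hypothetical, and `prec` is my assumption for the tolerance parameter of the test harness's `assertEqual` at the time):

```python
# Hypothetical call site inside test_nn.py; `self` is the suite's TestCase
# subclass. Loosen the absolute tolerance from the 1e-5 default to 1e-4 for
# this one comparison (`prec` and the tensor names are assumed, not verbatim).
self.assertEqual(output, expected_output, prec=1e-4)
```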

@EthanSteinberg (Contributor Author)

@ezyang It seems to be failing somewhat randomly. One issue here is that there is a lot of non-determinism in the order in which things get summed up. Note that it's failing in the sparse operations, which is not even code that I changed.

I think the core issue here is that static epsilon values are a bad idea: larger floating-point numbers need larger epsilons. We should measure epsilon relative to the magnitude of the values being compared. There are three main options here as I see them:

  1. Increase the epsilon for floats globally, say to 1e-4 instead of 1e-5. This fixes things temporarily.
  2. Increase the epsilon for floats for that particular embedding bag test. This also fixes things temporarily.
  3. Properly implement an epsilon relative to the magnitude of the numbers being compared (see the sketch below).

Which do you want me to do?
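
To make option 3 concrete, a minimal sketch of a magnitude-relative comparison (the helper name and tolerances are illustrative, not the test suite's actual API):

```python
import torch

def assert_close(actual, expected, rtol=1e-5, atol=1e-8):
    # Pass if |actual - expected| <= atol + rtol * |expected| element-wise,
    # so the allowed error grows with the magnitude of the values compared.
    diff = (actual - expected).abs()
    bound = atol + rtol * expected.abs()
    if not bool((diff <= bound).all()):
        raise AssertionError(
            "max abs diff %.3e exceeds tolerance" % diff.max().item())

# A value near 1000 tolerates ~1e-2 of absolute error under rtol=1e-5,
# while a value near 1 only tolerates ~1e-5:
assert_close(torch.tensor([1000.001]), torch.tensor([1000.0]))
```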

@EthanSteinberg (Contributor Author)

@ezyang I increased the epsilon for the embedding bag tests, and all the tests are now passing. Do let me know if you want me to change anything.

@EthanSteinberg (Contributor Author)

I just merged the branch and resolved some merge conflicts.

@soumith (Contributor) commented Apr 26, 2018

thanks for patiently waiting @lalaland. Now that the 0.4 release is done, we have more bandwidth. I'll review the PR tomorrow.

@soumith (Contributor) left a comment

The PR looks great, pretty well done. We really apologize for the late review.

I'm requesting minor changes to the naming of the CUDA kernels, plus some other minor API usage changes; once they are pushed, this is good to merge.

@EthanSteinberg (Contributor Author)

@soumith Thanks for the review. I implemented your changes and updated this PR. (Well, all of them except the deterministic backward GPU pass; I'll do that in a separate PR.)

@soumith (Contributor) commented Apr 29, 2018

will merge once tests pass

@EthanSteinberg (Contributor Author)

@soumith Just wanted to give you a heads up that the tests have passed.

soumith merged commit ee00a80 into pytorch:master on Apr 29, 2018

@soumith (Contributor) commented Apr 29, 2018

thanks @lalaland!

EthanSteinberg deleted the add_max_to_embedding_bag branch on April 29, 2018 20:54
Jorghi12 pushed a commit to wsttiger/pytorch that referenced this pull request May 10, 2018
* Add max mode support to EmbeddingBag

* Lint fix

* Fix compilation issue on other platforms

* Rebase + don't waste memory when not in max mode

* Oops, missed a spot

* Fix whitespace from merge

* less precision

* Lower precision to avoid spurious failures

* Minor typo

* Switch to size()
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018