
@cpuhrsch (Contributor) commented Jan 25, 2018

This diff implements EmbeddingBag in ATen and adds support for sparse gradients.
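For reference, here is a minimal pure-Python sketch of what an EmbeddingBag forward pass computes and why its weight gradient is naturally sparse. It is illustrative only, not the ATen code in this diff: `embedding_bag_forward` and `embedding_bag_backward_sparse` are hypothetical names, and the flat `indices` plus `offsets` layout with `"sum"`/`"mean"` modes is assumed to mirror `torch.nn.EmbeddingBag`.

```python
from typing import Dict, List

Matrix = List[List[float]]


def embedding_bag_forward(weight: Matrix, indices: List[int],
                          offsets: List[int], mode: str = "sum") -> Matrix:
    """Reduce the rows of `weight` selected by each bag.

    Bag b covers indices[offsets[b]:offsets[b + 1]]; the last bag runs to
    the end of `indices`.
    """
    dim = len(weight[0])
    bounds = list(offsets) + [len(indices)]
    out = []
    for b in range(len(offsets)):
        bag = indices[bounds[b]:bounds[b + 1]]
        acc = [0.0] * dim
        for i in bag:
            for d in range(dim):
                acc[d] += weight[i][d]
        if mode == "mean" and bag:
            acc = [x / len(bag) for x in acc]
        out.append(acc)
    return out


def embedding_bag_backward_sparse(grad_out: Matrix, indices: List[int],
                                  offsets: List[int], mode: str = "sum"
                                  ) -> Dict[int, List[float]]:
    """Gradient w.r.t. `weight`, stored only for rows that were looked up.

    A dense gradient would have len(weight) rows, almost all zero; here the
    number of stored rows is at most len(indices).
    """
    bounds = list(offsets) + [len(indices)]
    grad: Dict[int, List[float]] = {}
    for b in range(len(offsets)):
        bag = indices[bounds[b]:bounds[b + 1]]
        scale = 1.0 / len(bag) if (mode == "mean" and bag) else 1.0
        for i in bag:
            row = grad.setdefault(i, [0.0] * len(grad_out[b]))
            for d, g in enumerate(grad_out[b]):
                row[d] += scale * g
    return grad


# Usage: two bags over a 4 x 2 table.
weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
indices = [0, 2, 2, 3]  # bag 0 -> rows 0, 2, 2; bag 1 -> row 3
offsets = [0, 3]
print(embedding_bag_forward(weight, indices, offsets))  # [[11.0, 14.0], [7.0, 8.0]]
print(embedding_bag_backward_sparse([[1.0, 1.0], [1.0, 1.0]], indices, offsets))
# {0: [1.0, 1.0], 2: [2.0, 2.0], 3: [1.0, 1.0]}  (row 1 never appears)
```

The sparse form only ever touches rows that appear in `indices`, which is what makes it cheap when the table is large and each batch looks up few rows.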

This is the command I used to produce the timings for both branches (master and emb_tmp):

NUMEXPR_NUM_THREADS=8 MKL_NUM_THREADS=8 OMP_NUM_THREADS=8 taskset -c 0-7 python benchmarks/embeddingbag.py

Results on master:

""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
runs: 10000     number of indices: 2000 maximum number of bags: 200     maximum bag size: 30

====================================================================================================
dimension:      10000   x       100
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
  13.175s          0.000s          7.638s          0.000s

====================================================================================================
dimension:      10000   x       1000
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
  66.805s          0.000s          7.442s          0.000s

====================================================================================================
dimension:      100000  x       100
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
  59.101s          0.000s          9.520s          0.000s

====================================================================================================
dimension:      100000  x       1000
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
 495.547s          0.000s         30.891s          0.000s

Results on emb_tmp (this PR):

""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
runs: 10000     number of indices: 2000 maximum number of bags: 200     maximum bag size: 30

====================================================================================================
dimension:      10000   x       100
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
  15.195s         17.045s          6.571s          6.513s

====================================================================================================
dimension:      10000   x       1000
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
  64.924s         21.673s          8.399s          6.361s

====================================================================================================
dimension:      100000  x       100
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
  57.969s         14.924s          8.211s          6.711s

====================================================================================================
dimension:      100000  x       1000
----------------------------------------------------------------------------------------------------
cpu dense       cpu sparse      cuda dense      cuda sparse
 496.196s         22.128s         30.167s          6.683s

This is the script I used to benchmark.
EDIT: Added CUDA synchronization.

Dense-gradient performance matches between the two branches, but the new code also supports sparse gradients (hence the 0.000s sparse columns on master) and lives within ATen.
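A back-of-the-envelope calculation suggests why the sparse path pulls ahead as the table grows: a dense gradient always materializes the full `num_embeddings x dim` table, while a sparse gradient holds at most one row per looked-up index (2000 per run here). This sketch assumes float32 weights (the dtype is not stated above) and ignores the index storage of the sparse tensor; the speedups are computed directly from the emb_tmp timings.

```python
# Dense vs. sparse EmbeddingBag gradients, using the emb_tmp timings above.
# Assumption: float32 weights (4 bytes); the dtype is not stated in the PR.
BYTES_PER_FLOAT = 4
NUM_INDICES = 2000  # "number of indices: 2000" from the benchmark header

# (num_embeddings, dim) -> {device: (dense seconds, sparse seconds)}
EMB_TMP = {
    (10000, 100): {"cpu": (15.195, 17.045), "cuda": (6.571, 6.513)},
    (10000, 1000): {"cpu": (64.924, 21.673), "cuda": (8.399, 6.361)},
    (100000, 100): {"cpu": (57.969, 14.924), "cuda": (8.211, 6.711)},
    (100000, 1000): {"cpu": (496.196, 22.128), "cuda": (30.167, 6.683)},
}


def grad_megabytes(num_embeddings, dim):
    """Return (dense MB, sparse-values MB upper bound) for one weight gradient."""
    dense = num_embeddings * dim * BYTES_PER_FLOAT / 1e6
    # Upper bound: at most one dim-sized row per looked-up index,
    # regardless of num_embeddings.
    sparse = NUM_INDICES * dim * BYTES_PER_FLOAT / 1e6
    return dense, sparse


def speedups(shape):
    """dense time / sparse time per device; > 1 means sparse is faster."""
    return {dev: round(d / s, 1) for dev, (d, s) in EMB_TMP[shape].items()}


for shape in EMB_TMP:
    dense_mb, sparse_mb = grad_megabytes(*shape)
    print(shape, f"dense {dense_mb:.1f} MB", f"sparse <= {sparse_mb:.1f} MB",
          speedups(shape))
```

On these numbers the sparse path is roughly break-even for the 10000 x 100 table (0.9x on CPU, 1.0x on CUDA) and about 22.4x faster on CPU and 4.5x faster on CUDA for the 100000 x 1000 table, where the dense gradient (400 MB) is 50 times larger than the sparse upper bound (8 MB).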

The code was built with

python setup.py build_deps develop

The timings of the first 1000 runs are discarded and the benchmark is executed 10000 times.
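The warm-up-and-synchronize procedure can be sketched as follows. This is not the benchmark script itself: `benchmark`, `step`, and `sync` are hypothetical names, and the sketch assumes the 1000 discarded warm-up runs count toward the 10000 total (the wording above also allows 10000 timed runs after warm-up). For CUDA, `sync` would be `torch.cuda.synchronize`, which blocks until queued kernels finish; without it, asynchronous kernel launches make GPU timings look too fast.

```python
import time
from typing import Callable, Optional


def benchmark(step: Callable[[], None], runs: int = 10000, warmup: int = 1000,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Total wall-clock seconds over the timed runs; warm-up runs are discarded.

    `sync`, if given, is called before and after each step so that the
    measured interval covers only that step's (possibly asynchronous) work.
    """
    if warmup >= runs:
        raise ValueError("warmup must be smaller than runs")
    total = 0.0
    for r in range(runs):
        if sync:
            sync()  # drain work queued by earlier runs
        start = time.perf_counter()
        step()
        if sync:
            sync()  # wait for this run's device work to finish
        elapsed = time.perf_counter() - start
        if r >= warmup:  # keep only post-warm-up timings
            total += elapsed
    return total
```

Discarding warm-up runs keeps one-time costs (allocator growth, kernel/JIT initialization, cold caches) out of the reported totals.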

@pytorchbot (Collaborator)

@cpuhrsch, thanks for your PR! We identified @zdevito to be a potential reviewer.

@colesbury (Member)

@pytorchbot add to whitelist

@colesbury (Member)

RE: the build errors: s/_VariableBase/_VariableFunctions

@colesbury (Member) left a review comment


Looks pretty good.

  1. There's a change to the pybind11 submodule that doesn't look like it belongs here.
  2. The Windows build failure looks related to this PR:
01:14:54 LINK : fatal error LNK1181: cannot open input file 'src\ATen\CMakeFiles\ATen.dir\native\EmbeddingBag.cpp.obj'

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-build/1357/console


@cpuhrsch force-pushed the emb_tmp branch 3 times, most recently from 82e8247 to 7d8faa3, on February 7, 2018 16:04

@ssnl (Collaborator) commented Feb 7, 2018

Chiming in to say that the relevant issue is #4441, which should be closed once this PR is approved and merged.

@cpuhrsch force-pushed the emb_tmp branch 2 times, most recently from 2fc03c6 to 874e3e3, on February 12, 2018 16:05