@cpuhrsch (Contributor) commented Mar 28, 2018

I measured performance using this script. The 1GB benchmarks for doubles are still running, but the 100MB benchmarks indicate a slight perf regression on this branch, presumably because of the extra store and load in the non-AVX2 implementations of the vec types.

EDIT: Added timings for double
EDIT: Added support for abs for all types

Single core

Master

cos:            size: 10^2      count: 10000    elapsed: 1.55431699753  type: torch.FloatTensor
cos:            size: 10^3      count: 1000     elapsed: 1.62953400612  type: torch.FloatTensor
cos:            size: 10^4      count: 100      elapsed: 2.4978890419   type: torch.FloatTensor
cos:            size: 10^5      count: 10       elapsed: 2.39908790588  type: torch.FloatTensor
sin:            size: 10^2      count: 10000    elapsed: 1.44210910797  type: torch.FloatTensor
sin:            size: 10^3      count: 1000     elapsed: 1.54422807693  type: torch.FloatTensor
sin:            size: 10^4      count: 100      elapsed: 2.40846395493  type: torch.FloatTensor
sin:            size: 10^5      count: 10       elapsed: 2.29448604584  type: torch.FloatTensor
exp:            size: 10^2      count: 10000    elapsed: 1.50313210487  type: torch.FloatTensor
exp:            size: 10^3      count: 1000     elapsed: 1.5239470005   type: torch.FloatTensor
exp:            size: 10^4      count: 100      elapsed: 2.39896297455  type: torch.FloatTensor
exp:            size: 10^5      count: 10       elapsed: 2.31458091736  type: torch.FloatTensor
log:            size: 10^2      count: 10000    elapsed: 1.65806102753  type: torch.FloatTensor
log:            size: 10^3      count: 1000     elapsed: 1.69567799568  type: torch.FloatTensor
log:            size: 10^4      count: 100      elapsed: 2.56792616844  type: torch.FloatTensor
log:            size: 10^5      count: 10       elapsed: 2.47661995888  type: torch.FloatTensor

cos:            size: 10^2      count: 10000    elapsed: 29.2463679314  type: torch.DoubleTensor
cos:            size: 10^3      count: 1000     elapsed: 29.1897990704  type: torch.DoubleTensor
cos:            size: 10^4      count: 100      elapsed: 30.8953261375  type: torch.DoubleTensor
cos:            size: 10^5      count: 10       elapsed: 30.8482341766  type: torch.DoubleTensor
sin:            size: 10^2      count: 10000    elapsed: 28.2373690605  type: torch.DoubleTensor
sin:            size: 10^3      count: 1000     elapsed: 28.8227479458  type: torch.DoubleTensor
sin:            size: 10^4      count: 100      elapsed: 29.8306491375  type: torch.DoubleTensor
sin:            size: 10^5      count: 10       elapsed: 29.7849609852  type: torch.DoubleTensor
exp:            size: 10^2      count: 10000    elapsed: 21.3683919907  type: torch.DoubleTensor
exp:            size: 10^3      count: 1000     elapsed: 22.0100431442  type: torch.DoubleTensor
exp:            size: 10^4      count: 100      elapsed: 22.9923679829  type: torch.DoubleTensor
exp:            size: 10^5      count: 10       elapsed: 22.9505109787  type: torch.DoubleTensor
log:            size: 10^2      count: 10000    elapsed: 29.4384469986  type: torch.DoubleTensor
log:            size: 10^3      count: 1000     elapsed: 30.061797142   type: torch.DoubleTensor
log:            size: 10^4      count: 100      elapsed: 31.0885539055  type: torch.DoubleTensor
log:            size: 10^5      count: 10       elapsed: 31.0436210632  type: torch.DoubleTensor

This branch

cos:            size: 10^2      count: 10000    elapsed: 1.54843401909  type: torch.FloatTensor
cos:            size: 10^3      count: 1000     elapsed: 1.6413397789   type: torch.FloatTensor
cos:            size: 10^4      count: 100      elapsed: 2.52125191689  type: torch.FloatTensor
cos:            size: 10^5      count: 10       elapsed: 2.41325116158  type: torch.FloatTensor
sin:            size: 10^2      count: 10000    elapsed: 1.44992494583  type: torch.FloatTensor
sin:            size: 10^3      count: 1000     elapsed: 1.54990410805  type: torch.FloatTensor
sin:            size: 10^4      count: 100      elapsed: 2.40798306465  type: torch.FloatTensor
sin:            size: 10^5      count: 10       elapsed: 2.30475592613  type: torch.FloatTensor
exp:            size: 10^2      count: 10000    elapsed: 1.50176095963  type: torch.FloatTensor
exp:            size: 10^3      count: 1000     elapsed: 1.52033686638  type: torch.FloatTensor
exp:            size: 10^4      count: 100      elapsed: 2.40323591232  type: torch.FloatTensor
exp:            size: 10^5      count: 10       elapsed: 2.31940412521  type: torch.FloatTensor
log:            size: 10^2      count: 10000    elapsed: 1.63235998154  type: torch.FloatTensor
log:            size: 10^3      count: 1000     elapsed: 1.67921590805  type: torch.FloatTensor
log:            size: 10^4      count: 100      elapsed: 2.55977487564  type: torch.FloatTensor
log:            size: 10^5      count: 10       elapsed: 2.46202898026  type: torch.FloatTensor

cos:            size: 10^2      count: 10000    elapsed: 30.0235459805  type: torch.DoubleTensor
cos:            size: 10^3      count: 1000     elapsed: 31.4737381935  type: torch.DoubleTensor
cos:            size: 10^4      count: 100      elapsed: 32.1326041222  type: torch.DoubleTensor
cos:            size: 10^5      count: 10       elapsed: 31.6777579784  type: torch.DoubleTensor
sin:            size: 10^2      count: 10000    elapsed: 29.1807699203  type: torch.DoubleTensor
sin:            size: 10^3      count: 1000     elapsed: 29.6215879917  type: torch.DoubleTensor
sin:            size: 10^4      count: 100      elapsed: 30.6181018353  type: torch.DoubleTensor
sin:            size: 10^5      count: 10       elapsed: 30.2948379517  type: torch.DoubleTensor
exp:            size: 10^2      count: 10000    elapsed: 22.4347589016  type: torch.DoubleTensor
exp:            size: 10^3      count: 1000     elapsed: 23.307060957   type: torch.DoubleTensor
exp:            size: 10^4      count: 100      elapsed: 24.4995651245  type: torch.DoubleTensor
exp:            size: 10^5      count: 10       elapsed: 24.0824759007  type: torch.DoubleTensor
log:            size: 10^2      count: 10000    elapsed: 31.2646110058  type: torch.DoubleTensor
log:            size: 10^3      count: 1000     elapsed: 32.2329819202  type: torch.DoubleTensor
log:            size: 10^4      count: 100      elapsed: 33.4655709267  type: torch.DoubleTensor
log:            size: 10^5      count: 10       elapsed: 33.0231490135  type: torch.DoubleTensor
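
To put a number on the "slight perf regression" for doubles, the elapsed values above can be compared directly. The following is a minimal sketch that uses only the 10^5 torch.DoubleTensor rows copied from the two listings; the names `master`, `branch`, and `percent_change` are illustrative and are not part of the benchmark script.

```python
# Elapsed values for torch.DoubleTensor at size 10^5, copied from the listings above.
master = {"cos": 30.8482341766, "sin": 29.7849609852,
          "exp": 22.9505109787, "log": 31.0436210632}
branch = {"cos": 31.6777579784, "sin": 30.2948379517,
          "exp": 24.0824759007, "log": 33.0231490135}


def percent_change(before, after):
    """Relative change of `after` versus `before`, in percent (positive = slower)."""
    return (after - before) / before * 100.0


# Per-op slowdown of this branch relative to master.
slowdown = {op: percent_change(master[op], branch[op]) for op in master}

for op, pct in slowdown.items():
    print(f"{op}: {pct:+.1f}%")
```

On these rows the branch is slower by roughly 1.7% (sin) to 6.4% (log). These are single runs, so the figures indicate a trend rather than a precise measurement.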

Abs Master

abs:            size: 10^2      count: 10000    elapsed: 4.18545389175  type: torch.FloatTensor
abs:            size: 10^3      count: 1000     elapsed: 4.30929303169  type: torch.FloatTensor
abs:            size: 10^4      count: 100      elapsed: 5.18289899826  type: torch.FloatTensor
abs:            size: 10^5      count: 10       elapsed: 5.12413811684  type: torch.FloatTensor
abs:            size: 10^2      count: 10000    elapsed: 4.21678590775  type: torch.DoubleTensor
abs:            size: 10^3      count: 1000     elapsed: 5.08613801003  type: torch.DoubleTensor
abs:            size: 10^4      count: 100      elapsed: 5.99231100082  type: torch.DoubleTensor
abs:            size: 10^5      count: 10       elapsed: 5.95689797401  type: torch.DoubleTensor
abs:            size: 10^2      count: 10000    elapsed: 4.16819906235  type: torch.IntTensor
abs:            size: 10^3      count: 1000     elapsed: 4.22109985352  type: torch.IntTensor
abs:            size: 10^4      count: 100      elapsed: 5.09025883675  type: torch.IntTensor
abs:            size: 10^5      count: 10       elapsed: 5.03382301331  type: torch.IntTensor
abs:            size: 10^2      count: 10000    elapsed: 4.22057914734  type: torch.LongTensor
abs:            size: 10^3      count: 1000     elapsed: 4.9265730381   type: torch.LongTensor
abs:            size: 10^4      count: 100      elapsed: 5.84256696701  type: torch.LongTensor
abs:            size: 10^5      count: 10       elapsed: 5.93304586411  type: torch.LongTensor
abs:            size: 10^2      count: 10000    elapsed: 4.17617201805  type: torch.ShortTensor
abs:            size: 10^3      count: 1000     elapsed: 4.2090690136   type: torch.ShortTensor
abs:            size: 10^4      count: 100      elapsed: 4.49850416183  type: torch.ShortTensor
abs:            size: 10^5      count: 10       elapsed: 4.626557827    type: torch.ShortTensor

Abs This branch

abs:            size: 10^2      count: 10000    elapsed: 0.448744058609 type: torch.FloatTensor
abs:            size: 10^3      count: 1000     elapsed: 0.421611070633 type: torch.FloatTensor
abs:            size: 10^4      count: 100      elapsed: 1.52210712433  type: torch.FloatTensor
abs:            size: 10^5      count: 10       elapsed: 1.49177908897  type: torch.FloatTensor
abs:            size: 10^2      count: 10000    elapsed: 0.844108104706 type: torch.DoubleTensor
abs:            size: 10^3      count: 1000     elapsed: 0.820927858353 type: torch.DoubleTensor
abs:            size: 10^4      count: 100      elapsed: 3.00412607193  type: torch.DoubleTensor
abs:            size: 10^5      count: 10       elapsed: 2.97248101234  type: torch.DoubleTensor
abs:            size: 10^2      count: 10000    elapsed: 0.445827960968 type: torch.IntTensor
abs:            size: 10^3      count: 1000     elapsed: 0.420156955719 type: torch.IntTensor
abs:            size: 10^4      count: 100      elapsed: 1.51804709435  type: torch.IntTensor
abs:            size: 10^5      count: 10       elapsed: 1.52535510063  type: torch.IntTensor
abs:            size: 10^2      count: 10000    elapsed: 0.844727993011 type: torch.LongTensor
abs:            size: 10^3      count: 1000     elapsed: 0.818938016891 type: torch.LongTensor
abs:            size: 10^4      count: 100      elapsed: 3.02740287781  type: torch.LongTensor
abs:            size: 10^5      count: 10       elapsed: 2.99336504936  type: torch.LongTensor
abs:            size: 10^2      count: 10000    elapsed: 0.251252174377 type: torch.ShortTensor
abs:            size: 10^3      count: 1000     elapsed: 0.217294931412 type: torch.ShortTensor
abs:            size: 10^4      count: 100      elapsed: 0.330620765686 type: torch.ShortTensor
abs:            size: 10^5      count: 10       elapsed: 0.745229005814 type: torch.ShortTensor
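
The abs listings can be compared the same way. The sketch below parses rows in the format shown above and computes the master/branch ratio per tensor type for the 10^5 rows; the regular expression and the names `ROW`, `parse`, and `speedup` are assumptions made for illustration, not taken from the benchmark script.

```python
import re

# One benchmark row, e.g.
# "abs:   size: 10^5   count: 10   elapsed: 5.12   type: torch.FloatTensor"
ROW = re.compile(
    r"(?P<op>\w+):\s+size:\s+10\^(?P<exp>\d+)\s+count:\s+(?P<count>\d+)"
    r"\s+elapsed:\s+(?P<elapsed>[\d.]+)\s+type:\s+(?P<dtype>\S+)"
)


def parse(text):
    """Map (op, dtype, size exponent) -> elapsed for every matching row."""
    return {
        (m["op"], m["dtype"], int(m["exp"])): float(m["elapsed"])
        for m in ROW.finditer(text)
    }


# The 10^5 rows copied from the "Abs Master" listing.
master_text = """
abs:            size: 10^5      count: 10       elapsed: 5.12413811684  type: torch.FloatTensor
abs:            size: 10^5      count: 10       elapsed: 5.95689797401  type: torch.DoubleTensor
abs:            size: 10^5      count: 10       elapsed: 5.03382301331  type: torch.IntTensor
abs:            size: 10^5      count: 10       elapsed: 5.93304586411  type: torch.LongTensor
abs:            size: 10^5      count: 10       elapsed: 4.626557827    type: torch.ShortTensor
"""

# The 10^5 rows copied from the "Abs This branch" listing.
branch_text = """
abs:            size: 10^5      count: 10       elapsed: 1.49177908897  type: torch.FloatTensor
abs:            size: 10^5      count: 10       elapsed: 2.97248101234  type: torch.DoubleTensor
abs:            size: 10^5      count: 10       elapsed: 1.52535510063  type: torch.IntTensor
abs:            size: 10^5      count: 10       elapsed: 2.99336504936  type: torch.LongTensor
abs:            size: 10^5      count: 10       elapsed: 0.745229005814 type: torch.ShortTensor
"""

master = parse(master_text)
branch = parse(branch_text)

# Speedup of this branch over master, keyed by tensor type.
speedup = {key[1]: master[key] / branch[key] for key in master}

for dtype, ratio in speedup.items():
    print(f"{dtype}: {ratio:.2f}x")
```

For these rows the speedup ranges from about 2x (DoubleTensor, LongTensor) to about 6x (ShortTensor), with FloatTensor and IntTensor in between.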

@colesbury (Member) left a comment

I'm not sure these macros made the code easier to read

 def _testMathSize(size, self, torchfn, mathfn, dtype, prec):
     # contiguous
-    m1 = torch.randn(*size)
+    m1 = torch.randn(*size).type(dtype)

This comment was marked as off-topic.

self.assertEqual(res1, res2, prec)

types = [
    ('torch.DoubleTensor', precs[0]),

This comment was marked as off-topic.

    ('torch.DoubleTensor', precs[0]),
    ('torch.FloatTensor', precs[1]),
]
for (dtype, prec) in types:

This comment was marked as off-topic.

#endif

#if defined(__GNUC__)
# define ALIGN32_ __attribute__((aligned(32)))

This comment was marked as off-topic.

@cpuhrsch force-pushed the elscvec branch 2 times, most recently from 8e40dbd to ca8a3d1, on March 29, 2018 16:09
4 participants