This repository was archived by the owner on Aug 1, 2023. It is now read-only.
Stop copying tensors to CPU for torch.unique() in vocab reduction #537
Summary:
pytorch/pytorch#8899 added CUDA support for torch.unique(). pytorch/pytorch#16145 has some timing stats that could be relevant.

Experiment results: https://fb.quip.com/olQOA853j0mb

- Words per second (gpu-unique_wps_avg_vs_base): 1.046x
- Total train time (gpu-unique_total_train_time_vs_base; excluding ar_AR-fr_XX): 0.987x

Even though the total train time reduction is pretty minimal (probably overshadowed by random variance, scheduling delay, etc.), WPS does seem to be ~5% faster, so we might as well land this.

Training time for ar_AR-fr_XX increased significantly, but that's because it trained for many more updates (gpu-unique_num_updates_avg_vs_base) and also ended up with +1.43 BLEU. I think this is probably just an anomaly.

Differential Revision: D15073468
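For context, a minimal sketch of the pattern this PR removes versus the one it adopts. This is illustrative only, not the actual vocab-reduction code: the function name and tensors are made up, and the point is simply that since pytorch/pytorch#8899, torch.unique() can run directly on a CUDA tensor without a round-trip through host memory.

```python
import torch

def reduced_vocab_ids(target_ids: torch.Tensor) -> torch.Tensor:
    """Collect the set of vocab ids present in a batch of targets.

    Old pattern (pre pytorch/pytorch#8899): copy to CPU first, e.g.
        torch.unique(target_ids.cpu())
    which forces a device-to-host transfer and a sync point.
    New pattern: call unique() on the tensor's own device.
    """
    # sorted=True is the default; the result stays on target_ids.device.
    return torch.unique(target_ids)

ids = torch.tensor([3, 1, 3, 2, 1])
print(reduced_vocab_ids(ids))  # tensor([1, 2, 3])
```

Avoiding the `.cpu()` copy removes a GPU-to-CPU transfer per batch, which is consistent with the ~5% WPS improvement reported above.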