This repository was archived by the owner on Aug 1, 2023. It is now read-only.
Stop copying tensors to CPU for torch.unique() in vocab reduction #537
Summary:
pytorch/pytorch#8899 added CUDA support for torch.unique(). pytorch/pytorch#16145 has some timing stats that could be relevant.

Experiment results: https://fb.quip.com/olQOA853j0mb

- Words per second (gpu-unique_wps_avg_vs_base): 1.046x
- Total train time (gpu-unique_total_train_time_vs_base; excluding ar_AR-fr_XX): 0.987x

Even though the total train time reduction is pretty minimal (probably overshadowed by random variance, scheduling delay, etc.), WPS does seem to be ~5% faster, so we might as well land this.

Training time for ar_AR-fr_XX increased significantly, but that's because it trained for many more updates (gpu-unique_num_updates_avg_vs_base) and also ended up with +1.43 BLEU. I think this is probably just an anomaly.

Differential Revision: D15073468
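For context, a minimal sketch of the pattern this PR removes versus the one it adopts. This is illustrative only, not the actual vocab-reduction code: the function name and tensors are made up, and the point is simply that since pytorch/pytorch#8899, torch.unique() can run directly on a CUDA tensor without a round-trip through host memory.

```python
import torch

def reduced_vocab_ids(target_ids: torch.Tensor) -> torch.Tensor:
    """Collect the set of vocab ids present in a batch of targets.

    Old pattern (pre pytorch/pytorch#8899): copy to CPU first, e.g.
        torch.unique(target_ids.cpu())
    which forces a device-to-host transfer and a sync point.
    New pattern: call unique() on the tensor's own device.
    """
    # sorted=True is the default; the result stays on target_ids.device.
    return torch.unique(target_ids)

ids = torch.tensor([3, 1, 3, 2, 1])
print(reduced_vocab_ids(ids))  # tensor([1, 2, 3])
```

Avoiding the `.cpu()` copy removes a GPU-to-CPU transfer per batch, which is consistent with the ~5% WPS improvement reported above.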