
@yueyericardo
Contributor

@yueyericardo yueyericardo commented Jun 26, 2018

Add CUDA support for unique.

Below is a simple benchmark on a tensor with 1M elements.
The GPU version runs roughly 3x faster than the CPU version.

Performance
cpu: 0.05040597915649414 s
x: tensor([1, 3, 1,  ..., 4, 9, 4])
x output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
x inverse: tensor([0, 2, 0,  ..., 3, 8, 3])

gpu: 0.015192985534667969 s
y: tensor([1, 3, 1,  ..., 4, 9, 4], device='cuda:0')
y output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9], device='cuda:0')
y inverse: tensor([0, 2, 0,  ..., 3, 8, 3], device='cuda:0')
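In the output above, with sorted=True, output holds the distinct values in ascending order and inverse[i] is the position of x[i] in output, so indexing output with inverse reconstructs x. Here is a minimal pure-Python sketch of those semantics. It only illustrates what the results mean; unique_with_inverse is a hypothetical name, not a PyTorch API.

```python
def unique_with_inverse(xs):
    """Return (sorted distinct values, inverse indices) for a list of values."""
    # Distinct values in ascending order, as with sorted=True.
    output = sorted(set(xs))
    # Map each distinct value to its position in `output`.
    position = {v: i for i, v in enumerate(output)}
    # inverse[i] is the index of xs[i] in output.
    inverse = [position[v] for v in xs]
    return output, inverse


x = [1, 3, 1, 2, 5, 6, 7, 8, 4, 9, 4]
output, inverse = unique_with_inverse(x)
print(output)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(inverse)  # [0, 2, 0, 1, 4, 5, 6, 7, 3, 8, 3]
# Gathering `output` with `inverse` reconstructs the input.
assert [output[i] for i in inverse] == x
```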
Code
import time

import torch

# 1M random integers in [1, 10)
x = torch.randint(1, 10, (1000000,), dtype=torch.long)
device = torch.device("cuda")
y = x.to(device)

# CPU timing
start = time.perf_counter()
output, inverse = x.unique(sorted=True, return_inverse=True)
stop = time.perf_counter()
print('cpu:', stop - start, 's')
print('x:', x)
print('x output:', output)
print('x inverse:', inverse)

# GPU timing: CUDA kernels launch asynchronously, so synchronize
# before starting and before stopping the clock
torch.cuda.synchronize()
start = time.perf_counter()
output1, inverse1 = y.unique(sorted=True, return_inverse=True)
torch.cuda.synchronize()
stop = time.perf_counter()
print('gpu:', stop - start, 's')
print('y:', y)
print('y output:', output1)
print('y inverse:', inverse1)

@ngimel
Collaborator

ngimel commented Jun 26, 2018

I don't see a synchronization call (torch.cuda.synchronize()) after the call to unique on CUDA. CUDA calls should be timed with synchronization; otherwise the timings may be inaccurate.
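A minimal sketch of the timing pattern described here: call a synchronization function before starting and before stopping the clock, so work queued asynchronously is counted. The helper time_op and its parameters are hypothetical. The warm-up runs and the best-of-several-runs choice are assumptions added for steadier numbers.

```python
import time


def time_op(fn, sync=None, warmup=1, repeat=5):
    """Return the best wall-clock time (seconds) of fn() over `repeat` runs.

    `sync` is called before starting and before stopping the clock, so work
    queued asynchronously (e.g. CUDA kernels) is included in the measurement.
    """
    sync = sync or (lambda: None)
    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    best = float("inf")
    for _ in range(repeat):
        sync()  # drain previously queued work before starting
        start = time.perf_counter()
        fn()
        sync()  # wait for the work launched by fn() to finish
        best = min(best, time.perf_counter() - start)
    return best


# CPU-only demo. For the GPU case one would pass
#   fn=lambda: y.unique(sorted=True, return_inverse=True)
#   sync=torch.cuda.synchronize
elapsed = time_op(lambda: sorted(range(100_000)))
print(f"{elapsed:.6f} s")
```

For GPU-side timing, torch.cuda.Event(enable_timing=True) with record() and elapsed_time() is an alternative to wall-clock timing around synchronize calls.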

@yueyericardo
Contributor Author

@ngimel Thanks for the reminder. I've updated the code and the results.

@ezyang
Contributor

ezyang commented Jun 28, 2018

ROCm build OOMed:

4:51:05 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorCopy.cu:3:
14:51:05 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:51:05   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:05                                                                        ^
14:51:05 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:51:05   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:05                                                                        ^~~~~
14:51:05                                                                        std::abs
14:51:09 [ 70%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorMath.cu.o
14:51:12 [ 70%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorMathBlas.cu.o
14:51:12 [ 70%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorMathMagma.cu.o
14:51:15 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMath.cu:5:
14:51:15 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:51:15   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:15                                                                        ^
14:51:15 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:51:15   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:15                                                                        ^~~~~
14:51:15                                                                        std::abs
14:51:17 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMathBlas.cu:5:
14:51:17 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:51:17   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:17                                                                        ^
14:51:17 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:51:17   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:17                                                                        ^~~~~
14:51:17                                                                        std::abs
14:51:19 1 warning generated.
14:51:25 [ 70%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorMathPairwise.cu.o
14:51:27 [ 70%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorMathReduce.cu.o
14:51:31 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMathPairwise.cu:6:
14:51:31 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:51:31   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:31                                                                        ^
14:51:31 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:51:31   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:31                                                                        ^~~~~
14:51:31                                                                        std::abs
14:51:32 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMathReduce.cu:1:
14:51:32 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMathReduce.cuh:6:
14:51:32 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:51:32   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:32                                                                        ^
14:51:32 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:51:32   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:51:32                                                                        ^~~~~
14:51:32                                                                        std::abs
14:51:35 1 warning generated.
14:51:40 1 warning generated.
14:51:40 1 warning generated.
14:51:57 [ 70%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorMathScan.cu.o
14:52:02 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMathScan.cu:6:
14:52:02 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCReduce.cuh:13:
14:52:02 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:52:02   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:52:02                                                                        ^
14:52:02 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:52:02   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:52:02                                                                        ^~~~~
14:52:02                                                                        std::abs
14:52:09 [ 70%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorIndex.cu.o
14:52:14 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorIndex.cu:9:
14:52:14 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCReduce.cuh:13:
14:52:14 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:52:14   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:52:14                                                                        ^
14:52:14 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:52:14   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:52:14                                                                        ^~~~~
14:52:14                                                                        std::abs
14:52:17 1 warning generated.
14:52:51 1 warning generated.
14:52:57 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorRandom.cu.o
14:53:02 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorRandom.cu:7:
14:53:02 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorRandom.cuh:4:
14:53:02 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:53:02   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:02                                                                        ^
14:53:02 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:53:02   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:02                                                                        ^~~~~
14:53:02                                                                        std::abs
14:53:05 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorScatterGather.cu.o
14:53:07 1 warning generated.
14:53:10 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorScatterGather.cu:3:
14:53:10 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCAtomics.cuh:6:
14:53:10 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:53:10   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:10                                                                        ^
14:53:10 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:53:10   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:10                                                                        ^~~~~
14:53:10                                                                        std::abs
14:53:18 1 warning generated.
14:53:23 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorTopK.cu.o
14:53:28 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorTopK.cu:8:
14:53:28 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMathReduce.cuh:6:
14:53:28 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:53:28   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:28                                                                        ^
14:53:28 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:53:28   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:28                                                                        ^~~~~
14:53:28                                                                        std::abs
14:53:31 1 warning generated.
14:53:39 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorSort.cu.o
14:53:44 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorSort.cu:1:
14:53:44 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorSort.cuh:5:
14:53:44 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCSortUtils.cuh:6:
14:53:44 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:53:44   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:44                                                                        ^
14:53:44 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:53:44   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:44                                                                        ^~~~~
14:53:44                                                                        std::abs
14:53:47 1 warning generated.
14:53:53 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCSortUtils.cu.o
14:53:55 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCTensorMode.cu.o
14:53:58 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCSortUtils.cu:1:
14:53:58 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCSortUtils.cuh:6:
14:53:58 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:53:58   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:58                                                                        ^
14:53:58 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:53:58   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:53:58                                                                        ^~~~~
14:53:58                                                                        std::abs
14:53:59 1 warning generated.
14:54:02 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMode.cu:15:
14:54:02 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCTensorMode.cuh:4:
14:54:02 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:54:02   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:54:02                                                                        ^
14:54:02 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:54:02   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:54:02                                                                        ^~~~~
14:54:02                                                                        std::abs
14:54:07 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/generated/caffe2_hip_generated_THCTensorSortByte.cu.o
14:54:14 In file included from /var/lib/jenkins/workspace/aten/src/THC/generated/THCTensorSortByte.cu:1:
14:54:14 In file included from /var/lib/jenkins/workspace/aten/src/THC/generated/../THCTensorSort.cuh:5:
14:54:14 In file included from /var/lib/jenkins/workspace/aten/src/THC/THCSortUtils.cuh:6:
14:54:14 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:54:14   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:54:14                                                                        ^
14:54:14 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:54:14   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:54:14                                                                        ^~~~~
14:54:14                                                                        std::abs
14:54:18 1 warning generated.
14:54:24 LLVM ERROR: out of memory
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x2a)[0x16ea92a]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm3sys17RunSignalHandlersEv+0x3e)[0x16e8a0e]
14:54:25 /opt/rocm/hcc/bin/clang-7.0[0x16e8b5c]
14:54:25 /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f7778b9d390]
14:54:25 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f777790f428]
14:54:25 /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f777791102a]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm22report_bad_alloc_errorEPKcb+0x154)[0x1698e54]
14:54:25 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x2c)[0x7f7778250e8c]
14:54:25 /opt/rocm/hcc/bin/clang-7.0[0x100b3c6]
14:54:25 /opt/rocm/hcc/bin/clang-7.0[0x171b3dd]
14:54:25 /opt/rocm/hcc/bin/clang-7.0[0x171b730]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm25CloneAndPruneIntoFromInstEPNS_8FunctionEPKS0_PKNS_11InstructionERNS_8ValueMapIPKNS_5ValueENS_14WeakTrackingVHENS_14ValueMapConfigISA_NS_3sys10SmartMutexILb0EEEEEEEbRNS_15SmallVectorImplIPNS_10ReturnInstEEEPKcPNS_14ClonedCodeInfoE+0x107)[0x171ffc7]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm14InlineFunctionENS_8CallSiteERNS_18InlineFunctionInfoEPNS_9AAResultsEbPNS_8FunctionE+0xe91)[0x1749461]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm17LegacyInlinerBase11inlineCallsERNS_12CallGraphSCCE+0xf4c)[0x129f17c]
14:54:25 /opt/rocm/hcc/bin/clang-7.0[0xd00453]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x304)[0x11f8eb4]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang17EmitBackendOutputERNS_17DiagnosticsEngineERKNS_19HeaderSearchOptionsERKNS_14CodeGenOptionsERKNS_13TargetOptionsERKNS_11LangOptionsERKN4llvm10DataLayoutEPNSE_6ModuleENS_13BackendActionESt10unique_ptrINSE_17raw_pwrite_streamESt14default_deleteISM_EEb+0xc17)[0x18c5747]
14:54:25 /opt/rocm/hcc/bin/clang-7.0[0x1fe4d92]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang8ParseASTERNS_4SemaEbb+0x370)[0x27570b0]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang13CodeGenAction13ExecuteActionEv+0x37)[0x1fe4337]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang14FrontendAction7ExecuteEv+0x11e)[0x1c9e86e]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x146)[0x1c69ae6]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x96c)[0x1d3237c]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0xa18)[0x8df7a8]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(main+0x1951)[0x881871]
14:54:25 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f77778fa830]
14:54:25 /opt/rocm/hcc/bin/clang-7.0(_start+0x29)[0x8dceb9]
14:54:25 Stack dump:
14:54:25 0.	Program arguments: /opt/rocm/hcc/bin/clang-7.0 -cc1 -D__KALMAR_HC__=1 -D__HCC_HC__=1 -D__KALMAR_CPU__=1 -D__HCC_CPU__=1 -triple x86_64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -main-file-name THCTensorIndex.cu -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /var/lib/jenkins/workspace/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/./caffe2_hip_generated_THCTensorIndex.cu.gcno -resource-dir /opt/rocm/hcc/lib/clang/7.0.0 -I/opt/rocm/hcc/bin/../include -I/opt/rocm/hcc/bin/../hcc/include -D __HIPCC__ -I /opt/rocm/hcc/include -I /opt/rocm/hip/include/hip/hcc_detail/cuda -I /opt/rocm/hsa/include -I /opt/rocm/profiler/CXLActivityLogger/include -I /opt/rocm/hip/include -D HIP_VERSION_MAJOR=1 -D HIP_VERSION_MINOR=5 -D HIP_VERSION_PATCH=18234 -D __HIP_ARCH_GFX900__=1 -I /usr/include/x86_64-linux-gnu -I /usr/include/x86_64-linux-gnu/c++/4.2.1 -I /usr/include/c++/4.2.1 -D __HIP_PLATFORM_HCC__=1 -D CUDA_HAS_FP16=1 -D __HIP_NO_HALF_OPERATORS__=1 -D __HIP_NO_HALF_CONVERSIONS__=1 -I /opt/rocm/hip/include -I /opt/rocm/hcc/include -I /opt/rocm/hsa/include -I /opt/rocm/rocrand/include -I /opt/rocm/hiprand/include -I /opt/rocm/rocblas/include -I /opt/rocm/miopen/include -I /data/Thrust -I /data/Thrust/thrust/system/cuda/detail/cub-hip -I -I/opt/rocm/hip/include -I /opt/rocm/hcc/include -I /opt/rocm/hsa/include -I /opt/rocm/rocrand/include -I /opt/rocm/hiprand/include -I /opt/rocm/rocblas/include -I /opt/rocm/miopen/include -I /data/Thrust -I /data/Thrust/thrust/system/cuda/detail/cub-hip -I -I/var/lib/jenkins/workspace/build/caffe2/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/TH -I /var/lib/jenkins/workspace/build/caffe2/aten/src/THC -I /var/lib/jenkins/workspace/aten/src/THC -I /var/lib/jenkins/workspace/aten/src/THCUNN -I 
/var/lib/jenkins/workspace/aten/src/ATen/cuda -I /var/lib/jenkins/workspace/build/caffe2/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/THC -I /var/lib/jenkins/workspace/build/caffe2/aten/src/TH -I /var/lib/jenkins/workspace/build/caffe2/aten/src/THC -I /var/lib/jenkins/workspace/aten/src -I /var/lib/jenkins/workspace/build/caffe2/aten/src -I /var/lib/jenkins/workspace/build/aten/src -I /var/lib/jenkins/workspace/aten/src/THNN -I /var/lib/jenkins/workspace/aten/src/THCUNN -I /var/lib/jenkins/workspace/aten/src -I /var/lib/jenkins/workspace/aten/../third_party/catch/single_include -I /var/lib/jenkins/workspace/build/caffe2/aten/src/ATen -I /var/lib/jenkins/workspace/aten/src/ATen/.. -I /var/lib/jenkins/workspace/build/caffe2/aten/src/ATen -I /var/lib/jenkins/workspace/build -I /var/lib/jenkins/workspace -I -I/var/lib/jenkins/workspace/third_party/protobuf/src -I /var/lib/jenkins/workspace/cmake/../third_party/eigen -I /var/lib/jenkins/workspace/cmake/../third_party/pybind11/include -I /opt/rocm/hip/include -I /opt/rocm/hipblas/include -I /opt/rocm/hcsparse/include -I /opt/rocm/hcrng/include -I /data/Thrust -I /var/lib/jenkins/workspace/third_party/onnx -I /var/lib/jenkins/workspace/build/third_party/onnx -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /opt/rocm/hcc/lib/clang/7.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wno-deprecated-register -Wno-macro-redefined -Wno-inconsistent-missing-override 
-Wno-exceptions -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-unused-command-line-argument -std=c++amp -fdeprecated-macro -fdebug-compilation-dir /var/lib/jenkins/workspace/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC -ferror-limit 19 -fmessage-length 0 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -famp -fhsa-ext -o /tmp/THCTensorIndex-79ee72.s -x hc-host /var/lib/jenkins/workspace/aten/src/THC/THCTensorIndex.cu -emit-llvm-bc 
14:54:25 1.	<eof> parser at end of file
14:54:25 2.	Per-module optimization passes
14:54:25 3.	Running pass 'CallGraph Pass Manager' on module '/var/lib/jenkins/workspace/aten/src/THC/THCTensorIndex.cu'.
14:54:26 clang-7.0: error: unable to execute command: Aborted (core dumped)
14:54:37 [ 71%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/generated/caffe2_hip_generated_THCTensorMathCompareTByte.cu.o
14:54:42 In file included from /var/lib/jenkins/workspace/aten/src/THC/generated/THCTensorMathCompareTByte.cu:1:
14:54:42 In file included from /var/lib/jenkins/workspace/aten/src/THC/generated/../THCTensorMathCompareT.cuh:8:
14:54:42 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
14:54:42   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:54:42                                                                        ^
14:54:42 /var/lib/jenkins/workspace/aten/src/THC/THCNumerics.cuh:740:72: note: use function 'std::abs' instead
14:54:42   static inline __host__ __device__  double abs  (double a) { return   ::abs(a); }
14:54:42                                                                        ^~~~~
14:54:42                                                                        std::abs
14:54:52 1 warning generated.
14:55:12 LLVM ERROR: out of memory
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x2a)[0x16ea92a]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm3sys17RunSignalHandlersEv+0x3e)[0x16e8a0e]
14:55:12 /opt/rocm/hcc/bin/clang-7.0[0x16e8b5c]
14:55:12 /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fcb33b65390]
14:55:12 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fcb328d7428]
14:55:12 /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fcb328d902a]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm22report_bad_alloc_errorEPKcb+0x154)[0x1698e54]
14:55:12 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x2c)[0x7fcb33218e8c]
14:55:12 /opt/rocm/hcc/bin/clang-7.0[0x120fbb8]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm11Instruction11setMetadataEjPNS_6MDNodeE+0x23a)[0x12105ea]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm11Instruction12copyMetadataERKS0_NS_8ArrayRefIjEE+0x58f)[0x11d53df]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZNK4llvm11Instruction5cloneEv+0x5a)[0x11d570a]
14:55:12 /opt/rocm/hcc/bin/clang-7.0[0x171b7e5]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm25CloneAndPruneIntoFromInstEPNS_8FunctionEPKS0_PKNS_11InstructionERNS_8ValueMapIPKNS_5ValueENS_14WeakTrackingVHENS_14ValueMapConfigISA_NS_3sys10SmartMutexILb0EEEEEEEbRNS_15SmallVectorImplIPNS_10ReturnInstEEEPKcPNS_14ClonedCodeInfoE+0x107)[0x171ffc7]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm14InlineFunctionENS_8CallSiteERNS_18InlineFunctionInfoEPNS_9AAResultsEbPNS_8FunctionE+0xe91)[0x1749461]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm17LegacyInlinerBase11inlineCallsERNS_12CallGraphSCCE+0xf4c)[0x129f17c]
14:55:12 /opt/rocm/hcc/bin/clang-7.0[0xd00453]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x304)[0x11f8eb4]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang17EmitBackendOutputERNS_17DiagnosticsEngineERKNS_19HeaderSearchOptionsERKNS_14CodeGenOptionsERKNS_13TargetOptionsERKNS_11LangOptionsERKN4llvm10DataLayoutEPNSE_6ModuleENS_13BackendActionESt10unique_ptrINSE_17raw_pwrite_streamESt14default_deleteISM_EEb+0xc17)[0x18c5747]
14:55:12 /opt/rocm/hcc/bin/clang-7.0[0x1fe4d92]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang8ParseASTERNS_4SemaEbb+0x370)[0x27570b0]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang13CodeGenAction13ExecuteActionEv+0x37)[0x1fe4337]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang14FrontendAction7ExecuteEv+0x11e)[0x1c9e86e]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x146)[0x1c69ae6]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x96c)[0x1d3237c]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0xa18)[0x8df7a8]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(main+0x1951)[0x881871]
14:55:12 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fcb328c2830]
14:55:12 /opt/rocm/hcc/bin/clang-7.0(_start+0x29)[0x8dceb9]
14:55:12 Stack dump:
14:55:12 0.	Program arguments: /opt/rocm/hcc/bin/clang-7.0 -cc1 -D__KALMAR_HC__=1 -D__HCC_HC__=1 -D__KALMAR_CPU__=1 -D__HCC_CPU__=1 -triple x86_64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -main-file-name THCTensorMode.cu -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /var/lib/jenkins/workspace/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/./caffe2_hip_generated_THCTensorMode.cu.gcno -resource-dir /opt/rocm/hcc/lib/clang/7.0.0 -I/opt/rocm/hcc/bin/../include -I/opt/rocm/hcc/bin/../hcc/include -D __HIPCC__ -I /opt/rocm/hcc/include -I /opt/rocm/hip/include/hip/hcc_detail/cuda -I /opt/rocm/hsa/include -I /opt/rocm/profiler/CXLActivityLogger/include -I /opt/rocm/hip/include -D HIP_VERSION_MAJOR=1 -D HIP_VERSION_MINOR=5 -D HIP_VERSION_PATCH=18234 -D __HIP_ARCH_GFX900__=1 -I /usr/include/x86_64-linux-gnu -I /usr/include/x86_64-linux-gnu/c++/4.2.1 -I /usr/include/c++/4.2.1 -D __HIP_PLATFORM_HCC__=1 -D CUDA_HAS_FP16=1 -D __HIP_NO_HALF_OPERATORS__=1 -D __HIP_NO_HALF_CONVERSIONS__=1 -I /opt/rocm/hip/include -I /opt/rocm/hcc/include -I /opt/rocm/hsa/include -I /opt/rocm/rocrand/include -I /opt/rocm/hiprand/include -I /opt/rocm/rocblas/include -I /opt/rocm/miopen/include -I /data/Thrust -I /data/Thrust/thrust/system/cuda/detail/cub-hip -I -I/opt/rocm/hip/include -I /opt/rocm/hcc/include -I /opt/rocm/hsa/include -I /opt/rocm/rocrand/include -I /opt/rocm/hiprand/include -I /opt/rocm/rocblas/include -I /opt/rocm/miopen/include -I /data/Thrust -I /data/Thrust/thrust/system/cuda/detail/cub-hip -I -I/var/lib/jenkins/workspace/build/caffe2/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/TH -I /var/lib/jenkins/workspace/build/caffe2/aten/src/THC -I /var/lib/jenkins/workspace/aten/src/THC -I /var/lib/jenkins/workspace/aten/src/THCUNN -I 
/var/lib/jenkins/workspace/aten/src/ATen/cuda -I /var/lib/jenkins/workspace/build/caffe2/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/TH -I /var/lib/jenkins/workspace/aten/src/THC -I /var/lib/jenkins/workspace/build/caffe2/aten/src/TH -I /var/lib/jenkins/workspace/build/caffe2/aten/src/THC -I /var/lib/jenkins/workspace/aten/src -I /var/lib/jenkins/workspace/build/caffe2/aten/src -I /var/lib/jenkins/workspace/build/aten/src -I /var/lib/jenkins/workspace/aten/src/THNN -I /var/lib/jenkins/workspace/aten/src/THCUNN -I /var/lib/jenkins/workspace/aten/src -I /var/lib/jenkins/workspace/aten/../third_party/catch/single_include -I /var/lib/jenkins/workspace/build/caffe2/aten/src/ATen -I /var/lib/jenkins/workspace/aten/src/ATen/.. -I /var/lib/jenkins/workspace/build/caffe2/aten/src/ATen -I /var/lib/jenkins/workspace/build -I /var/lib/jenkins/workspace -I -I/var/lib/jenkins/workspace/third_party/protobuf/src -I /var/lib/jenkins/workspace/cmake/../third_party/eigen -I /var/lib/jenkins/workspace/cmake/../third_party/pybind11/include -I /opt/rocm/hip/include -I /opt/rocm/hipblas/include -I /opt/rocm/hcsparse/include -I /opt/rocm/hcrng/include -I /data/Thrust -I /var/lib/jenkins/workspace/third_party/onnx -I /var/lib/jenkins/workspace/build/third_party/onnx -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /opt/rocm/hcc/lib/clang/7.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -Wno-deprecated-register -Wno-macro-redefined -Wno-inconsistent-missing-override 
-Wno-exceptions -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-unused-command-line-argument -std=c++amp -fdeprecated-macro -fdebug-compilation-dir /var/lib/jenkins/workspace/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC -ferror-limit 19 -fmessage-length 0 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -famp -fhsa-ext -o /tmp/THCTensorMode-e8947e.s -x hc-host /var/lib/jenkins/workspace/aten/src/THC/THCTensorMode.cu -emit-llvm-bc 
14:55:12 1.	<eof> parser at end of file
14:55:12 2.	Per-module optimization passes
14:55:12 3.	Running pass 'CallGraph Pass Manager' on module '/var/lib/jenkins/workspace/aten/src/THC/THCTensorMode.cu'.
14:55:12 clang-7.0: error: unable to execute command: Aborted (core dumped)

CC @Jorghi12 @bddppq

Contributor

@ezyang ezyang left a comment

Nice! Inverse index algorithm is not "great" but it seems to be good enough.
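The PR's kernel is not reproduced on this page, so as a hedged illustration of what computing inverse indices can involve, here is one common sort-based approach: sort positions by value, flag the first element of each run of equal values, take a running count of the flags to get each sorted element's index into the unique output, and scatter those indices back to the original positions. This is an assumption-labeled sketch, not the implementation added by this PR, and `unique_with_inverse` is a hypothetical name.

```python
from typing import List, Tuple


def unique_with_inverse(values: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical sort-based unique: returns (sorted uniques, inverse indices).

    Illustrates the sort / flag / scan / scatter pattern; not the PR's CUDA code.
    """
    n = len(values)
    # 1. Sort positions by value (Python's sort is stable, so ties keep input order).
    order = sorted(range(n), key=lambda i: values[i])
    sorted_vals = [values[i] for i in order]

    # 2. Flag the first element of every run of equal values.
    flags = [1 if k == 0 or sorted_vals[k] != sorted_vals[k - 1] else 0
             for k in range(n)]

    # 3. Inclusive prefix sum of the flags, minus one, is each sorted
    #    element's index into the unique output.
    scan = []
    running = 0
    for f in flags:
        running += f
        scan.append(running - 1)

    # 4. Compact: keep only the values that start a run.
    uniques = [v for v, f in zip(sorted_vals, flags) if f]

    # 5. Scatter the indices back to the original (unsorted) positions.
    inverse = [0] * n
    for k, original_pos in enumerate(order):
        inverse[original_pos] = scan[k]
    return uniques, inverse


out, inv = unique_with_inverse([1, 3, 1, 4, 9, 4])
print(out)  # [1, 3, 4, 9]
print(inv)  # [0, 1, 0, 2, 3, 2]
```

On a GPU each step maps onto a data-parallel primitive (sort, adjacent comparison, inclusive scan, scatter), which is why this formulation is popular; the Python loops here are sequential stand-ins for those primitives.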

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Jun 28, 2018

@pytorchbot retest this please

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Jun 29, 2018

@yueyericardo If you want to unblock this for merge, I'd advise using the preprocessor to compile out the implementation when building for ROCm (so we'll have `unique` for CUDA but not ROCm). Use `__HIP_PLATFORM_HCC__` to test for it.

@yueyericardo
Contributor Author

yueyericardo commented Jul 2, 2018

@ezyang I've implemented your advice. Thank you.

@ezyang
Contributor

ezyang commented Jul 2, 2018

Great, thanks!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 9, 2018
Summary:
Add CUDA support for `unique`.

Below is a simple test on a tensor of 1M integers (`torch.long`). In this single run, the CUDA version is about 3.3x faster than the CPU version.

Performance:

```
cpu: 0.05040597915649414 s
x: tensor([1, 3, 1,  ..., 4, 9, 4])
x output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
x inverse: tensor([0, 2, 0,  ..., 3, 8, 3])

gpu: 0.015192985534667969 s
y: tensor([1, 3, 1,  ..., 4, 9, 4], device='cuda:0')
y output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9], device='cuda:0')
y inverse: tensor([0, 2, 0,  ..., 3, 8, 3], device='cuda:0')
```
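Two things can be read off the printed results. The ratio of the two wall-clock times is the speedup for this one run, and `return_inverse=True` means indexing `output` with `inverse` reproduces the input. The snippet below checks both using only the values shown above; the elided middle of each tensor (the `...`) is not reconstructed.

```python
# Timings copied from the run above (seconds).
cpu_s = 0.05040597915649414
gpu_s = 0.015192985534667969
speedup = cpu_s / gpu_s
print(f"speedup: {speedup:.2f}x")  # speedup: 3.32x

# Printed values; only the visible head and tail of x / inverse are used.
output = [1, 2, 3, 4, 5, 6, 7, 8, 9]
x_head, inverse_head = [1, 3, 1], [0, 2, 0]
x_tail, inverse_tail = [4, 9, 4], [3, 8, 3]

# With return_inverse=True, output[inverse] == x element by element.
assert [output[i] for i in inverse_head] == x_head
assert [output[i] for i in inverse_tail] == x_tail
```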

Code:

```python
import time

import torch

x = torch.randint(1, 10, (1000000,), dtype=torch.long)
device = torch.device("cuda")
y = x.to(device)

start = time.time()
output, inverse = x.unique(sorted=True, return_inverse=True)
stop = time.time()
print('cpu:', stop - start, 's')
print('x:', x)
print('x output:', output)
print('x inverse:', inverse)

start = time.time()
output1, inverse1 = y.unique(sorted=True, return_inverse=True)
# CUDA kernels run asynchronously; wait for them before reading the clock.
torch.cuda.synchronize()
stop = time.time()
print('gpu:', stop - start, 's')
print('y:', y)
print('y output:', output1)
print('y inverse:', inverse1)
```
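The script above times a single call of each version with `time.time()`. A somewhat more robust pattern, sketched here as an assumption rather than what the PR used, is to warm up first, synchronize before and after each timed call, and report the median of several runs. `time_median` is a hypothetical helper; for the CUDA case you would pass `torch.cuda.synchronize` as `sync`.

```python
import statistics
import time
from typing import Callable, Optional


def time_median(fn: Callable[[], object],
                sync: Optional[Callable[[], None]] = None,
                warmup: int = 3,
                repeat: int = 10) -> float:
    """Return the median wall-clock time of fn() in seconds (hypothetical helper).

    `sync` is called right before starting and right before stopping the clock,
    so asynchronous work (e.g. queued CUDA kernels) is included in each sample.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # exclude one-time costs such as lazy initialization
        fn()
    times = []
    for _ in range(repeat):
        sync()  # drain any work left over from earlier calls
        start = time.perf_counter()
        fn()
        sync()  # wait for this call's work to finish
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Usage with the tensors from the script above (requires a CUDA device):
#   cpu_s = time_median(lambda: x.unique(sorted=True, return_inverse=True))
#   gpu_s = time_median(lambda: y.unique(sorted=True, return_inverse=True),
#                       sync=torch.cuda.synchronize)
```

Taking the median rather than the mean keeps the result less sensitive to occasional outliers such as scheduling hiccups.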
Closes pytorch/pytorch#8899

Reviewed By: SsnL

Differential Revision: D8677655

Pulled By: ezyang

fbshipit-source-id: 09df3f0602f235c5d36c7a6e7e1d89dbf82570bb
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
theweiho added a commit to theweiho/translate that referenced this pull request May 16, 2019
Summary:
pytorch/pytorch#8899 had added CUDA support for `torch.unique()`

pytorch/pytorch#16145 has some timing stats that could be relevant

 ---

Experiment results: https://fb.quip.com/olQOA853j0mb
Words per second (`gpu-unique_wps_avg_vs_base`): 1.046x
Total train time (`gpu-unique_total_train_time_vs_base`; excl ar_AR-fr_XX): 0.987x

Even though train time reduction is pretty minimal (probably overshadowed by random variance, scheduling delay, etc.), WPS does seem to be ~5% faster, so might as well land this.

Training time for ar_AR-fr_XX increased significantly - but that's b/c it trained for many more updates (`gpu-unique_num_updates_avg_vs_base`) - and also ended up w/ +1.43 BLEU. I think this is probably just an anomaly?

Differential Revision: D15073468

fbshipit-source-id: a9710738c827013afb35a67bd3a9be259b0e2d5f
theweiho added a commit to theweiho/translate that referenced this pull request May 16, 2019
…torch#537)

Summary:
Pull Request resolved: pytorch#537

pytorch/pytorch#8899 had added CUDA support for `torch.unique()`

pytorch/pytorch#16145 has some timing stats that could be relevant

 ---

Experiment results: https://fb.quip.com/olQOA853j0mb
Words per second (`gpu-unique_wps_avg_vs_base`): 1.046x
Total train time (`gpu-unique_total_train_time_vs_base`; excl ar_AR-fr_XX): 0.987x

Even though train time reduction is pretty minimal (probably overshadowed by random variance, scheduling delay, etc.), WPS does seem to be ~5% faster, so might as well land this.

Training time for ar_AR-fr_XX increased significantly - but that's b/c it trained for many more updates (`gpu-unique_num_updates_avg_vs_base`) - and also ended up w/ +1.43 BLEU. I think this is probably just an anomaly?

Differential Revision: D15073468

fbshipit-source-id: 713288fc7c77f582840f270dd2e343a3b63f8fe5
theweiho added a commit to theweiho/translate that referenced this pull request May 16, 2019
…torch#537)

Summary:
Pull Request resolved: pytorch#537

pytorch/pytorch#8899 had added CUDA support for `torch.unique()`

pytorch/pytorch#16145 has some timing stats that could be relevant

 ---

Experiment results: https://fb.quip.com/olQOA853j0mb
Words per second (`gpu-unique_wps_avg_vs_base`): 1.046x
Total train time (`gpu-unique_total_train_time_vs_base`; excl ar_AR-fr_XX): 0.987x

Even though train time reduction is pretty minimal (probably overshadowed by random variance, scheduling delay, etc.), WPS does seem to be ~5% faster, so might as well land this.

Training time for ar_AR-fr_XX increased significantly - but that's b/c it trained for many more updates (`gpu-unique_num_updates_avg_vs_base`) - and also ended up w/ +1.43 BLEU. I think this is probably just an anomaly?

Differential Revision: D15073468

fbshipit-source-id: 29c7eaaddd63d629866c7314920fe27b22690603
facebook-github-bot pushed a commit to pytorch/translate that referenced this pull request May 17, 2019
Summary:
Pull Request resolved: #537

pytorch/pytorch#8899 had added CUDA support for `torch.unique()`

pytorch/pytorch#16145 has some timing stats that could be relevant

 ---

Experiment results: https://fb.quip.com/olQOA853j0mb
Words per second (`gpu-unique_wps_avg_vs_base`): 1.046x
Total train time (`gpu-unique_total_train_time_vs_base`; excl ar_AR-fr_XX): 0.987x

Even though train time reduction is pretty minimal (probably overshadowed by random variance, scheduling delay, etc.), WPS does seem to be ~5% faster, so might as well land this.

Training time for ar_AR-fr_XX increased significantly - but that's b/c it trained for many more updates (`gpu-unique_num_updates_avg_vs_base`) - and also ended up w/ +1.43 BLEU. I think this is probably just an anomaly?

Reviewed By: akinh, jmp84

Differential Revision: D15073468

fbshipit-source-id: c2dba562b6d4fb4d15d2a56d03ce6a6e3ddff07d

4 participants