Merged
Changes from all commits
967 commits
b366d67
Spelling fix in MultivariateNormal docstring (#7915)
aryamccarthy May 29, 2018
6541e96
[c10d] MPI Process Group Implementation (#7783)
teng-li May 29, 2018
42722c7
Fix Windows doc for import error (#7704)
peterjc123 May 29, 2018
dc12ca6
Moved condition for dilated grouped convolutions to CUDNN convolution…
iamannakogan May 29, 2018
d1bcd4f
Updates to caffe2 operator documentation (#7917)
inkawhich May 29, 2018
d2b1938
[auto] Update onnx to 307995b - Update from upstream (onnx/onnx#1038)
onnxbot May 29, 2018
cdf5574
Test if ASAN is actually working as part of ASAN tests. (#6050)
ezyang May 30, 2018
3d16f3d
Split up detail.h (#7836)
goldsborough May 30, 2018
0793719
Fix THCUNN SpatialDepthwiseConvolution assuming contiguity (#7952)
ssnl May 30, 2018
b1e76f4
Fix fbcode compatibility (#7939)
smessmer May 30, 2018
e779875
add test for correctness of transpose fusion (#7950)
anderspapitto May 30, 2018
b612652
[JIT][script] Fix emitted gather and slice for dynamic indices (#7861)
May 30, 2018
3c7e798
cache and use BLAS_SET_BY_USER so that it doesn't set itself to TRUE …
soumith May 30, 2018
722d471
Add unsafe flag to skip checking in prepare (#7832)
May 30, 2018
746fb43
Rename cuda::type to cuda::into_type and provide cuda::from_type. (#7…
gchanan May 30, 2018
641ad5d
Try to fix TORCH_CUDA_ARCH_LIST for PyTorch again (#7936)
ssnl May 30, 2018
b8bb35e
remove sort requirement from pad-sequence (#7928)
zou3519 May 30, 2018
d73405f
Fix checkBackend error message (#7926)
zou3519 May 30, 2018
53543ed
Split CI tests in half and run them in parallel (#7867)
yf225 May 30, 2018
60414b2
Handling of scalars in torch.Size (#5676)
zou3519 May 30, 2018
6594a71
[JIT] Fission and fusion passes for addmm (#7938)
May 30, 2018
5c2d4a4
Set smaller grain size for some cases (#7941)
cpuhrsch May 30, 2018
818d9c7
Fix returning scalar input in Python autograd function (#7934)
ssnl May 30, 2018
ce45183
Prevent git autocrlf for bash scripts (#7949)
kohr-h May 30, 2018
5c1ec5d
Delete unused file (#7919)
colesbury May 30, 2018
920fad6
Fix typo in autodiff formula for addmm (#7932)
May 30, 2018
f319fca
1) use meshgrid for flip() CPU implementation, only need one copy of …
weiyangfb May 30, 2018
921cb6f
[caffe2] YellowFin parameter update GPU code fix. (#6993)
edubois May 30, 2018
d71d7ae
[Caffe2] Keep name of caffe2_pybind11_state and caffe2_pybind11_state…
pooyadavoodi May 30, 2018
9ac676f
Allowing MatMul to create a gradient even with 3 inputs. useful if yo…
Swetko May 30, 2018
5a1fbf6
added const for local variables
weiyangfb May 31, 2018
c54fc26
Fix the cpp libtorch CUDA build (#7975)
orionr May 31, 2018
aafd333
Use mingfeima's mkldnn (#7977)
cpuhrsch May 31, 2018
3b18815
Fix the import part of the windows doc (#7979)
peterjc123 May 31, 2018
d2b8961
Change perf test folder after git checkout (#7980)
yf225 May 31, 2018
8f4711d
Move the broadcast check in MKL Add/Sum to runtime (#7978)
bddppq May 31, 2018
1664fe6
Use Glog's implementation of STL logging when possible. (#7206)
xkszltl May 31, 2018
519285c
[Hotfix] Bring back warnings and -Werror to ATen (#7866)
goldsborough May 31, 2018
49c468a
Enable ONNX backend Mean tests (#7985)
bddppq May 31, 2018
5904096
Add third wayt to determine IS_CONDA (#7971)
cpuhrsch May 31, 2018
5311b95
Fix EmbeddingBag max_norm option (#7959)
ssnl May 31, 2018
7ac3939
Raise error when torch.load a storage on a non-existing device (#7921)
zou3519 May 31, 2018
a911273
Make THStorage / THCStorage have void* data ptr. (#7964)
gchanan May 31, 2018
a5e6bda
Import/export observer symbols for DLL, which fixes the linking error…
xkszltl May 31, 2018
9fa2627
Remove python bindings for `torch.slice` (#7924)
sethah May 31, 2018
e0a9177
Build ONNX for PyTorch version of libcaffe2 (#7967)
orionr May 31, 2018
04578c0
support loading gzip (#6490)
li-roy May 31, 2018
6ff2039
Add memory leak check in CUDA tests (#7270)
ssnl May 31, 2018
d3c5a3e
Revert "Set smaller grain size for some cases" (#7988)
ezyang May 31, 2018
f7e821d
Entry for c10d in CODEOWNERS (#8001)
pietern May 31, 2018
80bb6b7
Fix a couple of typos (#7998)
dmitriy-serdyuk May 31, 2018
58ed5cd
Add on-stack observer cache for Observable (#7931)
May 31, 2018
e0e0125
Reduce grain size for Unary operations (#8003)
cpuhrsch May 31, 2018
b9466be
[auto] Update onnx to 8ec0e5f - Add index check for Transpose's type …
onnxbot Jun 1, 2018
bf67b80
Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace. (#7935)
gchanan Jun 1, 2018
6489b5e
Remove WITH_ROCM cmake flag/variable (use USE_ROCM solely) (#8013)
bddppq Jun 1, 2018
a1bc89e
Mention the pytorch-ci-hud on the README. (#8004)
ezyang Jun 1, 2018
e4ed8c7
Re-enable build env check (#7969)
yf225 Jun 1, 2018
e0e7d67
Update nn.rst (#8029)
zuoxingdong Jun 1, 2018
f9ff686
Example for Transformed Distribution (#8011)
vishwakftw Jun 1, 2018
a475992
[auto] Update onnx to 33e9cd4 - Remove the usage of default value to …
onnxbot Jun 1, 2018
30bd61a
[auto] Update onnx to 1504a33 - Convert schema assert for duplicate t…
onnxbot Jun 1, 2018
89a2032
Support CUDA tensors in ProcessGroupGloo (#7694)
pietern Jun 1, 2018
b2edb83
[auto] Update onnx to 3fb9656 - Fix for fbcode CI (onnx/onnx#1062)
onnxbot Jun 1, 2018
661218b
propagate nan in some activations (#8033)
ssnl Jun 1, 2018
ff309bb
Fix profiler crash when no events register (#8034)
Jun 1, 2018
90ac702
Allow CI testing with different AVX configs (#8020)
yf225 Jun 1, 2018
61ed558
Support for generating ATen during the fbcode build, rather than comm…
anderspapitto Jun 1, 2018
afa0076
Factor python dependency out of interpreter (#7970)
zdevito Jun 1, 2018
dcbd232
[auto] Update onnx to 760c928 - add missing hasNInputShapes check for…
onnxbot Jun 1, 2018
94eba55
Support modules that output scalar in Gather (and data parallel) (#7973)
ssnl Jun 1, 2018
f4d9733
[auto] Update onnx to 9e7855d - Remove PyTorch generated Upsample tes…
onnxbot Jun 1, 2018
be7f239
[script] Add support for torch.zeros, torch.ones, etc. (#7799)
zdevito Jun 1, 2018
26f2a86
Add profiling annotations to NeuralNet[Operator|Data] (#8005)
yyetim Jun 1, 2018
22af5a0
Update from facebook 1ee4edd286a3 (#8040)
bwasti Jun 1, 2018
63ca74d
Skip CUDA memory leak test on BN tests on windows (#8043)
ssnl Jun 1, 2018
5c6520f
workaround for Sequential when one cannot retrieve python source (#8048)
soumith Jun 1, 2018
e532164
[auto] Update onnx to 0dbec2a - - Generate protoc type hints on Windo…
onnxbot Jun 1, 2018
3f7a4a3
[auto] Update onnx to 4f8ef17 - Remove erroneous documentation around…
onnxbot Jun 2, 2018
37f5a8c
[auto] Update onnx to e6a500e - Extract constant to initializer (onnx…
onnxbot Jun 2, 2018
2f50873
[auto] Update onnx to 033f956 - make gcc happy (onnx/onnx#1061)
onnxbot Jun 2, 2018
fd91212
Remove NO_PYTHON macros from Exceptions.h/cpp (#8007)
zrphercule Jun 2, 2018
fa87c79
[ready] Clean up torch.distributions (#8046)
vishwakftw Jun 2, 2018
f6dad69
Have a single THStorage and THCStorage type. (#8030)
gchanan Jun 2, 2018
cdfd577
Reduce usages of TensorUtils<T>::DataType in THC. (#8056)
gchanan Jun 2, 2018
ca3d3a8
Support to run ONNX Upsample operator (mode=nearest) in Caffe2 (#8037)
Jun 2, 2018
2c79dac
[auto] Update onnx to eb12f72 - Add conv transpose test cases (onnx/o…
onnxbot Jun 2, 2018
4d33785
[auto] Update onnx to bd98abb - Add a hook for doing post-processing …
onnxbot Jun 2, 2018
872c373
Skip ConvTraspose ONNX backend tests (#8074)
bddppq Jun 2, 2018
f34e86e
Post process onnx proto (#8064)
bddppq Jun 2, 2018
aa7001e
Add code for TensorBoard visualization of JIT GraphExecutors (#8050)
apaszke Jun 2, 2018
23a6738
[auto] Update onnx to cc26486 - bump version to 7 for prelu. (onnx/on…
onnxbot Jun 3, 2018
a5583ac
[auto] Update onnx to 356208d - add input tensor dimension checks to …
onnxbot Jun 3, 2018
3c7dec0
Move backtrace to its own header (#8096)
goldsborough Jun 4, 2018
a2d43b4
Fix and ignore some warnings (#8081)
bddppq Jun 4, 2018
9cf97a9
Do an additional sanity check that nvcc and CUDA include dir agree. (…
ezyang Jun 4, 2018
b5c3e23
use regex in kwarg parser (#8061)
sethah Jun 4, 2018
429b0c2
Removing remaining NO_PYTHON ifdefs (#8067)
zdevito Jun 4, 2018
e2cd664
Replace std::size_t with size_t (#8093)
goldsborough Jun 4, 2018
a5bbb22
Remove out-of-date comment (#8114)
colesbury Jun 4, 2018
b26f93b
[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7955)
bddppq Jun 4, 2018
dcfe96c
Detect CUDNN related environment variables in cmake (#8082)
bddppq Jun 4, 2018
f12a507
Implement adaptive softmax (#5287)
elanmart Jun 4, 2018
5cf86c3
Make libshm also test if rt requires pthread. (#8112)
ezyang Jun 4, 2018
5b9f05b
[auto] Update onnx to 2d5ce4a - Remove empty model (onnx/onnx#1058)
onnxbot Jun 4, 2018
64a8fe2
Add missing pragma once. (#8118)
ezyang Jun 4, 2018
db97ae1
[auto] Update onnx to 2a87616 - Tests for LRN operator (onnx/onnx#903)
onnxbot Jun 4, 2018
33cbb42
Split SparseTensorImpl off from TensorImpl. (#7990)
ezyang Jun 4, 2018
b17cf72
[Caffe2] Support non peer access in muji and fix bug when reduced_aff…
daquexian Jun 4, 2018
c65b6c5
Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segf…
bddppq Jun 4, 2018
44e94b7
Replace most remaining usages of TensorUtils<T>::DataType. (#8124)
gchanan Jun 4, 2018
24231bf
Add utf-8 header to Python file with Unicode. (#8131)
ezyang Jun 4, 2018
73b6ce0
Add back lrn test (#8134)
bddppq Jun 4, 2018
1ff8957
Add non_blocking to Tensor/Module.to (#7312)
ssnl Jun 4, 2018
b27d0b9
Fix job name checking for AVX tests (#8135)
yf225 Jun 4, 2018
37faaac
Fix a corner case for ReShapeOp (#8142)
sunnieshang Jun 5, 2018
4b43c7f
cpu/ideep context converter (#8139)
Jun 5, 2018
84100bf
fix type mismatch while call torch._C._cuda_setDevice (#8065)
HisiFish Jun 5, 2018
4169fe2
docs: Add warning to torch.repeat() (#8116)
Ir1d Jun 5, 2018
0024401
Accelerate bernoulli number generation on CPU (#7171)
MlWoo Jun 5, 2018
caaf9c4
docs: add canonical_url and fix redirect link (#8155)
Ir1d Jun 5, 2018
886b7bc
docstring support for @script and @script_method (#7898)
zasdfgbnm Jun 5, 2018
a1f94e8
[auto] Update onnx to 968d28d - fix Node::isBefore (onnx/onnx#1075)
onnxbot Jun 5, 2018
f439c92
remove some unnecessary cudaGetDevices (#8089)
Jun 5, 2018
4ec08fa
Fix cuda.framework error on OSX. (#8136)
ezyang Jun 5, 2018
7899c03
[C++ API] Improve and use OrderedDict for parameters / modules (#7823)
goldsborough Jun 5, 2018
e6aefee
Fix __rshift__ bug (#8161)
vishwakftw Jun 5, 2018
78d66e1
Move non-generic Storage code needed by TensorUtils to non-generic C+…
gchanan Jun 5, 2018
bd0e2ce
Pinning opencv to < 3.4 in conda builds (#7923)
pjh5 Jun 5, 2018
7b03ecd
Adding -setup- path, and better code structure (#8122)
pjh5 Jun 5, 2018
0d699ea
Abstract parallelization to faciliate using threadpools (#8163)
cpuhrsch Jun 5, 2018
deacc51
[Caffe2] Update elementwise ops to support numpy style boradcast (#8070)
xiaomengy Jun 5, 2018
4628040
Export getCudnnHandle (#7726)
bstriner Jun 6, 2018
f12a021
[JIT] Support a single TensorList argument anywhere in the argument l…
Jun 6, 2018
891cee0
use the correct datatype format (#8144)
seravee Jun 6, 2018
423d99e
Add back onnx console scripts dropped during migration from onnx-caff…
bddppq Jun 6, 2018
b0cfcc4
Get rid of SOVERSION (again). (#8132)
ezyang Jun 6, 2018
4a5592b
Fix a corner case for ReShapeOp (#8178)
sunnieshang Jun 6, 2018
b330754
Better conv error message basing on weight shape (#8051)
ssnl Jun 6, 2018
48f7a3a
Add retry logic to sccache download for Windows build (#7697)
yf225 Jun 6, 2018
6d4503e
fix caffe2 docker build (#7411)
qigtang Jun 6, 2018
d143fe7
[ONNX] Fix type_as symbolic (#8183)
Jun 6, 2018
37400b9
Yangqing as an ONNX codeowner (#8185)
Jun 6, 2018
92e5b0a
Fix protobuf options (#8184)
bstriner Jun 6, 2018
54f970d
Add a loop unrolling pass to PyTorch JIT (#7672)
apaszke Jun 6, 2018
1391b52
[auto] Update onnx to 4e65fd8 - fuse consecutive squeezes (onnx/onnx#…
onnxbot Jun 6, 2018
e155663
[Caffe2] Merging setup.py with setup_caffe2.py (#8129)
pjh5 Jun 6, 2018
c4bdc2f
Fix scalar check for sparse tensors. (#8197)
zou3519 Jun 6, 2018
b2deb86
fix lint
soumith Jun 6, 2018
848d716
Add more annotations for arguments in ATen schema (#8192)
apaszke Jun 6, 2018
8070d34
use THCThrustAllocator in BCECriterion (#8188)
Jun 6, 2018
df5ce89
Allow parallel_apply to take in list[Tensor] (#8047)
ssnl Jun 6, 2018
573a61b
Docs for gradcheck and gradgradcheck; expose gradgradcheck (#8166)
ssnl Jun 6, 2018
6dcdbdb
Implement randperm for CUDA (#7606)
yf225 Jun 6, 2018
6abd687
Update c10d build to link against Caffe2 (#8201)
pietern Jun 6, 2018
896f8f6
add wipe_cache option (#8204)
lly-zero-one Jun 6, 2018
3eedb4f
Replace (non-data) TensorUtils calls with non-generic THCTensor calls…
gchanan Jun 6, 2018
3adcc6e
Fix c10d compiler warnings (#8206)
pietern Jun 6, 2018
4091b61
Bump gloo submodule (#8202)
pietern Jun 6, 2018
6c607b2
rm -rf aten/contrib (#8165)
goldsborough Jun 6, 2018
c498ff5
Fix tanh_op on ios build (#8207)
xiaomengy Jun 6, 2018
700249c
[auto] Update onnx to f28e2f1 - fix lrn spec (onnx/onnx#1090)
onnxbot Jun 6, 2018
5d55b8e
[cmake] deprecate caffe2_* specific cuda function in cmake. (#8200)
Yangqing Jun 6, 2018
eb6f70d
skip CUDA memory leak check on Windows altogether (#8213)
ssnl Jun 6, 2018
330e3c1
Record shape and type in autograd to validate gradients (#8168)
colesbury Jun 6, 2018
8627e3f
[auto] Update onnx to 18d70ff - Graph should only have one (input) kP…
onnxbot Jun 6, 2018
1c7e27d
Set up a c10 source folder (#7822)
smessmer Jun 6, 2018
3328c6c
Change the benchmark log format and also log flops (#8215)
lly-zero-one Jun 7, 2018
818118b
Move helper functions to unnamed namespace. (#8224)
yyetim Jun 7, 2018
6b31d57
[auto] Update onnx to e96d823 - Update Google benchmark to 1.4.1 (onn…
onnxbot Jun 7, 2018
635268e
Change new bernoulli implementation to be fully generic. (#8218)
gchanan Jun 7, 2018
cb4617c
Structure THTensor like THCTensor is structured. (#8217)
gchanan Jun 7, 2018
d9607a9
move THCP-related utils to cuda/utils.cpp. (#8221)
gchanan Jun 7, 2018
ae8ed57
[READY TO MERGE] Use ccache in macOS build (#8009)
yf225 Jun 7, 2018
4e8ea34
[NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)
yf225 Jun 7, 2018
03c9635
[READY TO MERGE] Enable tests that use DataLoader with multiple worke…
yf225 Jun 7, 2018
b53f08a
Don't copy unneeded grads when using a function for several derivativ…
t-vi Jun 7, 2018
4335d4c
Fix win mkldnn (#7718)
bstriner Jun 7, 2018
ff741c7
[Caffe2] Add ADD operator for IDEEP (#8220)
Jun 7, 2018
1745c75
Allow optional build and installation of native test binaries (#8225)
Yangqing Jun 7, 2018
0a754a6
Update MKL exporter to IDEEP ops (#8228)
vishar0 Jun 7, 2018
7cf88b1
[ideep] Add IDEEP Squeeze op (#8227)
vishar0 Jun 7, 2018
47cfcf0
[auto] Update onnx to 62e63e9 - Fix build errors inside protobuf-benc…
onnxbot Jun 7, 2018
3930eca
Use .cc since some downstream libraries are configured for C++ only. …
xkszltl Jun 7, 2018
910afc4
Rename SparseTensor to SparseTensorRef. (#8237)
ezyang Jun 7, 2018
dc0a78e
[caffe2] Build Android tests and binaries in CI (#7593)
Maratyszcza Jun 7, 2018
113187d
Remove core and util warnings (#8239)
orionr Jun 7, 2018
b3f9912
Remove .gitmodules.aten since it is in .gitmodules now (#8232)
Yangqing Jun 7, 2018
b593f0f
Fix: gradcheck forced float32 (#8230)
bhushan23 Jun 7, 2018
0bf9a0d
Print requires_grad and grad_fn in string repr of tensor (#8211)
colesbury Jun 7, 2018
6068030
Fix TEST_CUDA import in test_cuda (#8246)
yf225 Jun 7, 2018
3ffa722
Fix lifting cat into its constant version (#8174)
zdevito Jun 7, 2018
7344661
Don't override Tensor, Storage macros defined outside torch/csrc in t…
gchanan Jun 7, 2018
b281b75
[auto] Update onnx to 3a035f4 - Add retry logic to model downloading …
onnxbot Jun 7, 2018
f1feecb
Fully genericize THC/THCUNN (except for TensorUtils and DeviceTensorU…
gchanan Jun 7, 2018
9d9b82a
[cmake] Use CAFFE2_USE_* for public/cuda.cmake (#8248)
pietern Jun 7, 2018
9147596
Fix app size check (#8256)
xiaomengy Jun 7, 2018
0368766
wip on CPU impl
weiyangfb Jun 7, 2018
39e09f4
Stop BCELoss from returning negative results (#8147)
li-roy Jun 8, 2018
bba193e
Relax CUDA_HOME detection logic, to build when libraries are found. (…
dashesy Jun 8, 2018
db56a3a
Added backward function for kl_div target (#7839)
weiyangfb Jun 8, 2018
1f335f8
Change the output format of caffe2 observers (#8261)
lly-zero-one Jun 8, 2018
2a418d7
Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor. (#8…
gchanan Jun 8, 2018
a6aa526
[caffe2] Move submodule onnx-tensorrt forward (#7659)
pooyadavoodi Jun 8, 2018
8e2b170
[ideep] Add IDEEP fallbacks for Faster-RCNN ops (#8260)
vishar0 Jun 8, 2018
97930ba
un-genericize THCDeviceTensorUtils. (#8258)
gchanan Jun 8, 2018
58f009f
[caffe2] Fix ATen dispatch for ops with TensorList arg (#8226)
Jun 8, 2018
db5232d
[cmake] Add and export Modules_CUDA_fix (#8271)
Yangqing Jun 8, 2018
72d2262
[auto] Update onnx to 2508156 - Make error message more verbose (onnx…
onnxbot Jun 8, 2018
897e9ba
[auto] Update onnx to 39e4668 - fix optimizer does not set ir_version…
onnxbot Jun 8, 2018
9073280
[cmake] Make cudnn optional (#8265)
Yangqing Jun 8, 2018
0f21d0b
Move signal window functions to ATen; add Blackman window (#8130)
ssnl Jun 8, 2018
8f6ee5a
[ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8…
vishar0 Jun 8, 2018
77c0460
[c10d] NCCL Process Group implementation (#8182)
teng-li Jun 8, 2018
d9482d6
Set up CI build for CUDA 9.2 + macOS (#8274)
yf225 Jun 8, 2018
c05ee5f
c10 build setup (#8264)
smessmer Jun 8, 2018
28cef82
Remove remaining TensorTypeUtils functions. (#8286)
gchanan Jun 8, 2018
b4d40f8
Create initial Python bindings for c10d (#8119)
pietern Jun 8, 2018
d4fb955
Add option USE_NVRTC which defaults to off (#8289)
Yangqing Jun 8, 2018
8c4f040
[build] Remove /torch/lib/THD/cmake in favor of /cmake (#7159)
Yangqing Jun 8, 2018
aa4a4b3
Have a single THTensor / THCTensor type. (#8288)
gchanan Jun 8, 2018
6e6e82f
[auto] Update onnx to 58efe0a - add float16 support back for math and…
onnxbot Jun 8, 2018
c4591e5
Some utils for compile-time programming (#7778)
smessmer Jun 9, 2018
7e7e8f6
Remove THC's FindMAGMA (#8299)
Yangqing Jun 9, 2018
5f59e8b
Entries for torch.distributed in CODEOWNERS (#8293)
pietern Jun 9, 2018
5f98923
Add depthwise convolution test for IDEEP (#8301)
Jun 9, 2018
3c09acc
Fix dividing by zero segfault in Reshape (#8302)
bddppq Jun 9, 2018
637f842
Removes unused THCTensorConv (#8229)
mruberry Jun 9, 2018
36a2076
Replace Variables to Tensors (#8309)
vishwakftw Jun 10, 2018
7c780b1
Clean up old sccache log before build (#8305)
yf225 Jun 10, 2018
942d016
Remove unused grad ops on mobile to reduce app size (#8297)
xiaomengy Jun 10, 2018
df7b419
Small fixes (#8296)
smessmer Jun 10, 2018
972e7c7
[auto] Update onnx to 5ed684e - Remove/replace /MX with /WX for MSVC …
onnxbot Jun 10, 2018
af1b560
Fix sample code for cuda stream (#8319)
Stonesjtu Jun 10, 2018
56b29e2
[auto] Update onnx to 4b4085c - Add missing warning ignoring flags to…
onnxbot Jun 10, 2018
5658078
[THD] fix broken THD build with NCCL (#8323)
teng-li Jun 11, 2018
4254f3b
Add docstring for `torch.sparse_coo_tensor` (#8152)
sethah Jun 11, 2018
027f6b9
add error when backend is not supported by DDP (#8325)
ailzhang Jun 11, 2018
5463e44
Fix collect_env.py for Windows (#8326)
peterjc123 Jun 11, 2018
61a68e3
Fix the script doesn't stop eariler on error for MSVC and Ninja (#8277)
peterjc123 Jun 11, 2018
58b1c73
Skip test_multinomial_invalid_probs_cuda on Windows (#8324)
yf225 Jun 11, 2018
8f92ab9
Support printing sparse tensors in ATen, fixes #8333. (#8334)
ezyang Jun 11, 2018
db4fa8f
[C++ API] Cursors (#8190)
goldsborough Jun 11, 2018
eee6226
Implement dim_arange operator (#8266)
Jun 11, 2018
63094e5
1. fixed flip CPU impl for non-continuous flip dims; 2. added more te…
weiyangfb Jun 11, 2018
44b18e4
nits
weiyangfb Jun 11, 2018
a35d2ad
Merge branch 'flip_tensor' of github.com:weiyangfb/pytorch into flip_…
weiyangfb Jun 11, 2018
a9ae3f1
1. removed for loop in pointwise CUDA kernel; 2. using templated (int…
weiyangfb Jun 15, 2018
8780087
added torch.flip.__doc__
weiyangfb Jun 15, 2018
0709c30
nits
weiyangfb Jun 15, 2018
55 changes: 55 additions & 0 deletions aten/src/ATen/native/TensorTransformations.cpp
@@ -0,0 +1,55 @@
#include "TensorTransformations.h"

#include "ATen/NativeFunctions.h"

namespace at {
namespace native {

Tensor flip_cpu(const Tensor& self, IntList dims) {
const int64_t total_dims = self.dim(), flip_dims_size = dims.size();
check_errors(total_dims, flip_dims_size, dims);

auto flip_dims_v = std::vector<int64_t>(dims);
std::sort(flip_dims_v.begin(), flip_dims_v.end());
auto final_indices = std::vector<at::Tensor>(total_dims);

auto indices = std::vector<at::Tensor>(flip_dims_size);
for (int64_t i = 0; i < flip_dims_size; i++) {
indices[i] = at::arange(self.type().toScalarType(at::ScalarType::Long), self.size(i) - 1, -1, -1);
// creates a meshgrid
auto temp = std::vector<int64_t>(flip_dims_size, 1);
temp[i] = indices[i].size(0);
indices[i] = indices[i].view(IntList(temp));
final_indices[flip_dims_v[i]] = indices[i];
}

// check whether the distance between any two adjacent flip dims is >= 2; if so, the
// output tensor has to be permuted back, because advanced indexing moves all
// non-consecutive indexed dimensions to the front of the result
bool to_permute = false;
int64_t first = flip_dims_v[0], second = flip_dims_v[0];
for (int64_t i = 1; i < flip_dims_size; i++) {
second = flip_dims_v[i];
if (second - first >= 2) {
to_permute = true;
break;
}
first = second;
}

if (to_permute) {
// permute output tensor
auto permute_order = std::vector<int64_t>(flip_dims_v);
for (int64_t i = 0; i < total_dims; i++) {
if (std::find(flip_dims_v.begin(), flip_dims_v.end(), i) == flip_dims_v.end()) {
permute_order.emplace_back(i);
}
}
auto out_tensor = self.index(TensorList(final_indices));
return out_tensor.permute(IntList(permute_order));
}

auto out_tensor = self.index(TensorList(final_indices));
return out_tensor;
}

}} // namespace at::native
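
A rough Python rendering of the indexing strategy used by flip_cpu above, for readers following the advanced-indexing trick: reversed index tensors are reshaped so they broadcast like a meshgrid, and the result is permuted back when the flipped dims are non-consecutive. This is only an illustrative sketch; the helper name is not part of the PR, and the expected values come from the tests further down.

import torch

def flip_via_indexing(t, dims):
    # hypothetical helper mirroring flip_cpu's strategy; not part of the PR
    dims = sorted(dims)
    index = [slice(None)] * t.dim()
    for i, d in enumerate(dims):
        rev = torch.arange(t.size(d) - 1, -1, -1, dtype=torch.long)  # reversed indices for dim d
        shape = [1] * len(dims)
        shape[i] = rev.numel()
        index[d] = rev.view(shape)  # meshgrid-style shape so the index tensors broadcast
    out = t[tuple(index)]
    # advanced indexing moves non-consecutive indexed dims to the front of the
    # result, so permute them back -- the same case flip_cpu handles explicitly
    if any(b - a >= 2 for a, b in zip(dims, dims[1:])):
        order = dims + [d for d in range(t.dim()) if d not in dims]
        out = out.permute(*order)
    return out

data = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(2, 2, 2)
assert torch.equal(flip_via_indexing(data, [0, 2]),
                   torch.tensor([6, 5, 8, 7, 2, 1, 4, 3]).view(2, 2, 2))
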
34 changes: 34 additions & 0 deletions aten/src/ATen/native/TensorTransformations.h
@@ -0,0 +1,34 @@
#include "ATen/ATen.h"

namespace at {
namespace native {

static inline void check_errors(int64_t total_dims, int64_t flip_dims_size, IntList dims) {
// check that the number of flip dims is valid
AT_CHECK(flip_dims_size > 0,
"expected input tensor dims > 0, but got tensor dims size=", flip_dims_size);

// check duplicates in dims
auto flip_dims_v = std::vector<int64_t>(dims);
flip_dims_v.erase(std::unique(flip_dims_v.begin(), flip_dims_v.end()), flip_dims_v.end());
AT_CHECK((int64_t)flip_dims_v.size() == flip_dims_size,
"dims has duplicates, original flip dims size=", flip_dims_size,
", but unique flip dims size=", flip_dims_v.size());

// check that the number of flip dims does not exceed the tensor's total dims
AT_CHECK(flip_dims_size <= total_dims,
"expected flip dims size <= tensor total dims, but got flip dims size=",
flip_dims_size, " and tensor total dim=", total_dims);

// check that every flip dim is within range
auto min_max_d = std::minmax_element(flip_dims_v.begin(), flip_dims_v.end());

AT_CHECK(*min_max_d.first >= 0,
"expected flip dims axis >= 0, but got min flip dims=", *min_max_d.first);

AT_CHECK(*min_max_d.second < total_dims,
"expected flip dims axis < tensor total dims, but got max flip dims=",
*min_max_d.second, " and tensor total dim=", total_dims);
}

}} // namespace at::native
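
In terms of user-visible behavior, these checks correspond to the argument errors exercised in test_torch.py below. A quick illustrative sketch of what they reject (assumes a build that includes this PR):

import torch

data = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(2, 2, 2)
for bad_dims in [(0, 1, 1),      # duplicate flip dims
                 (0, 1, 2, 3),   # more flip dims than tensor dims
                 (-1,),          # negative dim
                 (3,)]:          # dim out of range
    try:
        data.flip(*bad_dims)
    except RuntimeError as e:
        print("rejected", bad_dims, ":", e)
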
120 changes: 120 additions & 0 deletions aten/src/ATen/native/cuda/TensorTransformations.cu
@@ -0,0 +1,120 @@
#include "ATen/native/TensorTransformations.h"

#include "ATen/cuda/detail/IndexUtils.cuh"
#include "ATen/NativeFunctions.h"
#include "ATen/cuda/CUDATensorMethods.cuh"
#include "ATen/cuda/CUDATypeConversion.cuh"

namespace at {
namespace native {

#define AT_APPLY_THREADS_PER_BLOCK 32 * 16
#define AT_APPLY_BLOCKS_PER_SM 4

template <typename scalar_t, typename IndexType>
#if __CUDA_ARCH__ >= 350
__launch_bounds__(AT_APPLY_THREADS_PER_BLOCK, AT_APPLY_BLOCKS_PER_SM)
#endif
__global__ void
kernel_pointwise_flip_apply2(const cuda::detail::TensorInfo<scalar_t, IndexType> in_tensor_info,
cuda::detail::TensorInfo<scalar_t, IndexType> out_tensor_info,
IndexType N,
int flip_dim,
IndexType total_dims) {
for (IndexType linear_index = blockIdx.x * blockDim.x + threadIdx.x; linear_index < N; linear_index += gridDim.x * blockDim.x) {
IndexType dst_offset = 0;
if (flip_dim == 0) {
// flip 1st dim
dst_offset = (in_tensor_info.sizes[0] - 1 - linear_index / in_tensor_info.strides[0]) * in_tensor_info.strides[0] + linear_index % in_tensor_info.strides[0];
}
else {
// flip last dim
IndexType i = total_dims - 1;
dst_offset = linear_index / in_tensor_info.strides[0] * in_tensor_info.strides[0] + (in_tensor_info.sizes[i] - 1 - linear_index % in_tensor_info.strides[0]);
}
out_tensor_info.data[dst_offset] = in_tensor_info.data[linear_index];
}
}

template <typename scalar_t>
__global__
void flip_cuda_kernel(scalar_t* in_tensor, scalar_t* out_tensor, int64_t N, int64_t* flip_dims, int64_t flip_dims_size, int64_t* strides, int64_t* strides_contiguous, int64_t* shape, int64_t total_dims) {

int64_t linear_index = blockIdx.x * blockDim.x + threadIdx.x;
if (linear_index >= N) {
return;
}

int64_t cur_indices = linear_index, rem = 0, dst_offset = 0;
for (int64_t i = 0; i < total_dims; i++) {
int64_t temp = cur_indices;
cur_indices = cur_indices / strides_contiguous[i];
rem = temp - cur_indices * strides_contiguous[i];
// reverse the index along dim i if it is one of the flip dims
for (int64_t j = 0; j < flip_dims_size; j++) {
if (i == flip_dims[j]) {
cur_indices = shape[i] - 1 - cur_indices;
}
}
dst_offset += cur_indices * strides[i];
cur_indices = rem;
}
out_tensor[linear_index] = in_tensor[dst_offset];
}

// Flip tensor given a list of dims
Tensor flip_cuda(const Tensor& self, IntList dims) {
auto in_tensor = self;
const int64_t flip_dims_size = dims.size(), total_dims = in_tensor.dim(), N = in_tensor.numel();
check_errors(total_dims, flip_dims_size, dims);

int64_t block_size = 512;
dim3 dim_block(block_size);
dim3 dim_grid((N + block_size - 1) / block_size);

// use kernel_pointwise_flip_apply2 only when the single flip dim is the first or last dim, where collapseDims can reduce the amount of work
if (flip_dims_size == 1 && in_tensor.is_contiguous() && (dims[0] == 0 || dims[0] == total_dims - 1)) {
auto out_tensor = at::empty_like(self);
AT_DISPATCH_ALL_TYPES_AND_HALF(in_tensor.type(), "flip_cuda", [&] {
using cuda_scalar_t = cuda::into_type<scalar_t>;
auto in_tensor_info = cuda::detail::getTensorInfo<cuda_scalar_t, int64_t>(in_tensor);
auto out_tensor_info = cuda::detail::getTensorInfo<cuda_scalar_t, int64_t>(out_tensor);
int flip_dim = in_tensor_info.collapseDims(dims[0]);
out_tensor_info.collapseDims(dims[0]);
kernel_pointwise_flip_apply2<cuda_scalar_t, int64_t>
<<<dim_grid, dim_block, 0, globalContext().getCurrentCUDAStream()>>>(
in_tensor_info, out_tensor_info, N, flip_dim, total_dims);
});
return out_tensor;
}

auto flip_dims = std::vector<int64_t>(dims);
auto flip_dims_t = at::CPU(kLong).tensorFromBlob(flip_dims.data(), {static_cast<int64_t>(flip_dims.size())});

auto shape = std::vector<int64_t>(in_tensor.sizes());
auto shape_t = at::CPU(kLong).tensorFromBlob(shape.data(), {static_cast<int64_t>(shape.size())});

auto strides = std::vector<int64_t>(in_tensor.strides());
auto strides_t = at::CPU(kLong).tensorFromBlob(strides.data(), {static_cast<int64_t>(strides.size())});

auto out_tensor = at::empty_like(in_tensor);

// stride_contiguous holds the strides the tensor would have after calling contiguous(); it is used to recover the multi-dimensional index of each element of the (possibly non-contiguous) input
Tensor stride_contiguous = at::zeros(CPU(kLong), {total_dims});
int64_t* stride_contiguous_d = stride_contiguous.data<int64_t>();
int64_t tmp = N;
for (int64_t i = 0; i < total_dims; i++) {
tmp = tmp / shape[i];
stride_contiguous_d[i] = tmp;
}

AT_DISPATCH_ALL_TYPES_AND_HALF(in_tensor.type(), "flip_cuda", [&] {
using cuda_scalar_t = cuda::into_type<scalar_t>;
flip_cuda_kernel<<<dim_grid, dim_block, 0, globalContext().getCurrentCUDAStream()>>>(
in_tensor.data<cuda_scalar_t>(), out_tensor.data<cuda_scalar_t>(), N, flip_dims_t.toType(CUDA(kLong)).data<int64_t>(), flip_dims_size, strides_t.toType(CUDA(kLong)).data<int64_t>(), stride_contiguous.toType(CUDA(kLong)).data<int64_t>(), shape_t.toType(CUDA(kLong)).data<int64_t>(), total_dims);
});

return out_tensor;
}

}} // namespace at::native
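
The generic kernel's index arithmetic, restated in Python for readability (function and parameter names here are mine): each output linear index is decomposed into a multi-dimensional index using the contiguous strides, the indices of flipped dims are reversed, and the result is recombined with the input's actual strides to locate the source element.

def flip_source_offset(linear_index, shape, strides, strides_contiguous, flip_dims):
    # illustrative mirror of flip_cuda_kernel's per-element computation
    cur, dst_offset = linear_index, 0
    for i in range(len(shape)):
        idx = cur // strides_contiguous[i]   # index along dim i
        cur -= idx * strides_contiguous[i]
        if i in flip_dims:
            idx = shape[i] - 1 - idx         # reverse this dim
        dst_offset += idx * strides[i]
    return dst_offset

# flipping dim 1 of a contiguous 2x3 tensor: output element 0 (index (0, 0))
# is read from input offset 2 (index (0, 2))
assert flip_source_offset(0, [2, 3], [3, 1], [3, 1], {1}) == 2
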
5 changes: 5 additions & 0 deletions aten/src/ATen/native/native_functions.yaml
@@ -1088,6 +1088,11 @@
- func: transpose_(Tensor self, int64_t dim0, int64_t dim1) -> Tensor
variants: method

- func: flip(Tensor self, IntList dims) -> Tensor
dispatch:
CPU: flip_cpu
CUDA: flip_cuda

- func: _trilinear(Tensor i1, Tensor i2, Tensor i3, IntList expand1, IntList expand2, IntList expand3, IntList sumdim, int64_t unroll_dim=1) -> Tensor
variants: function

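With this dispatch entry in place, the same Python call routes to either implementation depending on the tensor's backend (sketch; assumes a build that includes this PR):

import torch

x = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(2, 2, 2)
x.flip(0)            # dispatches to flip_cpu
# x.cuda().flip(0)   # would dispatch to flip_cuda on a CUDA build
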
4 changes: 4 additions & 0 deletions test/test_autograd.py
@@ -2509,6 +2509,10 @@ class dont_convert(tuple):
('reshape', (S,), (S,), '1d'),
('reshape', (), (dont_convert(()),), 'scalar_to_scalar'),
('reshape', (), (1,), 'scalar_to_1d'),
('flip', (S, S, S), ([0],), 'd0'),
('flip', (S, S, S), ([0, 1, 2],), 'd012'),
('flip', (S, S, S), ([0, 2],), 'd02'),
('flip', (S, S, S), ([2, 0],), 'd20'),
('view_as', (S, S, S), (non_differentiable(torch.rand(S * S, S)),)),
('view_as', (), (non_differentiable(torch.tensor(5.5)),), 'scalar'),
('view_as', (), (non_differentiable(torch.rand(1, 1)),), 'scalar_to_dims'),
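Each of these tuples makes the autograd test harness run numerical gradient checks on flip with the given dims. Roughly, an entry like ('flip', (S, S, S), ([0],), 'd0') amounts to something like the following sketch (the concrete value of S and the harness plumbing are assumptions, not shown in this diff):

import torch
from torch.autograd import gradcheck, gradgradcheck

S = 5  # placeholder size
x = torch.randn(S, S, S, dtype=torch.double, requires_grad=True)
assert gradcheck(lambda t: t.flip(0), (x,))
assert gradgradcheck(lambda t: t.flip(0), (x,))
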
7 changes: 7 additions & 0 deletions test/test_cuda.py
@@ -409,6 +409,10 @@ def tmp(t):
('zero', small_3d, lambda t: [],),
('zeros', small_3d, lambda t: [1, 2, 3, 4],),
('eye', small_2d, lambda t: [3, 4],),
('flip', small_3d, lambda t: [0], 'd0', types, True),
('flip', small_3d, lambda t: [0, 1, 2], 'd012', types, True),
('flip', small_3d, lambda t: [0, 2], 'd02', types, True),
('flip', small_3d, lambda t: [2, 0], 'd20', types, True),
('rsqrt', lambda t: constant_tensor_add(1, small_3d(t)), lambda t: [], None, float_types),
('sinh', lambda t: tensor_clamp(small_3d(t), -1, 1), lambda t: [], None, float_types),
('tan', lambda t: tensor_clamp(small_3d(t), -1, 1), lambda t: [], None, float_types),
@@ -1372,6 +1376,9 @@ def test_gesv_batched_dims(self):
def test_view(self):
TestTorch._test_view(self, lambda t: t.cuda())

def test_flip(self):
TestTorch._test_flip(self, use_cuda=True)

def test_signal_window_functions(self):
TestTorch._test_signal_window_functions(self, device=torch.device('cuda'))

45 changes: 45 additions & 0 deletions test/test_torch.py
@@ -5953,6 +5953,51 @@ def test_permute(self):
self.assertEqual(perm, new)
self.assertEqual(x.size(), orig)

@staticmethod
def _test_flip(self, use_cuda=False):
if use_cuda:
cuda = torch.device("cuda")
data = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], device=cuda).view(2, 2, 2)
# large data testing
large_data = torch.arange(0, 100000000, device=cuda).view(10000, 10000)
large_data.flip([0, 1])
else:
data = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(2, 2, 2)

self.assertEqual(torch.tensor([5, 6, 7, 8, 1, 2, 3, 4]).view(2, 2, 2), data.flip(0))
self.assertEqual(torch.tensor([3, 4, 1, 2, 7, 8, 5, 6]).view(2, 2, 2), data.flip(1))
self.assertEqual(torch.tensor([2, 1, 4, 3, 6, 5, 8, 7]).view(2, 2, 2), data.flip(2))
self.assertEqual(torch.tensor([7, 8, 5, 6, 3, 4, 1, 2]).view(2, 2, 2), data.flip(0, 1))
self.assertEqual(torch.tensor([8, 7, 6, 5, 4, 3, 2, 1]).view(2, 2, 2), data.flip(0, 1, 2))

# cases that exercise the permute path (non-consecutive flip dims)
self.assertEqual(torch.tensor([6, 5, 8, 7, 2, 1, 4, 3]).view(2, 2, 2), data.flip(0, 2))
self.assertEqual(torch.tensor([6, 5, 8, 7, 2, 1, 4, 3]).view(2, 2, 2), data.flip(2, 0))

# flipping the same dim more than once is not allowed
self.assertRaises(RuntimeError, lambda: data.flip(0, 1, 1))
# an empty dims list is not allowed
self.assertRaises(TypeError, lambda: data.flip())
# more flip dims than tensor dims is not allowed
self.assertRaises(RuntimeError, lambda: data.flip(0, 1, 2, 3))
# negative dims are not allowed
self.assertRaises(RuntimeError, lambda: data.flip(-1))
# dims beyond the last dim are not allowed
self.assertRaises(RuntimeError, lambda: data.flip(3))

# test for non-contiguous case
if use_cuda:
expanded_data = torch.arange(1, 4, device=cuda).view(3, 1).expand(3, 2)
tranposed_data = torch.arange(1, 9, device=cuda).view(2, 2, 2).transpose(0, 1)
else:
expanded_data = torch.arange(1, 4).view(3, 1).expand(3, 2)
tranposed_data = torch.arange(1, 9).view(2, 2, 2).transpose(0, 1)
self.assertEqual(torch.tensor([3, 3, 2, 2, 1, 1]).view(3, 2), expanded_data.flip(0))
self.assertEqual(torch.tensor([8, 7, 4, 3, 6, 5, 2, 1]).view(2, 2, 2), tranposed_data.flip(0, 1, 2))

def test_flip(self):
self._test_flip(self, use_cuda=False)

def test_storage(self):
v = torch.randn(3, 5)
self.assertEqual(v.storage()[0], v.data[0][0])
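The expanded/transposed inputs at the end are the interesting cases: they are non-contiguous (and flip several dims at once), so on CUDA they go through the generic flip_cuda_kernel rather than the fast single-dim contiguous path. A small standalone restatement using the same values as the test above:

import torch

x = torch.arange(1, 9, dtype=torch.long).view(2, 2, 2).transpose(0, 1)
assert not x.is_contiguous()
assert torch.equal(x.flip(0, 1, 2),
                   torch.tensor([8, 7, 4, 3, 6, 5, 2, 1]).view(2, 2, 2))
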
3 changes: 3 additions & 0 deletions tools/autograd/derivatives.yaml
@@ -629,6 +629,9 @@
- name: t(Tensor self)
self: grad.t()

- name: flip(Tensor self, IntList dims)
self: grad.flip(dims)

- name: take(Tensor self, Tensor index)
self: zeros_like(self).put_(index, grad, true)

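The derivative rule encoded here is that flip is its own (linear) transpose: the gradient of a flip is the incoming gradient flipped along the same dims. A minimal check of that rule, assuming a build that includes this PR:

import torch

x = torch.randn(2, 3, 4, requires_grad=True)
y = x.flip(0, 2)
g = torch.randn_like(y)
y.backward(g)
assert torch.allclose(x.grad, g.flip(0, 2))
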
27 changes: 27 additions & 0 deletions torch/_torch_docs.py
@@ -4392,6 +4392,33 @@ def parse_kwargs(desc):
[-0.5872, 0.6932]])
""")

add_docstr(torch.flip,
r"""
flip(input, dims) -> Tensor

Reverse the order of an n-D tensor along the given axes in dims.

Args:
input (Tensor): the input tensor
dims (a list or tuple): axes to flip on

Example::

>>> x = torch.arange(8).view(2, 2, 2)
>>> x
tensor([[[ 0, 1],
[ 2, 3]],

[[ 4, 5],
[ 6, 7]]])
>>> torch.flip(x, [0, 1])
tensor([[[ 6, 7],
[ 4, 5]],

[[ 2, 3],
[ 0, 1]]])
""")

add_docstr(torch.take,
r"""
take(input, indices) -> Tensor