[WIP] Rewrote adam optimizer with foreach APIs #43507
Conversation
💊 CI failures summary and remediations: as of commit 1402b87 (more details on the Dr. CI page), 🕵️ 9 new failures were recognized by patterns. These CI failures do not appear to be due to upstream breakages.
ngimel left a comment:
This generally looks good, but it reveals that we need a few more (TensorList, ScalarList) operations, because there are still a few loops over parameters/grads.
for p in group['params']:
    if p.grad is not None:
        params_with_grad.append(p)
        grads.append(p.grad)
You could check for sparse gradients here, not in a separate loop.
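A sketch of the suggested merge, reusing the names from the diff above (the sparse-gradient error message is assumed to match the eager Adam's wording):

```python
for p in group['params']:
    if p.grad is not None:
        if p.grad.is_sparse:
            # Same guard the eager Adam performs, just moved into this loop.
            raise RuntimeError('Adam does not support sparse gradients, '
                               'please consider SparseAdam instead')
        params_with_grad.append(p)
        grads.append(p.grad)
```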
Done
torch/optim/multi_tensor/adam.py
Outdated
grads.append(p.grad)

for p in params_with_grad:
    ...
for g in grads:
    ...
Remove loop over grads, check them earlier.
Done
torch/optim/multi_tensor/adam.py
Outdated
bias_correction1 = [1 - beta1 ** state['step'] for state in states]
bias_correction2 = [1 - beta2 ** state['step'] for state in states]
if group['weight_decay'] != 0:
    torch._foreach_add_(grads, group['params'], group['weight_decay'])
In the original optimizer the line is
grad = grad.add(p, alpha=group['weight_decay'])
This is not in-place, and the original grad attributes (p.grad) aren't mutated. Here you are mutating p.grad in place (I'm surprised the tests don't catch it).
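One way to keep the semantics of the original line (a sketch, assuming the out-of-place `torch._foreach_add` overload that accepts `alpha`):

```python
if group['weight_decay'] != 0:
    # Rebinds `grads` to freshly allocated tensors, leaving each p.grad
    # untouched, mirroring grad = grad.add(p, alpha=group['weight_decay']).
    grads = torch._foreach_add(grads, params_with_grad, alpha=group['weight_decay'])
```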
Done
if amsgrad:
    # Maintains the maximum of all 2nd moment running avg. till now
    max_exp_avg_sq = [torch.max(a, b) for a, b in zip(max_exp_avg_sq, exp_avg_sq)]
Aha, so ideally we also need a foreach max, because now it will be a loop?
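For reference, with such a foreach variant the per-tensor loop would collapse into a single call; the name used here is hypothetical:

```python
# Hypothetical in-place element-wise maximum over two tensor lists.
torch._foreach_maximum_(max_exp_avg_sq, exp_avg_sq)
```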
torch/optim/multi_tensor/adam.py
Outdated
# Use the max. for normalizing running avg. of gradient
max_exp_avg_sq_sqrt = torch._foreach_sqrt(max_exp_avg_sq)
bias_correction_sqrt = [math.sqrt(bc) for bc in bias_correction2]
max_exp_avg_sq_sqrt = [torch.div(a, b) for a, b in zip(max_exp_avg_sq_sqrt, bias_correction_sqrt)]
Ok, so this is a loop because we don't have Op(TensorList, ScalarList)? This is unfortunate, looks like we really need it.
Per the conversation in Slack: we can check whether all steps are the same, which guarantees that all bias_corrections are the same, and then use foreach here and in other places when that's the case (which should be common).
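A sketch of that fast path, using the names from the diff above (the scalar `_foreach_div_` overload is assumed to be available):

```python
steps = [state['step'] for state in states]
if all(step == steps[0] for step in steps):
    # All params are at the same step, so the bias correction is one scalar
    # and a single foreach call replaces the per-tensor Python loop.
    bias_correction2_sqrt = math.sqrt(1 - beta2 ** steps[0])
    torch._foreach_div_(max_exp_avg_sq_sqrt, bias_correction2_sqrt)
else:
    # Fall back to the per-tensor division when steps differ.
    max_exp_avg_sq_sqrt = [a / math.sqrt(bc)
                           for a, bc in zip(max_exp_avg_sq_sqrt, bias_correction2)]
```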
Are you suggesting a change in the underlying algorithm? What does it mean, and what would happen if all steps are not the same?
step_size = [group['lr'] / bc for bc in bias_correction1]

for i in range(len(step_size)):
    params_with_grad[i].addcdiv_(exp_avg[i], denom[i], value=-step_size[i])
And this is another case where we need op(TensorList, ScalarList), because params_with_grad is a TensorList, exp_avg is a TensorList, denom is a TensorList, and value is a ScalarList?
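With such an op(TensorList, TensorList, TensorList, ScalarList), the loop above could become a single call; the ScalarList overload of `_foreach_addcdiv_` shown here is hypothetical:

```python
# Hypothetical: addcdiv_ over tensor lists with a per-tensor scalar `value`.
torch._foreach_addcdiv_(params_with_grad, exp_avg, denom,
                        [-s for s in step_size])
```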
max_exp_avg_sq = [torch.max(a, b) for a, b in zip(max_exp_avg_sq, exp_avg_sq)]
# Use the max. for normalizing running avg. of gradient
max_exp_avg_sq_sqrt = torch._foreach_sqrt(max_exp_avg_sq)
bias_correction_sqrt = [math.sqrt(bc) for bc in bias_correction2]
Why is this not a _foreach_sqrt?
In this context, bias_correction2 is a list of scalars; there is no foreach API that supports lists of scalars. I'm working on those APIs right now.
This particular case is OK: those are Python scalars and this operation is reasonably fast. However, other cases where CUDA tensors interact with a list of scalars are more problematic. We could get around them for now by checking whether all bias_corrections have the same value (which should be a common case).
I gave myself two days to fight codegen and make ScalarList a reality. If it works out, I will add new APIs for _foreach_op(TensorList, ScalarList). If it turns out to be too complex, I will make a workaround and add it to the TODO list.
In case there is a way of avoiding duplicating code:
This means some operations could be slower with MultiTensor for now, right? Has this been measured?
Now that we have the implementation, have you been able to quantify the performance scaling?
with self.assertRaisesRegex(ValueError, "Invalid weight_decay value: -1"):
    optim.Adam(None, lr=1e-2, weight_decay=-1)
for optimizer in [optim.Adam, optim_mt.Adam]:
    ...
Note: the current tests don't guarantee that the algorithm is the same as the original, or even that there's convergence in either case.
@zou3519 @anjali411 -- how are the C++ APIs tested?
C++ Optimizer API logic is tested here: https://github.com/pytorch/pytorch/blob/master/test/cpp/api/optim.cpp#L311. These tests compare the C++ optimizers' results to the Python API optimizers' results prewritten in this file: https://github.com/pytorch/pytorch/blob/master/test/cpp/api/optim_baseline.h
Ok, so here we'll need the tests comparing optim.Adam results with optim_mt.Adam, in addition to _test_basic_cases?
Yes, since the contract here is that the MultiTensor implementation is the same as the original.
Is there also a C++ equivalent with the foreach API for the C++ optimizer?
@vincentqb, there will be C++ optimizers as well, but a bit later. We decided to start with the Python ones first.
Re testing: is there anything specific you would suggest testing?
Ideally, we'd have a test as mentioned by @anjali411 above that checks that the two implementations give the exact same answer in some case.
Note that if this implementation were directly replacing the current one, the C++ test would also tell us that the implementations are still aligned :) Maybe there's a way of leveraging those tests already there?
There are tests in apex comparing optimizer implementations, they can be adopted here if c++ tests are hard to use for some reason https://github.com/NVIDIA/apex/blob/master/tests/L0/run_optimizers/test_fused_optimizer.py
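A minimal sketch of such a parity test (the `optim_mt` alias and the `torch.optim.multi_tensor` namespace follow this PR's description; the model, step count, and tolerance are placeholders):

```python
import copy

import torch
import torch.optim as optim
import torch.optim.multi_tensor as optim_mt  # namespace introduced in this PR


def test_adam_matches_multi_tensor_adam():
    torch.manual_seed(0)
    ref_model = torch.nn.Linear(8, 8)
    mt_model = copy.deepcopy(ref_model)
    ref_opt = optim.Adam(ref_model.parameters(), lr=1e-3)
    mt_opt = optim_mt.Adam(mt_model.parameters(), lr=1e-3)

    for _ in range(5):
        inp = torch.randn(4, 8)
        for model, opt in ((ref_model, ref_opt), (mt_model, mt_opt)):
            opt.zero_grad()
            model(inp).sum().backward()
            opt.step()

    # Both implementations should produce (near-)identical parameters.
    for p_ref, p_mt in zip(ref_model.parameters(), mt_model.parameters()):
        assert torch.allclose(p_ref, p_mt, atol=1e-7)
```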
This is wonderful, thank you! I don't see any user docs discussing this improvement. What is the implication for the user? Should we switch to using the new optimizers? Thank you.
Hi @stas00. Answering your question: you can try using the optimizers from the new torch.optim.multi_tensor namespace.
Thank you very much for the clarification and the stack, @izdeby! So basically we have the option to deploy these early for those who need the speed-up sooner, but otherwise there is nothing to be done. Excellent!
If I want to implement global operators like grad_clip myself to reduce kernel launches, may I use multi_tensor to do it, or does PyTorch provide similar interfaces?
PyTorch's _foreach_* / multi_tensor APIs can be used for that.
Thank you so much. I think multi_tensor is what I'm looking for.
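For illustration, a minimal sketch of global-norm gradient clipping written with the `_foreach_*` APIs (the `_foreach_norm` and `_foreach_mul_` calls are assumptions and are not part of this PR):

```python
import torch


def clip_grad_norm_foreach(parameters, max_norm):
    # Gather the gradients once, then use multi-tensor ops instead of a
    # per-tensor Python loop.
    grads = [p.grad for p in parameters if p.grad is not None]
    if not grads:
        return torch.tensor(0.0)
    total_norm = torch.norm(torch.stack(torch._foreach_norm(grads)))
    clip_coef = float(max_norm) / (float(total_norm) + 1e-6)
    if clip_coef < 1.0:
        torch._foreach_mul_(grads, clip_coef)  # scales every grad in one call
    return total_norm
```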
Stack from ghstack:
Differential Revision: D23331893
Motivation
GitHub issue #38655
Current PyTorch optimizer implementations are not efficient when we work with a lot of small feature tensors: launching many kernels slows down the whole process, so we need to reduce the number of kernels we launch.
As an example, we should be looking at NVIDIA's Apex.
To track progress, we will pick PyTorch's DCGAN model with the Adam optimizer and, once the optimizer is reimplemented with tensor lists, benchmark the model's performance against the original model version, Apex's version with the original Adam optimizer, and its FusedAdam optimizer.
Current API restrictions
- The list can't be empty (this will be fixed in upcoming PRs).
- All tensors in the list must have the same dtype, device and size.
Broadcasting
At this point we don't support broadcasting.
What the 'fast' and 'slow' routes are
In particular cases, we can't process an op with a fast list CUDA kernel. We can still fall back to a regular for-loop where the op is applied to each tensor individually through the dispatch mechanism. A few checks decide whether the op is performed via the 'fast' or the 'slow' path.
To go the fast route:
- All tensors must have strided layout.
- All tensors must be dense and must not have overlapping memory.
- The resulting tensor must have the same dtype.
- All tensors must be on the same device.
In this PR
- We are introducing a new namespace under torch.optim, torch.optim.multi_tensor, where we will have optimizers rewritten with the _foreach_* APIs.
- We are rewriting the Adam optimizer with the _foreach_* APIs.
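A minimal usage sketch of the new optimizer (assuming the `torch.optim.multi_tensor` namespace described above):

```python
import torch
import torch.optim.multi_tensor as optim_mt  # new namespace from this PR

model = torch.nn.Linear(10, 10)
optimizer = optim_mt.Adam(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()       # parameter update done with _foreach_* calls internally
optimizer.zero_grad()
```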