
Conversation

@scopatz
Contributor

@scopatz scopatz commented Aug 5, 2020

Fixes #40829

This is cross-platform, but I have only tried it on Linux personally. Also, I am not fully certain of the usage pattern, so if there are any additional features, adjustments, or tests that you want me to add, please just let me know!

CC @ezyang @rgommers

@scopatz
Contributor Author

scopatz commented Aug 5, 2020

Also, this is meant to be run from the root level of the PyTorch repo. I haven't added any robustness checks there, but I am happy to if desired.

@@ -0,0 +1,252 @@
#!/usr/bin/env python3
"""Checks out the nightly development version of PyTorch and installs pre-built
binaries into the repo.
Collaborator

Can you add usage instructions here @scopatz, and also in CONTRIBUTING.md?

Collaborator

With a big warning probably: don't do this in your default/root conda env:)

Collaborator

Or is that fine? I didn't see it create a new env, so hurriedly hit Ctrl-C

Contributor

Yeah, I did this too! Need to be more clear about what the tool is actually going to do, especially if it's going to do things like make new conda environments. A common situation is there's already an env a user wants to use, but not necessarily!

Contributor Author

It is totally fine to use this in the root conda-env. I have added some prelim docs. Let me know if more are needed

Contributor Author

ie - this doesn't touch the base env in anyway


def install():
    """Development install of PyTorch"""
    deps, pytorch, platform = conda_solve()
Collaborator

This nicely gets the runtime dependencies, but since we're in a new env I'm missing some essential test/dev tools. Can we add mypy, pytest, ipython, sphinx? That covers the basics I think.

Contributor

I think there should be an option to reuse an env

Contributor Author

I'll add both of these. In the reuse case, do you want the latest deps installed or just a straight reuse?

Collaborator

I'd guess straight reuse? If you already have an env set up, it's likely recent so probably no need to download large binaries in case of a small version bump somewhere.

Contributor Author

Yeah - I guess my concern with straight reuse is that the dependencies may no longer be compatible with what is in the existing environment.

@dr-ci

dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit 63cc025 (more details on the Dr. CI page):


  • 4/4 failures possibly* introduced in this PR
    • 1/4 non-CircleCI failure(s)

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/3)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_conda.sh 
Auto-merging .circleci/docker/common/install_conda.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_base.sh 
Auto-merging .circleci/docker/common/install_base.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh 
Auto-merging .circleci/docker/build.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py 
Auto-merging .circleci/cimodel/data/windows_build_definitions.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/3)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_conda.sh 
Auto-merging .circleci/docker/common/install_conda.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_base.sh 
Auto-merging .circleci/docker/common/install_base.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh 
Auto-merging .circleci/docker/build.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py 
Auto-merging .circleci/cimodel/data/windows_build_definitions.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_windows_vs2019_py36_cuda11.0_build (3/3)

Step: "Build"

Error generating file
Retry attempt 3: 
C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(236): error: identifier "cusparseScsrmm2" is undefined

C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(259): error: identifier "cusparseDcsrmm2" is undefined

2 errors detected in the compilation of "C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu".
SparseCUDABlas.cu
-- Removing C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj
C:/Jenkins/Miniconda3/Library/bin/cmake.exe -E remove C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj
CMake Error at torch_cuda_generated_SparseCUDABlas.cu.obj.Release.cmake:281 (message):
  Error generating file
  C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj



ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI. It has been revised 37 times.

@rgommers
Collaborator

rgommers commented Aug 5, 2020

First test on macOS: running the tool works fine; I then installed the dev tools (mypy, ipython, et al.), and then:

$ ipython
Python 3.8.5 (default, Aug  5 2020, 03:39:04) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch                                                                                                  
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-1-eb42ca6e4af3> in <module>
----> 1 import torch

~/Code/pytorch/torch/__init__.py in <module>
    186     # See Note [Global dependencies]
    187     if USE_GLOBAL_DEPS:
--> 188         _load_global_deps()
    189     from torch._C import *
    190 

~/Code/pytorch/torch/__init__.py in _load_global_deps()
    139     lib_path = os.path.join(os.path.dirname(here), 'lib', lib_name)
    140 
--> 141     ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    142 
    143 

~/anaconda3/envs/pytorch-deps/lib/python3.8/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 
    372         if handle is None:
--> 373             self._handle = _dlopen(self._name, mode)
    374         else:
    375             self._handle = handle

OSError: dlopen(/Users/rgommers/Code/pytorch/torch/lib/libtorch_global_deps.dylib, 10): image not found

One other issue may be that conda list doesn't contain a pytorch entry; that doesn't explain the above traceback, but it's going to be needed. I.e. simulate the non-build part that running python setup.py develop does, so import torch will work from places other than the repo root.
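For concreteness, the "simulate setup.py develop" part described above can be approximated with a .pth file in the environment's site-packages; a minimal sketch, assuming the cwd is the repo root (the file name pytorch-nightly.pth comes up later in the review, the rest is illustrative):

# minimal sketch: make "import torch" work from anywhere by pointing a .pth file
# in site-packages at the checkout (illustrative only, not the PR's exact code)
import os
import sysconfig

repo_root = os.getcwd()  # assumed to be the root of the PyTorch checkout
pth_path = os.path.join(sysconfig.get_path("purelib"), "pytorch-nightly.pth")
with open(pth_path, "w") as f:
    f.write(repo_root + "\n")  # site adds each directory listed in a .pth to sys.path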

@rgommers
Collaborator

rgommers commented Aug 5, 2020

Also, this is meant to be run from the root level of the PyTorch repo. I haven't added any robustness checks there, but I am happy to if desired.

A very basic check may be useful (e.g. the directory contains a setup.py file with the word PyTorch in it), given that failing after downloading hundreds of MBs isn't ideal.
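A rough sketch of what such a check could look like (the helper name is hypothetical, not part of the PR as-is):

# hypothetical sanity check for "are we at the root of a PyTorch checkout?"
import os

def check_in_repo():
    """Return an error message if the cwd does not look like a PyTorch checkout."""
    if not os.path.isfile("setup.py"):
        return "Not in root-level PyTorch repo, no setup.py in current directory"
    with open("setup.py") as f:
        if "PyTorch" not in f.read():
            return "Not in PyTorch repo, 'PyTorch' not found in setup.py"
    return None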

move_nightly_files(spdir, platform)
pytdir.cleanup()
print("-------\nPyTorch Development Environment set up!\nPlease activate to "
      "enable this environment:\n $ conda activate pytorch-deps")
Collaborator

Maybe go ahead and just run the activate command at the end of the script?

Contributor Author

I don't think that is possible. The activation needs to happen in the calling shell session. If you called it in the script here, it would disappear as soon as the process ended.
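A quick illustration of why, assuming a bash shell is available (environment changes made by a child process never propagate back to the parent):

# illustration only: the child's export vanishes with the child process, which is
# why the script can only print the "conda activate" command for the user to run
import os
import subprocess

subprocess.run(["bash", "-c", "export CONDA_DEFAULT_ENV=pytorch-deps"], check=True)
print(os.environ.get("CONDA_DEFAULT_ENV"))  # unchanged in the parent process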

@rgommers
Collaborator

rgommers commented Aug 5, 2020

Things I will try to test after running the script:

$ mypy
Success: no issues found in 1031 source files
$ ipython
>>> import torch
$ pytest test/test_nn.py -k Loss -v

@ezyang
Contributor

ezyang commented Aug 6, 2020

I wonder if we can incorporate this tool into the nightly build process, so we can get signal when it stops working on the latest nightly.

@ezyang
Contributor

ezyang commented Aug 6, 2020

mypy didn't work when I ran the script, because the .pyi stub files weren't installed to torch/_C (and other places).

@kurtamohler
Collaborator

I notice that after switching into the pytorch-deps environment, I'm able to import torch while my cwd is the root of the repo. But if I move into a subdirectory, like docs, I can't import torch anymore. Is there a way to make torch available everywhere? pytorch/setup.py apparently does it, but I'm not sure how.

@rgommers
Collaborator

rgommers commented Aug 6, 2020

Is there a way to make torch available everywhere? pytorch/setup.py apparently does it, but I'm not sure how.

Yep, that's what I meant with my comment above:

One other issue may be that conda list doesn't contain a pytorch entry; that doesn't explain the above traceback, but it's going to be needed. I.e. simulate the non-build part that running python setup.py develop does, so import torch will work from places other than the repo root.

@scopatz
Contributor Author

scopatz commented Aug 6, 2020

Alright, I have fixed most of the known issues. I'll get to testing with the commands that @rgommers supplied (thanks!) and the *.pyi issue tomorrow.

@scopatz
Contributor Author

scopatz commented Aug 7, 2020

The latest commit works with all of the test commands that @rgommers mentioned. The one caveat is that pytest does not add the current directory to the Python path. So the pytest command should be one of the following:

$ env PYTHONPATH=. pytest test/test_nn.py -k Loss -v
$ python -m pytest test/test_nn.py -k Loss -v

There may be a way to configure the repo to do this automatically, but I believe it would disrupt the standard workflow, so I have not added any adjustments here.
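One way the repo could be configured to do this automatically is a repo-root conftest.py that puts the checkout on sys.path; this is only a sketch of that idea, not something this PR adds:

# hypothetical conftest.py at the repo root -- pytest imports it before collecting
# tests, so inserting the checkout into sys.path here makes "import torch" resolve
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))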

@scopatz
Contributor Author

scopatz commented Aug 7, 2020

I notice that after switching into the pytorch-deps environment, I'm able to import torch while my cwd is the root of the repo. But if I move into a subdirectory, like docs, I can't import torch anymore. Is there a way to make torch available everywhere? pytorch/setup.py apparently does it, but I'm not sure how.

The way to do this is to add something to the conda environment. However, I think it is kind of a bad idea to just add something directly, because conda wouldn't know about it. Something like this could perhaps be done with an activation script. Frankly, this might be more work than it is worth. I'll think about whether there is a clever solution.
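For reference, the activation-script idea would use conda's activate.d hook; a rough sketch, assuming the target env is active as CONDA_PREFIX and the cwd is the repo root (later comments indicate the PR went with a .pth file instead):

# sketch of the activation-script alternative: write a small shell snippet into the
# env's activate.d directory so PYTHONPATH is set whenever the env is activated
import os

prefix = os.environ["CONDA_PREFIX"]  # e.g. ~/anaconda3/envs/pytorch-deps
activate_d = os.path.join(prefix, "etc", "conda", "activate.d")
os.makedirs(activate_d, exist_ok=True)
with open(os.path.join(activate_d, "pytorch-dev.sh"), "w") as f:
    f.write('export PYTHONPATH="%s:$PYTHONPATH"\n' % os.getcwd())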

@scopatz
Contributor Author

scopatz commented Aug 7, 2020

This would presumably solve the pytest issue, too.

@rgommers
Collaborator

There seem to be generic problems with conda create --yes --dry-run --name __pytorch__ -c pytoch-nightly pytorch on Windows. Conda can't solve the environment and gives an error about CUDA versions not matching, even though they do. Has anyone else seen this issue?

Just a sanity check: it's not due to the typo in -c pytoch-nightly, right? If nightlies aren't installable on Windows at all in an empty new conda env, that'd be surprising.

@scopatz
Contributor Author

scopatz commented Aug 11, 2020

Just a sanity check: it's not due to the typo in -c pytoch-nightly, right?

Yes, sorry, that was just a typo in the comment. My shell history had the correct command, and I just retried it and it failed again. So I don't believe that you can install PyTorch nightlies from a bare conda install. Perhaps it is because I have CUDA 10.2 installed. I am trying to figure out if it is system-related.

@scopatz
Contributor Author

scopatz commented Aug 11, 2020

Still failing shm tests on OS X. I guess we can chalk this up as "not yet supported"?

@ezyang - yeah, not yet. This would involve some RPATH rewriting, which probably is not worth it on the first pass here. The shm tests seem to work on Linux, FWIW.

@ezyang
Contributor

ezyang commented Aug 12, 2020

Say I have a patch set on old nightly, and I want to "pull" the latest nightly (maybe because it has got some new stuff I need). Is there a way to do this? It sounds like right now I have to create a new branch for the latest nightly from scratch, and then manually cherry-pick/merge my old patches over.

else:
    # create new environment
    existing_env = False
    env_opts = ["--name", "pytorch-deps"]
Contributor

So, if I install nightly into two unrelated PyTorch worktrees, they will share the same pytorch-deps? Is the idea here that it's ok to share the environments because only dependencies are installed in the environment?

This still seems like a questionable assumption, because you then install a pytorch-nightly.pth to simulate a setup.py develop. So the environments seem worktree-dependent, and then we should try hard not to clobber them in this situation.

Contributor Author

This would only be the default. The user can supply new/different environments at the command line if they want to keep the isolation. If you were doing this multiple times, I think it would make sense to have different environments.

cmd.extend(SPECS_TO_INSTALL)
p = subprocess.run(cmd, capture_output=True, check=True)
# parse solution
solve = json.loads(p.stdout)
Contributor

I'd suggest debug logging this
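A minimal version of that suggestion (the logger name and wrapper function are illustrative, not the PR's code):

# sketch: log both the raw conda dry-run output and the parsed solve at debug level
import json
import logging
import subprocess

logger = logging.getLogger("nightly")

def run_solve(cmd):
    p = subprocess.run(cmd, capture_output=True, check=True)
    logger.debug("conda dry-run stdout:\n%s", p.stdout.decode())
    solve = json.loads(p.stdout)
    logger.debug("parsed solve keys: %s", sorted(solve))
    return solve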


def init_logging(level=logging.INFO):
    """Start up the logger"""
    logging.basicConfig(level=level)
Contributor

You're probably going to end up spending some time debugging end user issues, especially from users who are not technically savvy. Might I suggest you consider adding a "rage" based mechanism? At Facebook, "rage" refers to the capability to retroactively get the logs from a run that failed, rather than having to ask the user to rerun the command again with logging. I have a simple implementation at https://github.com/ezyang/ghstack/blob/master/ghstack/logging.py
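A very small sketch of that "rage" idea, assuming we simply tee every run's logs to a timestamped file so a failed run can be inspected after the fact (the log directory is made up):

# sketch of a "rage"-style init_logging: keep a per-run log file so users can
# report failures without rerunning the command (log directory is an assumption)
import datetime
import logging
import os

def init_logging(level=logging.DEBUG):
    log_dir = os.path.expanduser("~/.pytorch-nightly/logs")
    os.makedirs(log_dir, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    log_file = os.path.join(log_dir, stamp + ".log")
    logging.basicConfig(level=level,
                        handlers=[logging.StreamHandler(), logging.FileHandler(log_file)])
    return log_file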

Contributor Author

Thanks for the link. I'll boost the logging game here

Contributor Author

I have added logging based on what you had here. In the process I noticed some issues with conda v4.8.4 not being able to perform the solve. Let me know if you encounter that. I downgraded to v4.8.3.

@ezyang
Contributor

ezyang commented Aug 12, 2020

I notice there are no tests for the script. If we're not going to test this as part of PyTorch CI, I think it might be better to keep this in a third-party repository (we could put it in the pytorch org if you like). (I'm also not sure how you would even test this in PyTorch CI, since it's tied to nightlies.) You aren't really getting any benefits from being in main PyTorch repository if you're not hooked up to CI; in a separate repo, I could give you direct commit bits and it would be easier to iterate on the script quickly.

@ezyang
Contributor

ezyang commented Aug 12, 2020

I chatted with @soumith about this and we decided that if you want a "move fast" version, you should just maintain a fork of PyTorch, but let's keep it in repo otherwise. I'm going to start landing.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@scopatz
Contributor Author

scopatz commented Aug 12, 2020

Say I have a patch set on old nightly, and I want to "pull" the latest nightly (maybe because it has got some new stuff I need). Is there a way to do this? It sounds like right now I have to create a new branch for the latest nightly from scratch, and then manually cherry-pick/merge my old patches over.

Right, so @rgommers had this same question. It really depends on what semantics you want here. I am happy to implement anything. Do you want more of a pull or a rebase here? Or support both with an optional parameter? Or, if this is the direction we are going, it would be reasonable to just call this script nightly.py and then implement checkout, pull/merge, and rebase.
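For the nightly.py direction, the subcommand layout might look roughly like this (all names and options are tentative):

# rough sketch of a nightly.py subcommand interface (tentative, for discussion only)
import argparse

def make_parser():
    p = argparse.ArgumentParser("nightly.py")
    sub = p.add_subparsers(dest="command", required=True)
    checkout = sub.add_parser("checkout", help="new branch based on the latest nightly")
    checkout.add_argument("-b", "--branch", required=True)
    pull = sub.add_parser("pull", help="merge the latest nightly into the current branch")
    pull.add_argument("-n", "--name", default="pytorch-deps", help="conda env to use")
    return p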

@rgommers
Collaborator

rgommers commented Aug 12, 2020

Wouldn't it be a matter of

./tools/nightly_checkout.py -b new-branch   # fresh branch based on latest nightly
git co old-branch
git rebase new-branch  # or squash then rebase, or merge new-branch into old-branch

?

What I missed is that the way this script currently creates a new branch is necessary, because otherwise the git commit that's checked out and the .so files that are injected would be out of sync and may not work.

Given that, the tool really can't help much with reducing the rebase/merge that's needed after that for any old-branch.

@scopatz
Contributor Author

scopatz commented Aug 12, 2020

@ezyang - I can also work on adding this as a test to pytorch/builder, if you would like. I think it is pretty straightforward and relatively fast.

@scopatz
Contributor Author

scopatz commented Aug 12, 2020

@rgommers - I think merging/rebasing in the new nightly commit is not that big of a deal if all that is needed is an update to contain the nightly's HEAD. We would just re-copy the .so files for that new nightly.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Aug 14, 2020

Do you want more or a pull or a rebase here? Or support both with an optional parameter? Or if this is the direction we are going, it would be reasonable to just call this script nightly.py and then implement checkout, pull/merge, and rebase

I actually kinda like that last idea. I do think pull and merge both make sense.

Wouldn't it be a matter of

I agree this would work, but it's kind of a lot of commands for what should be a very simple operation...

@ezyang
Contributor

ezyang commented Aug 14, 2020

Internal lint suggested this:

Resource pytdir is acquired but not always released. Consider using with to acquire the resource.
https://www.internalfb.com/intern/wiki/Python/lint fbcode/caffe2/tools/nightly_checkout.py Line 294
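What the lint is asking for, roughly, assuming pytdir is a tempfile.TemporaryDirectory (the pytdir.cleanup() call above suggests it is):

# sketch of the suggested pattern: let a context manager own the temp directory so
# it is cleaned up even if an intermediate step raises
import tempfile

with tempfile.TemporaryDirectory() as pytdir:
    ...  # download and unpack the nightly into pytdir, then move files into the repo
# the directory is removed here, whether or not the body raised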

@facebook-github-bot
Contributor

@ezyang merged this pull request in 21823aa.

@scopatz scopatz deleted the nightly-tool branch August 17, 2020 21:07
@scopatz scopatz mentioned this pull request Aug 19, 2020
facebook-github-bot pushed a commit that referenced this pull request Aug 20, 2020
Summary:
Fixes #40829

This addresses remaining issues/improvements in #40829 that were brought up prior to #42635 being merged.  Namely, this changes the name of the script and adds separate `checkout` and `pull` subcommands. I have tested it locally and everything appears to work.  Please let me know if you encounter any issues. I hope that this supports a more natural workflow.

CC ezyang rgommers

Pull Request resolved: #43294

Reviewed By: pbelevich

Differential Revision: D23241849

Pulled By: ezyang

fbshipit-source-id: c24556024d7e5d14b9a5006e927819d4ad370dd7

Labels

Merged, open source, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Work on PyTorch Python code without having to build from source

7 participants