
Conversation

@scopatz
Contributor

@scopatz scopatz commented Aug 5, 2020

Fixes #40829

This is cross-platform, but I have only tried it on Linux personally. Also, I am not fully certain of the usage pattern, so if there are any additional features, adjustments, or tests that you want me to add, please just let me know!

CC @ezyang @rgommers

@scopatz
Contributor Author

scopatz commented Aug 5, 2020

Also, this is meant to be run from the root level of the PyTorch repo. I haven't added any robustness checks there, but I am happy to if desired.

@@ -0,0 +1,252 @@
#!/usr/bin/env python3
"""Checks out the nightly development version of PyTorch and installs pre-built
binaries into the repo.
Collaborator

Can you add usage instructions here @scopatz, and also in CONTRIBUTING.md?

Collaborator

With a big warning probably: don't do this in your default/root conda env:)

Collaborator

Or is that fine? I didn't see it create a new env, so hurriedly hit Ctrl-C

Contributor

Yeah, I did this too! Need to be more clear about what the tool is actually going to do, especially if it's going to do things like make new conda environments. A common situation is there's already an env a user wants to use, but not necessarily!

Contributor Author

It is totally fine to use this in the root conda-env. I have added some prelim docs. Let me know if more are needed

Contributor Author

ie - this doesn't touch the base env in anyway


def install():
    """Development install of PyTorch"""
    deps, pytorch, platform = conda_solve()
Collaborator

This nicely gets the runtime dependencies, but since we're in a new env I'm missing some essential test/dev tools. Can we add mypy, pytest, ipython, sphinx? That covers the basics I think.

Contributor

I think there should be an option to reuse an env

Contributor Author

I'll add both of these. In the reuse case, do you want the latest deps installed or just a straight reuse?

Collaborator

I'd guess straight reuse? If you already have an env set up, it's likely recent so probably no need to download large binaries in case of a small version bump somewhere.

Contributor Author

Yeah - I guess my concern with straight reuse is that the dependencies may no longer be compatible with what is in the existing environment.

@dr-ci

dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit 63cc025 (more details on the Dr. CI page):


  • 4/4 failures possibly* introduced in this PR
    • 1/4 non-CircleCI failure(s)

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/3)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_conda.sh 
Auto-merging .circleci/docker/common/install_conda.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_base.sh 
Auto-merging .circleci/docker/common/install_base.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh 
Auto-merging .circleci/docker/build.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py 
Auto-merging .circleci/cimodel/data/windows_build_definitions.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/3)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_conda.sh 
Auto-merging .circleci/docker/common/install_conda.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/common/install_base.sh 
Auto-merging .circleci/docker/common/install_base.sh 
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh 
Auto-merging .circleci/docker/build.sh 
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py 
Auto-merging .circleci/cimodel/data/windows_build_definitions.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_windows_vs2019_py36_cuda11.0_build (3/3)

Step: "Build"

Error generating file
Retry attempt 3: 
C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(236): error: identifier "cusparseScsrmm2" is undefined

C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(259): error: identifier "cusparseDcsrmm2" is undefined

2 errors detected in the compilation of "C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu".
SparseCUDABlas.cu
-- Removing C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj
C:/Jenkins/Miniconda3/Library/bin/cmake.exe -E remove C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj
CMake Error at torch_cuda_generated_SparseCUDABlas.cu.obj.Release.cmake:281 (message):
  Error generating file
  C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj



ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI. It has been revised 37 times.

@rgommers
Collaborator

rgommers commented Aug 5, 2020

First test on macOS: running the tool works fine; I then installed the dev tools (mypy, ipython, et al.), and then:

$ ipython
Python 3.8.5 (default, Aug  5 2020, 03:39:04) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch                                                                                                  
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-1-eb42ca6e4af3> in <module>
----> 1 import torch

~/Code/pytorch/torch/__init__.py in <module>
    186     # See Note [Global dependencies]
    187     if USE_GLOBAL_DEPS:
--> 188         _load_global_deps()
    189     from torch._C import *
    190 

~/Code/pytorch/torch/__init__.py in _load_global_deps()
    139     lib_path = os.path.join(os.path.dirname(here), 'lib', lib_name)
    140 
--> 141     ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    142 
    143 

~/anaconda3/envs/pytorch-deps/lib/python3.8/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 
    372         if handle is None:
--> 373             self._handle = _dlopen(self._name, mode)
    374         else:
    375             self._handle = handle

OSError: dlopen(/Users/rgommers/Code/pytorch/torch/lib/libtorch_global_deps.dylib, 10): image not found

One other issue may be that conda list doesn't contain a pytorch entry; that doesn't explain the above traceback, but it's going to be needed. I.e. simulate the non-build part that running python setup.py develop does, so import torch will work from places other than the repo root.
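For concreteness, the "simulate setup.py develop" part described above can be approximated with a .pth file in the environment's site-packages; a minimal sketch, assuming the cwd is the repo root (the file name pytorch-nightly.pth comes up later in the review, the rest is illustrative):

# minimal sketch: make "import torch" work from anywhere by pointing a .pth file
# in site-packages at the checkout (illustrative only, not the PR's exact code)
import os
import sysconfig

repo_root = os.getcwd()  # assumed to be the root of the PyTorch checkout
pth_path = os.path.join(sysconfig.get_path("purelib"), "pytorch-nightly.pth")
with open(pth_path, "w") as f:
    f.write(repo_root + "\n")  # site adds each directory listed in a .pth to sys.path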

@rgommers
Collaborator

rgommers commented Aug 5, 2020

Also, this is meant to be run from the root level of the PyTorch repo. I haven't added any robustness checks there, but I am happy to if desired.

A very basic check may be useful (e.g. the directory contains a setup.py file with the word PyTorch in it), given that failing after downloading hundreds of MBs isn't ideal.
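A rough sketch of what such a check could look like (the helper name is hypothetical, not part of the PR as-is):

# hypothetical sanity check for "are we at the root of a PyTorch checkout?"
import os

def check_in_repo():
    """Return an error message if the cwd does not look like a PyTorch checkout."""
    if not os.path.isfile("setup.py"):
        return "Not in root-level PyTorch repo, no setup.py in current directory"
    with open("setup.py") as f:
        if "PyTorch" not in f.read():
            return "Not in PyTorch repo, 'PyTorch' not found in setup.py"
    return None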

move_nightly_files(spdir, platform)
pytdir.cleanup()
print("-------\nPyTorch Development Environment set up!\nPlease activate to "
      "enable this environment:\n $ conda activate pytorch-deps")
Collaborator

Maybe go ahead and just run the activate command at the end of the script?

Contributor Author

I don't think that is possible. The activation needs to happen in the calling shell session. If you called it in the script here, it would disappear as soon as the process ended.
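A quick illustration of why, assuming a bash shell is available (environment changes made by a child process never propagate back to the parent):

# illustration only: the child's export vanishes with the child process, which is
# why the script can only print the "conda activate" command for the user to run
import os
import subprocess

subprocess.run(["bash", "-c", "export CONDA_DEFAULT_ENV=pytorch-deps"], check=True)
print(os.environ.get("CONDA_DEFAULT_ENV"))  # unchanged in the parent process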

@rgommers
Collaborator

rgommers commented Aug 5, 2020

Things I will try to test after running the script:

$ mypy
Success: no issues found in 1031 source files
$ ipython
>>> import torch
$ pytest test/test_nn.py -k Loss -v

@ezyang
Contributor

ezyang commented Aug 6, 2020

I wonder if we can incorporate this tool into the nightly build process, so we can get signal when it stops working on the latest nightly.

@ezyang
Contributor

ezyang commented Aug 6, 2020

mypy didn't work when I ran the script, because the .pyi stub files weren't installed to torch/_C (and other places).

@kurtamohler
Collaborator

I notice that after switching into the pytorch-deps environment, I'm able to import torch while my cwd is the root of the repo. But if I move into a subdirectory, like docs, I can't import torch anymore. Is there a way to make torch available everywhere? pytorch/setup.py apparently does it, but I'm not sure how.

@rgommers
Collaborator

rgommers commented Aug 6, 2020

Is there a way to make torch available everywhere? pytorch/setup.py apparently does it, but I'm not sure how.

Yep, that's what I meant with my comment above:

One other issue may be that conda list doesn't contain a pytorch entry; that doesn't explain the above traceback, but it's going to be needed. I.e. simulate the non-build part that running python setup.py develop does, so import torch will work from places other than the repo root.

@scopatz
Contributor Author

scopatz commented Aug 6, 2020

Alright, I have fixed most of the known issues. I'll get to testing with the commands that @rgommers supplied (thanks!) and the *.pyi issue tomorrow.

@scopatz
Contributor Author

scopatz commented Aug 7, 2020

The latest commit works with all of the test commands that @rgommers mentioned. The one caveat is that pytest does not add the current directory to the Python path. So the pytest command should be one of the following:

$ env PYTHONPATH=. pytest test/test_nn.py -k Loss -v
$ python -m pytest test/test_nn.py -k Loss -v

There may be a way to configure the repo to do this automatically, but I believe it would disrupt the standard workflow, so I have not added any adjustments here.
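One way the repo could be configured to do this automatically is a repo-root conftest.py that puts the checkout on sys.path; this is only a sketch of that idea, not something this PR adds:

# hypothetical conftest.py at the repo root -- pytest imports it before collecting
# tests, so inserting the checkout into sys.path here makes "import torch" resolve
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))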

@scopatz
Contributor Author

scopatz commented Aug 7, 2020

I notice that after switching into the pytorch-deps environment, I'm able to import torch while my cwd is the root of the repo. But if I move into a subdirectory, like docs, I can't import torch anymore. Is there a way to make torch available everywhere? pytorch/setup.py apparently does it, but I'm not sure how.

The way to do this is to add something to the conda environment. However, I think it is kind of a bad idea to just add something directly, because conda wouldn't know about it. Something like this could perhaps be done with an activation script. Frankly, this might be more work than it is worth. I'll think about whether there is a clever solution.
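For reference, the activation-script idea would use conda's activate.d hook; a rough sketch, assuming the target env is active as CONDA_PREFIX and the cwd is the repo root (later comments indicate the PR went with a .pth file instead):

# sketch of the activation-script alternative: write a small shell snippet into the
# env's activate.d directory so PYTHONPATH is set whenever the env is activated
import os

prefix = os.environ["CONDA_PREFIX"]  # e.g. ~/anaconda3/envs/pytorch-deps
activate_d = os.path.join(prefix, "etc", "conda", "activate.d")
os.makedirs(activate_d, exist_ok=True)
with open(os.path.join(activate_d, "pytorch-dev.sh"), "w") as f:
    f.write('export PYTHONPATH="%s:$PYTHONPATH"\n' % os.getcwd())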

@scopatz
Contributor Author

scopatz commented Aug 7, 2020

This would presumably solve the pytest issue, too.

@rgommers
Collaborator

There seem to be generic problems with conda create --yes --dry-run --name __pytorch__ -c pytoch-nightly pytorch on Windows. Conda can't solve the environment and gives an error about CUDA versions not matching, even though they do. Has anyone else seen this issue?

Just a sanity check: it's not due to the typo in -c pytoch-nightly, right? If nightlies aren't installable on Windows at all in an empty new conda env, that'd be surprising.

@scopatz
Contributor Author

scopatz commented Aug 11, 2020

Just a sanity check: it's not due to the typo in -c pytoch-nightly, right?

Yes, sorry, that was just a typo in the comment. My shell history had the correct command, and I just retried it and it failed again. So I don't believe that you can install PyTorch nightlies from a bare conda install. Perhaps it is because I have CUDA 10.2 installed. I am trying to figure out if it is system-related.

@scopatz
Contributor Author

scopatz commented Aug 11, 2020

Still failing shm tests on OS X. I guess we can chalk this up as "not yet supported"?

@ezyang - yeah, not yet. This would involve some RPATH rewriting, which probably is not worth it on the first pass here. The shm tests seem to work on Linux, FWIW.

@ezyang
Contributor

ezyang commented Aug 12, 2020

Say I have a patch set on old nightly, and I want to "pull" the latest nightly (maybe because it has got some new stuff I need). Is there a way to do this? It sounds like right now I have to create a new branch for the latest nightly from scratch, and then manually cherry-pick/merge my old patches over.

else:
    # create new environment
    existing_env = False
    env_opts = ["--name", "pytorch-deps"]
Contributor

So, if I install nightly into two unrelated PyTorch worktrees, they will share the same pytorch-deps? Is the idea here that it's ok to share the environments because only dependencies are installed in the environment?

This still seems like a questionable assumption, because you then install a pytorch-nightly.pth to simulate a setup.py develop. So the environments seem worktree-dependent, and then we should try hard not to clobber them in this situation.

Contributor Author

This would only be the default. The user can supply new/different environments at the command line if they want to keep the isolation. If you were doing this multiple times, I think it would make sense to have different environments.

cmd.extend(SPECS_TO_INSTALL)
p = subprocess.run(cmd, capture_output=True, check=True)
# parse solution
solve = json.loads(p.stdout)
Contributor

I'd suggest debug logging this
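A minimal version of that suggestion (the logger name and wrapper function are illustrative, not the PR's code):

# sketch: log both the raw conda dry-run output and the parsed solve at debug level
import json
import logging
import subprocess

logger = logging.getLogger("nightly")

def run_solve(cmd):
    p = subprocess.run(cmd, capture_output=True, check=True)
    logger.debug("conda dry-run stdout:\n%s", p.stdout.decode())
    solve = json.loads(p.stdout)
    logger.debug("parsed solve keys: %s", sorted(solve))
    return solve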


def init_logging(level=logging.INFO):
    """Start up the logger"""
    logging.basicConfig(level=level)
Contributor

You're probably going to end up spending some time debugging end user issues, especially from users who are not technically savvy. Might I suggest you consider adding a "rage" based mechanism? At Facebook, "rage" refers to the capability to retroactively get the logs from a run that failed, rather than having to ask the user to rerun the command again with logging. I have a simple implementation at https://github.com/ezyang/ghstack/blob/master/ghstack/logging.py
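A very small sketch of that "rage" idea, assuming we simply tee every run's logs to a timestamped file so a failed run can be inspected after the fact (the log directory is made up):

# sketch of a "rage"-style init_logging: keep a per-run log file so users can
# report failures without rerunning the command (log directory is an assumption)
import datetime
import logging
import os

def init_logging(level=logging.DEBUG):
    log_dir = os.path.expanduser("~/.pytorch-nightly/logs")
    os.makedirs(log_dir, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    log_file = os.path.join(log_dir, stamp + ".log")
    logging.basicConfig(level=level,
                        handlers=[logging.StreamHandler(), logging.FileHandler(log_file)])
    return log_file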

Contributor Author

Thanks for the link. I'll boost the logging game here

Contributor Author

I have added logging based on what you had here. In the process I noticed some issues with conda v4.8.4 not being able to perform the solve. Let me know if you encounter that. I downgraded to v4.8.3.

@ezyang
Contributor

ezyang commented Aug 12, 2020

I notice there are no tests for the script. If we're not going to test this as part of PyTorch CI, I think it might be better to keep this in a third-party repository (we could put it in the pytorch org if you like). (I'm also not sure how you would even test this in PyTorch CI, since it's tied to nightlies.) You aren't really getting any benefits from being in main PyTorch repository if you're not hooked up to CI; in a separate repo, I could give you direct commit bits and it would be easier to iterate on the script quickly.

@ezyang
Contributor

ezyang commented Aug 12, 2020

I chatted with @soumith about this and we decided that if you want a "move fast" version, you should just maintain a fork of PyTorch, but let's keep it in repo otherwise. I'm going to start landing.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@scopatz
Contributor Author

scopatz commented Aug 12, 2020

Say I have a patch set on old nightly, and I want to "pull" the latest nightly (maybe because it has got some new stuff I need). Is there a way to do this? It sounds like right now I have to create a new branch for the latest nightly from scratch, and then manually cherry-pick/merge my old patches over.

Right, so @rgommers had this same question. It really depends on what semantics you want here. I am happy to implement anything. Do you want more of a pull or a rebase here? Or support both with an optional parameter? Or, if this is the direction we are going, it would be reasonable to just call this script nightly.py and then implement checkout, pull/merge, and rebase.
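For the nightly.py direction, the subcommand layout might look roughly like this (all names and options are tentative):

# rough sketch of a nightly.py subcommand interface (tentative, for discussion only)
import argparse

def make_parser():
    p = argparse.ArgumentParser("nightly.py")
    sub = p.add_subparsers(dest="command", required=True)
    checkout = sub.add_parser("checkout", help="new branch based on the latest nightly")
    checkout.add_argument("-b", "--branch", required=True)
    pull = sub.add_parser("pull", help="merge the latest nightly into the current branch")
    pull.add_argument("-n", "--name", default="pytorch-deps", help="conda env to use")
    return p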

@rgommers
Collaborator

rgommers commented Aug 12, 2020

Wouldn't it be a matter of

./tools/nightly_checkout.py -b new-branch   # fresh branch based on latest nightly
git co old-branch
git rebase new-branch  # or squash then rebase, or merge new-branch into old-branch

?

What I missed is that the way this script currently creates a new branch is necessary, because otherwise the git commit that's checked out and the .so files that are injected would be out of sync and may not work.

Given that, the tool really can't help much with reducing the rebase/merge that's needed after that for any old-branch.

@scopatz
Contributor Author

scopatz commented Aug 12, 2020

@ezyang - I can also work on adding this as a test to pytorch/builder, if you would like. I think it is pretty straightforward and relatively fast.

@scopatz
Contributor Author

scopatz commented Aug 12, 2020

@rgommers - I think merging/rebasing in the new nightly commit is not that big of a deal if all that is needed is an update to contain the nightly's HEAD. We would just re-copy the .so files for that new nightly.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Aug 14, 2020

Do you want more or a pull or a rebase here? Or support both with an optional parameter? Or if this is the direction we are going, it would be reasonable to just call this script nightly.py and then implement checkout, pull/merge, and rebase

I actually kinda like that last idea. I do think pull and merge both make sense.

Wouldn't it be a matter of

I agree this would work, but it's kind of a lot of commands for what should be a very simple operation...

@ezyang
Contributor

ezyang commented Aug 14, 2020

Internal lint suggested this:

Resource pytdir is acquired but not always released. Consider using with to acquire the resource.
https://www.internalfb.com/intern/wiki/Python/lint fbcode/caffe2/tools/nightly_checkout.py Line 294
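What the lint is asking for, roughly, assuming pytdir is a tempfile.TemporaryDirectory (the pytdir.cleanup() call above suggests it is):

# sketch of the suggested pattern: let a context manager own the temp directory so
# it is cleaned up even if an intermediate step raises
import tempfile

with tempfile.TemporaryDirectory() as pytdir:
    ...  # download and unpack the nightly into pytdir, then move files into the repo
# the directory is removed here, whether or not the body raised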

@facebook-github-bot
Contributor

@ezyang merged this pull request in 21823aa.

@scopatz scopatz deleted the nightly-tool branch August 17, 2020 21:07
@scopatz scopatz mentioned this pull request Aug 19, 2020
facebook-github-bot pushed a commit that referenced this pull request Aug 20, 2020
Summary:
Fixes #40829

This addresses remaining issues/improvements in #40829 that were brought up prior to #42635 being merged.  Namely, this changes the name of the script and adds separate `checkout` and `pull` subcommands. I have tested it locally and everything appears to work.  Please let me know if you encounter any issues. I hope that this supports a more natural workflow.

CC ezyang rgommers

Pull Request resolved: #43294

Reviewed By: pbelevich

Differential Revision: D23241849

Pulled By: ezyang

fbshipit-source-id: c24556024d7e5d14b9a5006e927819d4ad370dd7

Labels

Merged, open source, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Work on PyTorch Python code without having to build from source

7 participants