
Conversation

heitorschueroff (Contributor) commented Aug 5, 2020

Stack from ghstack:

Differential Revision: D23030317

heitorschueroff added a commit that referenced this pull request Aug 5, 2020
dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit edfd435 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Aug 10 15:16:50 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Aug 10 15:16:50 At: 
Aug 10 15:16:50   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Aug 10 15:16:50   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Aug 10 15:16:50  
Aug 10 15:16:50 [W tensorpipe_agent.cpp:504] RPC agent for worker0 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Aug 10 15:16:50 [W tensorpipe_agent.cpp:504] RPC agent for worker3 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 
Aug 10 15:16:50 [W tensorpipe_agent.cpp:504] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown) 
Aug 10 15:16:50 [W tensorpipe_agent.cpp:504] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Aug 10 15:16:50 [W tensorpipe_agent.cpp:504] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 

ci.pytorch.org: 1 failed



glaringlee (Contributor) commented:

@ezyang Hi Ed, would CUDAUtils.h be a good place to put this function? Can you advise?

ezyang (Contributor) commented Aug 7, 2020

This probably doesn't belong in the cuda/ folder because it doesn't actually have any direct dependencies on CUDA headers (everything is indirected through Context). Is there a bug report / issue corresponding to this? That would help me assess.

heitorschueroff (Contributor, Author) commented:

> This probably doesn't belong in the cuda/ folder because it doesn't actually have any direct dependencies on CUDA headers (everything is indirected through Context). Is there a bug report / issue corresponding to this? That would help me assess.

#40361

glaringlee (Contributor) commented Aug 7, 2020

> This probably doesn't belong in the cuda/ folder because it doesn't actually have any direct dependencies on CUDA headers (everything is indirected through Context). Is there a bug report / issue corresponding to this? That would help me assess.

@ezyang
We want a C++ API equivalent to torch.cuda.manual_seed; we already have an equivalent for torch.manual_seed, but not for torch.cuda.manual_seed.
The torch.manual_seed equivalent lives in the ATen folder here:

static inline void manual_seed(uint64_t seed) {

Please suggest where this torch.cuda.manual_seed equivalent should go, thanks.

ezyang (Contributor) commented Aug 7, 2020

Thanks, the issue helps.

There are a few things you have to be careful about when writing C++ API ports of PyTorch functionality. (This is mostly directed at @glaringlee, but hopefully this also gives you some useful context, @heitorschueroff.)

The function of directory structure in Python is different from that in C++. In Python, the torch.cuda structure is just for organizational purposes. In C++, the ATen/cuda folder structure has a very specific purpose: it demarcates what goes into the torch_cpu shared library versus the torch_cuda shared library. Because the dependency relationship between these libraries is unidirectional (torch_cuda depends on torch_cpu, but not vice versa), code in torch_cpu cannot directly call torch_cuda; it must indirect through a dynamic dispatch of some sort. (Python doesn't have this problem, because everything is indirected there!)

Furthermore, we must be careful to distinguish ATen/cuda from torch/csrc/api/include/torch/cuda.h. ATen/cuda files go into torch_cuda, but torch/cuda.h ships with the torch_cpu library and is available even if PyTorch wasn't compiled with CUDA (because it does dynamic dispatches). Finally, torch/cuda.h is what actually constitutes the public C++ API; headers in ATen are more private (they just happen to frequently get reexported in the torch namespace, and our "official" API is often spotty, so people go to ATen to find the stuff they actually need).

So what does this mean for this issue? The user is requesting torch::cuda::manual_seed in the public C++ API. To me, that indicates that it should go in torch/csrc/api/include/torch/cuda.h. The API should be available even if you didn't build with CUDA support. You should make sure that the new functions are implemented equivalently to their Python counterparts.

glaringlee (Contributor) commented:

@ezyang Thanks a lot, Ed; this is very useful info for me.

heitorschueroff added a commit that referenced this pull request Aug 7, 2020
glaringlee (Contributor) left a review comment:

LGTM now.


heitorschueroff added a commit that referenced this pull request Aug 10, 2020
facebook-github-bot (Contributor) commented:

@heitorschueroff merged this pull request in d396d13.

@facebook-github-bot facebook-github-bot deleted the gh/heitorschueroff/3/head branch August 15, 2020 14:17