-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Add torch.multiprocessing.spawn helper #13518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tested on Linux. Expect this to work on macOS. Unsure about Windows. |
|
My 2cents, I would prefer this is under |
torch/multiprocessing/spawn.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
colesbury
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
You can build and preview docs via cd docs; make singlehtml and then use your favorite method to serve the HTML.
(Here's a script that works with IPv6 only machines: https://gist.github.com/colesbury/b8cf3f8a2346821fdacb59ab981d4456)
torch/multiprocessing/spawn.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
021d13e to
54e56dd
Compare
|
Thanks for the hints on the docs @colesbury. Since the code here is blocking some folks I will merge this first and address the documentation tomorrow. Then we can iterate on the wording without a rush. |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
54e56dd to
13c9207
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This helper addresses a common pattern where one spawns N processes to work on some common task (e.g. parallel preprocessing or multiple training loops). A straightforward approach is to use the multiprocessing API directly and then consecutively call join on the resulting processes. This pattern breaks down in the face of errors. If one of the processes terminates with an exception or via some signal, and it is not the first process that was launched, the join call on the first process won't be affected. This helper seeks to solve this by waiting on termination from any of the spawned processes. When any process terminates with a non-zero exit status, it terminates the remaining processes, and raises an exception in the parent process. If the process terminated with an exception, it is propagated to the parent. If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is mentioned in the exception as well. Requires Python >= 3.4.
13c9207 to
33e98fc
Compare
|
Updates to fix ROCM and Windows builds. |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This helper addresses a common pattern where one spawns N processes to
work on some common task (e.g. parallel preprocessing or multiple
training loops).
A straightforward approach is to use the multiprocessing API directly
and then consecutively call join on the resulting processes.
This pattern breaks down in the face of errors. If one of the
processes terminates with an exception or via some signal, and it is
not the first process that was launched, the join call on the first
process won't be affected. This helper seeks to solve this by waiting
on termination from any of the spawned processes. When any process
terminates with a non-zero exit status, it terminates the remaining
processes, and raises an exception in the parent process. If the
process terminated with an exception, it is propagated to the parent.
If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is
mentioned in the exception as well.
Requires Python >= 3.4.