Creates ATen CUDAEvent #9726

mruberry · 2018-07-23T20:48:40Z

This PR:

- Creates an ATen version of CUDAEvent
- Adds set_device() to ATen's CUDAContext
- Extends ATen's cpp tests to exercise the new CUDAEvent

The design of CUDAEvent is based on the CUDAEvent class first proposed for #8354 (incorporating @apaszke's suggestions), with additional requirements taken from the c10d CUDAEvent (for example, incorporating @pietern's request in #9415). The key differences are:

- Recording and blocking are done with the CUDAEvent itself, and the behavior is validated.
- recordOnce() is added to support scenarios like the one #8354 (Updates autograd engine to respect streams set in forward) requires.
- Creation is deferred until the event is recorded, and the event's device matches the stream's. This eliminates the bookkeeping of pre-declaring an event on the right device so that it can later be associated with a particular stream.
- The actual events are created with the cudaEventDisableTiming flag, which improves performance.
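The deferred-creation and recordOnce() behavior listed above can be illustrated with a small sketch. This is not the code from this PR: `FakeStream` and `LazyEvent` are hypothetical stand-ins so that the example compiles without the CUDA toolkit. In real code the creation step would presumably call `cudaEventCreateWithFlags` with `cudaEventDisableTiming` on the stream's device.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a CUDA stream: only the device index matters here.
struct FakeStream {
  int device;
};

// Hypothetical move-only event that is created lazily on first record.
class LazyEvent {
 public:
  LazyEvent() = default;

  // Movable but not copyable.
  LazyEvent(const LazyEvent&) = delete;
  LazyEvent& operator=(const LazyEvent&) = delete;
  LazyEvent(LazyEvent&& other) noexcept { swap(other); }
  LazyEvent& operator=(LazyEvent&& other) noexcept {
    swap(other);
    return *this;
  }

  bool isCreated() const { return created_; }
  int device() const { return device_; }
  int recordCount() const { return record_count_; }

  // Records the event on the stream, creating it on the stream's device
  // the first time. Recording on a stream of another device is an error.
  void record(const FakeStream& stream) {
    if (!created_) {
      // Real code would create the CUDA event here (assumption: with
      // cudaEventCreateWithFlags and cudaEventDisableTiming).
      device_ = stream.device;
      created_ = true;
    }
    if (device_ != stream.device) {
      throw std::runtime_error("event device does not match stream device");
    }
    ++record_count_;
  }

  // Records only if the event has never been recorded before.
  void recordOnce(const FakeStream& stream) {
    if (record_count_ == 0) {
      record(stream);
    }
  }

 private:
  void swap(LazyEvent& other) noexcept {
    std::swap(created_, other.created_);
    std::swap(device_, other.device_);
    std::swap(record_count_, other.record_count_);
  }

  bool created_ = false;
  int device_ = -1;
  int record_count_ = 0;
};
```

Because a default-constructed `LazyEvent` holds no device resource, a container of such events can be built up front and each event picks up its device from the first stream it is recorded on.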

This PR does not:

- Replace the c10d CUDAEvent with this CUDAEvent.

A separate PR will have to resolve the c10d vs ATen divergence of CUDAEvent, CUDAStream, and DeviceGuard.

mruberry · 2018-07-23T22:48:14Z

The Windows failure appears unrelated and is happening for other PRs. It looks like the Windows build is broken.

mruberry · 2018-09-05T20:24:55Z

Closing in favor of PR #11293.

Summary:

After submitting PR #9726, PR #10581 created a different CUDAEvent class. The CUDAEvent proposed in #9726 was similar to the c10d::CUDAEvent class with additional testing and functionality. In particular, it was movable but not copyable. The CUDAEvent created by #10581 is refcounted and copyable. This PR retains the refcounting of the latter PR while fixing several bugs, adding tests, and extending the functionality to support testing and usage like in PR #8354. In particular, this PR:

- Adds set_device() to CUDAContext
- Adds three CUDAEvent tests to stream_test.cpp
- Fixes three bugs:
  - Refcounting was broken. Destroying any of the RAIIs holding a particular CUDAEvent would destroy the event UNLESS it was the last RAII (the check was backwards).
  - Moving an event would cause a segfault.
  - Events were not destroyed on the device they were created on. See PR #9415 (pietern)
- Adds the happened() and recordOnce() functions
- Changes the record() functions to not be const
- Adds additional assertions to verify correctness

This PR does not:

- Make c10d use the ATen CUDAEvent (this is appropriate for a separate PR)

Whether events should be refcounted is an interesting question. Refcounting adds some atomic operations and makes event creation eager. Making events movable but not copyable (like the c10d events) avoids these costs and allows events to be lazily constructed. Lazy construction is preferable when working with containers (like std::array or std::vector) and because the event's device can be set automatically to the device of the first stream it's recorded on. With eager construction the user is required to understand that events have a device and to acquire, upfront, the device of the stream the event will be recorded on. This can be seen here: https://github.com/pytorch/pytorch/blob/542aadd9a7609892e207c1e15de08a975b697752/aten/src/ATen/native/cudnn/RNN.cpp#L1130-L1132 and that file is the only one which currently uses the ATen CUDAEvent.

Refcounting does allow single-writer, multi-reader scenarios, although these scenarios can also be supported by providing indirect access to the underlying CUDAEvent. I believe all current and planned usage scenarios do not require refcounting, and if desired I can update this PR to remove refcounting and make the ATen event movable but not copyable like the c10d event. I think not refcounting is preferable because it can improve performance, ease usability, and simplify the code (as seen with two of the above bugs).

I have decided to separate this from PR #8354 since, while it's required for PR #8354, the changes are clearly of independent interest. PR #8354 has a new dependency on this one, however.

I am closing PR #9726 in favor of this PR.

apaszke ezyang pietern

Pull Request resolved: #11293
Differential Revision: D9665836
Pulled By: soumith
fbshipit-source-id: a1513fa4f9761e2f304d126e402f6b6950e1c1d2
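The refcounting and move bugs described in the summary can be illustrated with a small sketch of what correct behavior looks like. This is not the ATen implementation: `SharedEvent`, its `State`, and the `destroy_counter` hook are hypothetical names, and the underlying CUDA event is replaced by a counter so the example compiles without the CUDA toolkit.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical refcounted event handle. The shared State stands in for the
// underlying CUDA event; destroy_counter records how often it was destroyed.
class SharedEvent {
 public:
  explicit SharedEvent(int* destroy_counter)
      : state_(new State(destroy_counter)) {}

  // Copying shares the state and bumps the reference count.
  SharedEvent(const SharedEvent& other) : state_(other.state_) {
    if (state_ != nullptr) {
      state_->refs.fetch_add(1);
    }
  }

  // Moving steals the state and leaves the source empty, so the source's
  // destructor has nothing to release.
  SharedEvent(SharedEvent&& other) noexcept : state_(other.state_) {
    other.state_ = nullptr;
  }

  // By-value assignment covers both copy and move assignment.
  SharedEvent& operator=(SharedEvent other) noexcept {
    std::swap(state_, other.state_);
    return *this;
  }

  ~SharedEvent() { release(); }

  int useCount() const {
    return state_ != nullptr ? state_->refs.load() : 0;
  }

 private:
  struct State {
    explicit State(int* counter) : destroy_counter(counter), refs(1) {}
    int* destroy_counter;
    std::atomic<int> refs;
  };

  void release() {
    if (state_ == nullptr) {
      return;  // moved-from handle: nothing to do
    }
    // fetch_sub returns the value before the decrement, so 1 means this
    // handle held the last reference. Only then is the event destroyed.
    if (state_->refs.fetch_sub(1) == 1) {
      ++*state_->destroy_counter;  // real code would destroy the event here
      delete state_;
    }
    state_ = nullptr;
  }

  State* state_;
};
```

The two points that matter are that destruction happens only when the last handle goes away, and that a moved-from handle holds a null state and therefore never touches the event. A move-only design needs neither the atomic counter nor the shared state.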


Creates ATen CUDAEvent

d2cc54c

mruberry requested review from apaszke, colesbury, ezyang, gchanan, soumith and zdevito as code owners July 23, 2018 20:48

mruberry added 2 commits July 23, 2018 14:02

Removes checked call in destructor

2a88733

Windows build fix

aa0e583

li-roy assigned li-roy and unassigned li-roy Jul 24, 2018

li-roy added the "ready for review (this tag is deprecated)" label Jul 24, 2018

soumith mentioned this pull request Aug 24, 2018

Migrate PyTorch to C++ bindings horovod/horovod#458

Merged

mruberry mentioned this pull request Sep 5, 2018

Improves ATen CUDAEvent #11293

Closed

mruberry closed this Sep 5, 2018

mruberry deleted the cuda_event branch September 25, 2018 16:39

ezyang added the open source label Jun 24, 2019
