
Conversation

@guangyey
Collaborator

@guangyey guangyey commented Aug 30, 2024

Stack from ghstack (oldest at bottom):

Motivation

This PR intends to make the device-specific Event classes inherit from the generic torch.Event. The benefit is providing a generic abstract class torch.Event shared across devices, analogous to torch.Stream. This makes it easier for Dynamo to capture the Events of different devices, such as torch.cuda.Event and torch.xpu.Event.
The next PR will remove the now-unnecessary base classes _StreamBase and _EventBase to avoid multiple inheritance.
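For illustration, a minimal sketch of what this enables at the Python level (assuming a CUDA build with this change applied; the snippet is illustrative, not part of the PR):

```python
import torch

# With device-specific event types deriving from the generic torch.Event,
# device-agnostic code (and Dynamo) can treat them uniformly.
cuda_event = torch.cuda.Event(enable_timing=True)
assert isinstance(cuda_event, torch.Event)
# The same check would hold for other backends, e.g. torch.xpu.Event.
```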

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Aug 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134845

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit fd6dac2 with merge base ad8fae2:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey added the ciflow/xpu (Run XPU CI tasks), ciflow/rocm (Trigger "default" config CI on ROCm), and topic: not user facing (topic category) labels Aug 30, 2024
@guangyey guangyey added the intel (This tag is for PR from Intel) label Aug 30, 2024
@guangyey guangyey changed the title from "make device-specific event inherits from torch.Event" to "Make device-specific event inherits from torch.Event" Aug 30, 2024
@guangyey guangyey requested a review from albanD August 30, 2024 12:46
@albanD
Collaborator

albanD commented Aug 30, 2024

Why did you change torch._C._CudaEventBase and not torch.cuda.Event? Is there a significant difference between inheriting from one or the other?

@guangyey
Collaborator Author

guangyey commented Sep 3, 2024

Why did you change torch._C._CudaEventBase and not torch.cuda.Event? Is there a significant difference between inheriting from one or the other?

In the previous design, torch.cuda.Event inherits from torch._C._CudaEventBase and torch._EventBase, and torch._EventBase is the common parent class of each backend's Event, such as torch.cuda.Event and torch.xpu.Event. We can illustrate this below.
[image: previous class hierarchy with multiple inheritance]
In this design, torch.xxx.Event has two parent classes with duplicated methods. Multiple inheritance complicates the class hierarchy and makes the code redundant and complex.
With this series of PRs, we intend to simplify the inheritance hierarchy so that it is easier to understand and maintain. In this PR, we make torch._C._CudaEventBase and torch._C._XpuEventBase inherit from torch.Event, which is inspired by the design of torch.Stream and torch.xxx.Stream (refer to

struct THCPStream : THPStream {

). In the follow-up PR #134850, we trim the hierarchy down to single inheritance, as below.
[image: new single-inheritance class hierarchy]
The new design leads to a simpler class hierarchy and avoids duplicated methods.

torch._StreamBase was originally an unnecessary parent class; it results in the complicated inheritance shown below.
[image: Stream hierarchy with torch._StreamBase]
After removing it, the hierarchy becomes clear and concise, as below.
[image: simplified Stream hierarchy without torch._StreamBase]
So, this series of PRs aims to simplify the inheritance hierarchy, avoid unnecessary inheritance, and facilitate unified Stream and Event classes that can be easily captured by Dynamo.
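For illustration, a rough Python-level view of the new linear hierarchy (the class names are real; the snippet is only illustrative and assumes this PR plus #134850 have landed):

```python
import torch

# Old design (simplified): each backend Event had two parents with
# duplicated methods, e.g.
#   class Event(torch._C._CudaEventBase, torch._EventBase): ...
#
# New design: a single linear chain rooted at the generic torch.Event:
#   torch.cuda.Event -> torch._C._CudaEventBase -> torch.Event
print(torch.cuda.Event.__mro__)
# Expected (roughly): (torch.cuda.Event, torch._C._CudaEventBase,
#                      torch.Event, object)
```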

@albanD
Collaborator

albanD commented Sep 3, 2024

Thanks a lot for the clear explanation!
The gist I hear is that we want torch.Event to be the actual base class for everything and keep the hierarchy linear (no branching).

My guess is that the long term vision is to remove torch.xxx.Stream altogether and so this is only a cleanup for the temporary state?

Collaborator

@albanD albanD left a comment


Comments on the CUDA code apply to the XPU side as well.

 PyObject_HEAD
 c10::Event event;
 };
-extern PyObject* THPEventClass;
+TORCH_API extern PyTypeObject* THPEventClass;
Collaborator


You're adding this to our public API because you need it in third-party C++ code?

Collaborator Author


Yes, I think out-of-tree backends need this to support linear inheritance.

};

void THCPEvent_init(PyObject* module) {
Py_INCREF(THPEventClass);
Collaborator


Please assert that it is non-null here, in case someone changes the init order and this has not been set yet.

Collaborator Author


Good idea!
Updated.


-struct THCPEvent {
+struct THCPEvent : THPEvent {
   PyObject_HEAD
   at::cuda::CUDAEvent cuda_event;
Collaborator


Do you actually need the PyObject_HEAD after subclassing here?

Collaborator Author


It is not necessary. Removed.

@guangyey guangyey requested a review from albanD September 4, 2024 02:37
@guangyey
Collaborator Author

guangyey commented Sep 5, 2024

Thanks a lot for the clear explanation! The gist I hear is that we want torch.Event to be the actual base class for everything and keep the hierarchy linear (no branching).

My guess is that the long term vision is to remove torch.xxx.Stream altogether and so this is only a cleanup for the temporary state?

I'm not sure whether torch.xxx.Stream should be removed in the long term. It may co-exist with torch.Stream, since device-specific methods/properties can be placed in torch.xxx.Stream first. In any case, this PR is indeed a cleanup that makes the inheritance hierarchy linear and simpler.

@guangyey
Collaborator Author

guangyey commented Sep 9, 2024

@albanD, may I know if this series of PRs looks reasonable to you?

@guangyey
Collaborator Author

@albanD May I know if there are any other comments on these two PRs?

Collaborator

@albanD albanD left a comment


Sounds good.
Might be good to add a small test to ensure the inheritance is exactly what we expect it to be if we plan on relying on it.
Can be done in a follow-up if needed.

@gujinghui
Collaborator

gujinghui commented Sep 24, 2024

Sounds good. Might be good to add a small test to ensure the inheritance is exactly what we expect it to be if we plan on relying on it. Can be done in a follow-up if needed.

@guangyey let's add a case to check the inheritance.

@guangyey
Collaborator Author

@albanD @gujinghui Added some UTs to check the expected inheritance.
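For reference, a minimal sketch of what such inheritance checks could look like (an assumed shape only; the actual UTs in the PR may differ):

```python
import unittest
import torch

class TestEventInheritance(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "CUDA not available")
    def test_cuda_event_inherits_torch_event(self):
        # The device-specific event type should be a subclass of the
        # generic torch.Event.
        self.assertTrue(issubclass(torch.cuda.Event, torch.Event))
        self.assertIsInstance(torch.cuda.Event(), torch.Event)

    @unittest.skipUnless(hasattr(torch, "xpu") and torch.xpu.is_available(),
                         "XPU not available")
    def test_xpu_event_inherits_torch_event(self):
        self.assertTrue(issubclass(torch.xpu.Event, torch.Event))

if __name__ == "__main__":
    unittest.main()
```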

@pytorchmergebot
Collaborator

@guangyey
Collaborator Author

"Unrelated failures"
@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 18 checks: pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 1, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 3, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 4, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 5, 5, lf.linux.g5.4xlarge.nvidia.gpu), xpu / linux-jammy-xpu-py3.9 / test (default, 1, 4, linux.idc.xpu), xpu / linux-jammy-xpu-py3.9 / test (default, 2, 4, linux.idc.xpu), xpu / linux-jammy-xpu-py3.9 / test (default, 3, 4, linux.idc.xpu), xpu / linux-jammy-xpu-py3.9 / test (default, 4, 4, linux.idc.xpu), trunk / linux-focal-rocm6.2-py3.10 / test (default, 1, 2, linux.rocm.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 1, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 2, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 3, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 4, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 5, 5, lf.linux.g5.4xlarge.nvidia.gpu), rocm / linux-focal-rocm6.2-py3.10 / test (default, 1, 6, linux.rocm.gpu.2), rocm / linux-focal-rocm6.2-py3.10 / test (default, 3, 6, linux.rocm.gpu.2), rocm / linux-focal-rocm6.2-py3.10 / test (default, 4, 6, linux.rocm.gpu.2)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@guangyey
Collaborator Author

"Unrelated failures"
@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge failed

Reason: Not merging any PRs at the moment because there is a merge blocking https://github.com/pytorch/pytorch/labels/ci:%20sev issue open at:
#136928

Details for Dev Infra team: Raised by workflow job

@cyyever
Collaborator

cyyever commented Oct 1, 2024

@pytorchbot merge -f "Unrelated failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

AnantGulati pushed a commit to AnantGulati/pytorch that referenced this pull request Oct 2, 2024
Pull Request resolved: pytorch#134845
Approved by: https://github.com/albanD, https://github.com/EikanWang
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Oct 17, 2024
@github-actions github-actions bot deleted the gh/guangyey/66/head branch November 3, 2024 02:13
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Dec 10, 2024

Labels

ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), ciflow/xpu (Run XPU CI tasks), intel (This tag is for PR from Intel), Merged, open source, topic: not user facing (topic category)
