
Conversation

@guangyey
Collaborator

@guangyey guangyey commented Aug 30, 2024

Stack from ghstack (oldest at bottom):

Motivation

This PR intends to make the device-specific Event classes inherit from the generic torch.Event. The benefit is providing a generic abstract class torch.Event shared across devices, analogous to torch.Stream. This makes it easier for Dynamo to capture the Events of different devices, such as torch.cuda.Event and torch.xpu.Event.
The next PR will remove the now-unnecessary base classes _StreamBase and _EventBase to avoid multiple inheritance.
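For illustration, a minimal sketch of what this enables at the Python level (assuming a CUDA build with this change applied; the snippet is illustrative, not part of the PR):

```python
import torch

# With device-specific event types deriving from the generic torch.Event,
# device-agnostic code (and Dynamo) can treat them uniformly.
cuda_event = torch.cuda.Event(enable_timing=True)
assert isinstance(cuda_event, torch.Event)
# The same check would hold for other backends, e.g. torch.xpu.Event.
```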

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Aug 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134845

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit fd6dac2 with merge base ad8fae2:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey added the ciflow/xpu (Run XPU CI tasks), ciflow/rocm (Trigger "default" config CI on ROCm), and topic: not user facing (topic category) labels Aug 30, 2024
@guangyey guangyey added the intel (This tag is for PR from Intel) label Aug 30, 2024
@guangyey guangyey changed the title from "make device-specific event inherits from torch.Event" to "Make device-specific event inherits from torch.Event" Aug 30, 2024
@guangyey guangyey requested a review from albanD August 30, 2024 12:46
@albanD
Collaborator

albanD commented Aug 30, 2024

Why did you change torch._C._CudaEventBase and not torch.cuda.Event? Is there a significant difference between inheriting from one or the other?

@guangyey
Collaborator Author

guangyey commented Sep 3, 2024

Why did you change torch._C._CudaEventBase and not torch.cuda.Event? Is there a significant difference between inheriting from one or the other?

In the previous design, torch.cuda.Event inherits from torch._C._CudaEventBase and torch._EventBase, and torch._EventBase is the common parent class of each backend's Event, such as torch.cuda.Event and torch.xpu.Event. We can illustrate this below.
[image: previous class hierarchy with multiple inheritance]
In this design, torch.xxx.Event has two parent classes with duplicated methods. Multiple inheritance complicates the class hierarchy and makes the code redundant and complex.
With this series of PRs, we intend to simplify the inheritance hierarchy so that it is easier to understand and maintain. In this PR, we make torch._C._CudaEventBase and torch._C._XpuEventBase inherit from torch.Event, which is inspired by the design of torch.Stream and torch.xxx.Stream (refer to

struct THCPStream : THPStream {

). In the follow-up PR #134850, we trim the hierarchy down to single inheritance, as below.
[image: new single-inheritance class hierarchy]
The new design leads to a simpler class hierarchy and avoids duplicated methods.

torch._StreamBase was originally an unnecessary parent class; it results in the complicated inheritance shown below.
[image: Stream hierarchy with torch._StreamBase]
After removing it, the hierarchy becomes clear and concise, as below.
[image: simplified Stream hierarchy without torch._StreamBase]
So, this series of PRs aims to simplify the inheritance hierarchy, avoid unnecessary inheritance, and facilitate unified Stream and Event classes that can be easily captured by Dynamo.
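For illustration, a rough Python-level view of the new linear hierarchy (the class names are real; the snippet is only illustrative and assumes this PR plus #134850 have landed):

```python
import torch

# Old design (simplified): each backend Event had two parents with
# duplicated methods, e.g.
#   class Event(torch._C._CudaEventBase, torch._EventBase): ...
#
# New design: a single linear chain rooted at the generic torch.Event:
#   torch.cuda.Event -> torch._C._CudaEventBase -> torch.Event
print(torch.cuda.Event.__mro__)
# Expected (roughly): (torch.cuda.Event, torch._C._CudaEventBase,
#                      torch.Event, object)
```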

@albanD
Collaborator

albanD commented Sep 3, 2024

Thanks a lot for the clear explanation!
The gist I hear is that we want torch.Event to be the actual base class for everything and keep the hierarchy linear (no branching).

My guess is that the long term vision is to remove torch.xxx.Stream altogether and so this is only a cleanup for the temporary state?

Collaborator

@albanD albanD left a comment


Comments on the CUDA code apply to the XPU side as well.

 PyObject_HEAD
 c10::Event event;
 };
-extern PyObject* THPEventClass;
+TORCH_API extern PyTypeObject* THPEventClass;
Collaborator


You're adding this to our public API because you need it in third-party C++ code?

Collaborator Author


Yes, I think out-of-tree backends need this to support linear inheritance.

};

void THCPEvent_init(PyObject* module) {
Py_INCREF(THPEventClass);
Collaborator


Please assert that it is non-null here, in case someone changes the init order and this has not been set yet.

Collaborator Author


Good idea!
Updated.


-struct THCPEvent {
+struct THCPEvent : THPEvent {
   PyObject_HEAD
   at::cuda::CUDAEvent cuda_event;
Collaborator


Do you actually need the PyObject_HEAD after subclassing here?

Collaborator Author


It is not necessary. Removed.

@guangyey guangyey requested a review from albanD September 4, 2024 02:37
@guangyey
Collaborator Author

guangyey commented Sep 5, 2024

Thanks a lot for the clear explanation! The gist I hear is that we want torch.Event to be the actual base class for everything and keep the hierarchy linear (no branching).

My guess is that the long term vision is to remove torch.xxx.Stream altogether and so this is only a cleanup for the temporary state?

I'm not sure whether torch.xxx.Stream should be removed in the long term. It may co-exist with torch.Stream, since device-specific methods/properties can be placed in torch.xxx.Stream first. In any case, this PR is indeed a cleanup that makes the inheritance hierarchy linear and simpler.

@guangyey
Collaborator Author

guangyey commented Sep 9, 2024

@albanD, may I know if this series of PRs looks reasonable to you?

@guangyey
Collaborator Author

@albanD May I know if there are any other comments on these two PRs?

Collaborator

@albanD albanD left a comment


Sounds good.
Might be good to add a small test to ensure the inheritance is exactly what we expect it to be if we plan on relying on it.
Can be done in a follow-up if needed.

@gujinghui
Collaborator

gujinghui commented Sep 24, 2024

Sounds good. Might be good to add a small test to ensure the inheritance is exactly what we expect it to be if we plan on relying on it. Can be done in a follow-up if needed.

@guangyey let's add a case to check the inheritance.

@guangyey
Collaborator Author

@albanD @gujinghui Added some UTs to check the expected inheritance.
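For reference, a minimal sketch of what such inheritance checks could look like (an assumed shape only; the actual UTs in the PR may differ):

```python
import unittest
import torch

class TestEventInheritance(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "CUDA not available")
    def test_cuda_event_inherits_torch_event(self):
        # The device-specific event type should be a subclass of the
        # generic torch.Event.
        self.assertTrue(issubclass(torch.cuda.Event, torch.Event))
        self.assertIsInstance(torch.cuda.Event(), torch.Event)

    @unittest.skipUnless(hasattr(torch, "xpu") and torch.xpu.is_available(),
                         "XPU not available")
    def test_xpu_event_inherits_torch_event(self):
        self.assertTrue(issubclass(torch.xpu.Event, torch.Event))

if __name__ == "__main__":
    unittest.main()
```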

@pytorchmergebot
Collaborator

@guangyey
Collaborator Author

"Unrelated failures"
@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 18 checks: pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 1, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 3, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 4, 5, lf.linux.g5.4xlarge.nvidia.gpu), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 5, 5, lf.linux.g5.4xlarge.nvidia.gpu), xpu / linux-jammy-xpu-py3.9 / test (default, 1, 4, linux.idc.xpu), xpu / linux-jammy-xpu-py3.9 / test (default, 2, 4, linux.idc.xpu), xpu / linux-jammy-xpu-py3.9 / test (default, 3, 4, linux.idc.xpu), xpu / linux-jammy-xpu-py3.9 / test (default, 4, 4, linux.idc.xpu), trunk / linux-focal-rocm6.2-py3.10 / test (default, 1, 2, linux.rocm.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 1, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 2, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 3, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 4, 5, lf.linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 5, 5, lf.linux.g5.4xlarge.nvidia.gpu), rocm / linux-focal-rocm6.2-py3.10 / test (default, 1, 6, linux.rocm.gpu.2), rocm / linux-focal-rocm6.2-py3.10 / test (default, 3, 6, linux.rocm.gpu.2), rocm / linux-focal-rocm6.2-py3.10 / test (default, 4, 6, linux.rocm.gpu.2)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@guangyey
Collaborator Author

"Unrelated failures"
@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge failed

Reason: Not merging any PRs at the moment because there is a merge blocking https://github.com/pytorch/pytorch/labels/ci:%20sev issue open at:
#136928

Details for Dev Infra team: Raised by workflow job

@cyyever
Collaborator

cyyever commented Oct 1, 2024

@pytorchbot merge -f "Unrelated failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

AnantGulati pushed a commit to AnantGulati/pytorch that referenced this pull request Oct 2, 2024
Pull Request resolved: pytorch#134845
Approved by: https://github.com/albanD, https://github.com/EikanWang
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Oct 17, 2024
@github-actions github-actions bot deleted the gh/guangyey/66/head branch November 3, 2024 02:13
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Dec 10, 2024

Labels

ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), ciflow/xpu (Run XPU CI tasks), intel (This tag is for PR from Intel), Merged, open source, topic: not user facing (topic category)
