[XPU]fix cuda event used in XPU model runner #23708
Conversation
Code Review
This pull request aims to fix a compatibility issue with torch.cuda.Event on XPU devices by introducing a _torch_cuda_wrapper context manager to monkey-patch torch.cuda.Event during the initialization of XPUModelRunner.
While the approach of using a context manager to temporarily patch the torch.cuda.Event is sound, the current implementation has a critical flaw. It permanently replaces torch.cuda.Event with a placeholder, which can cause side effects across the application. My review includes a critical comment with a suggested fix to properly restore the original torch.cuda.Event after use, ensuring the change is safe and contained.
```python
    class _EventPlaceholder:

        def __init__(self, *args, **kwargs) -> None:
            self.record = lambda: None
            self.synchronize = lambda: None

    try:
        # replace cuda Event with xpu Event, this should work by default
        torch.cuda.Event = torch.xpu.Event
        yield
    finally:
        # if anything goes wrong, just patch it with a placeholder
        torch.cuda.Event = _EventPlaceholder
```
The current implementation of _torch_cuda_wrapper permanently alters the global torch.cuda.Event by setting it to _EventPlaceholder in the finally block. This is a dangerous side effect that can break functionality in other parts of the codebase that rely on the original torch.cuda.Event, especially in mixed-device environments or if other operations use torch.cuda.Event after this runner is initialized.
The finally block should restore the original state of torch.cuda.Event instead of replacing it with a placeholder. This ensures that the monkey-patching is contained within the with statement and does not leak. The _EventPlaceholder class is also no longer needed with this change.
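To make the consequence concrete, here is a minimal sketch (illustrative only, assuming the `_torch_cuda_wrapper` shown above is in scope) of what happens after the `with` block exits: `torch.cuda.Event` is left pointing at `_EventPlaceholder`, so later event operations silently become no-ops. The suggested change below avoids this by restoring the original attribute on exit.

```python
import torch

# Assumes the _torch_cuda_wrapper context manager from this PR is importable.
with _torch_cuda_wrapper():
    pass  # XPUModelRunner initialization would run here

# After exit, torch.cuda.Event is _EventPlaceholder, not a real event class.
event = torch.cuda.Event(enable_timing=True)
event.record()       # silently does nothing
event.synchronize()  # silently does nothing, no real synchronization occurs
```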
```diff
-    class _EventPlaceholder:
-        def __init__(self, *args, **kwargs) -> None:
-            self.record = lambda: None
-            self.synchronize = lambda: None
-    try:
-        # replace cuda Event with xpu Event, this should work by default
-        torch.cuda.Event = torch.xpu.Event
-        yield
-    finally:
-        # if anything goes wrong, just patch it with a placeholder
-        torch.cuda.Event = _EventPlaceholder
+    original_event = getattr(torch.cuda, "Event", None)
+    try:
+        # replace cuda Event with xpu Event, this should work by default
+        torch.cuda.Event = torch.xpu.Event
+        yield
+    finally:
+        # Restore the original torch.cuda.Event.
+        if original_event is not None:
+            torch.cuda.Event = original_event
+        elif hasattr(torch.cuda, "Event"):
+            # If it didn't exist before, remove our monkey-patched attribute.
+            delattr(torch.cuda, "Event")
```
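With the suggested restore in place, the monkey-patch stays contained within the `with` statement. A minimal usage sketch (hypothetical call site, not part of this PR's diff; assumes an XPU-enabled PyTorch build):

```python
import torch

original_event_cls = torch.cuda.Event

# Assumes the _torch_cuda_wrapper context manager with the restore logic above.
with _torch_cuda_wrapper():
    # Inside the block, torch.cuda.Event is torch.xpu.Event, so code written
    # against the CUDA API (e.g. the GPUModelRunner init path) creates XPU events.
    event = torch.cuda.Event()
    event.record()
    event.synchronize()

# Outside the block, the original class is back and nothing has leaked.
assert torch.cuda.Event is original_event_cls
```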
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Purpose
#22760 introduced torch.cuda.Event in the GPUModelRunner init method, which causes a compatibility issue on XPU. This PR fixes it by adding an API wrapper.
Test Plan
CI.
Test Result