
Conversation

@jikunshang
Collaborator

@jikunshang jikunshang commented Aug 27, 2025

Purpose

#22760 introduced torch.cuda.Event in the GPUModelRunner init method, which causes a compatibility issue on XPU. This PR fixes it by adding an API wrapper.
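
For context, a minimal sketch of the kind of CUDA-specific call that #22760 added and that breaks on XPU-only builds; the attribute name here is hypothetical, not copied from that PR:

    import torch

    class GPUModelRunner:
        def __init__(self) -> None:
            # Constructing a CUDA event at init time assumes a CUDA backend.
            # On an XPU build, torch.cuda is not functional, so this is the
            # kind of call the wrapper in this PR redirects to torch.xpu.Event.
            self.sync_event = torch.cuda.Event()  # hypothetical attribute name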

Test Plan

CI.

Test Result



Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a compatibility issue with torch.cuda.Event on XPU devices by introducing a _torch_cuda_wrapper context manager to monkey-patch torch.cuda.Event during the initialization of XPUModelRunner.

While the approach of using a context manager to temporarily patch torch.cuda.Event is sound, the current implementation has a critical flaw: it permanently replaces torch.cuda.Event with a placeholder, which can cause side effects across the application. My review includes a critical comment with a suggested fix that properly restores the original torch.cuda.Event after use, ensuring the change is safe and contained.

Comment on lines +41 to +53
    class _EventPlaceholder:

        def __init__(self, *args, **kwargs) -> None:
            self.record = lambda: None
            self.synchronize = lambda: None

    try:
        # replace cuda Event with xpu Event, this should work by default
        torch.cuda.Event = torch.xpu.Event
        yield
    finally:
        # if anything goes wrong, just patch it with a placeholder
        torch.cuda.Event = _EventPlaceholder
Contributor

critical

The current implementation of _torch_cuda_wrapper permanently alters the global torch.cuda.Event by setting it to _EventPlaceholder in the finally block. This is a dangerous side effect that can break functionality in other parts of the codebase that rely on the original torch.cuda.Event, especially in mixed-device environments or if other operations use torch.cuda.Event after this runner is initialized.

The finally block should restore the original state of torch.cuda.Event instead of replacing it with a placeholder. This ensures that the monkey-patching is contained within the with statement and does not leak. The _EventPlaceholder class is also no longer needed with this change.

Suggested change
-    class _EventPlaceholder:
-
-        def __init__(self, *args, **kwargs) -> None:
-            self.record = lambda: None
-            self.synchronize = lambda: None
-
-    try:
-        # replace cuda Event with xpu Event, this should work by default
-        torch.cuda.Event = torch.xpu.Event
-        yield
-    finally:
-        # if anything goes wrong, just patch it with a placeholder
-        torch.cuda.Event = _EventPlaceholder
+    original_event = getattr(torch.cuda, "Event", None)
+    try:
+        # replace cuda Event with xpu Event, this should work by default
+        torch.cuda.Event = torch.xpu.Event
+        yield
+    finally:
+        # Restore the original torch.cuda.Event.
+        if original_event is not None:
+            torch.cuda.Event = original_event
+        elif hasattr(torch.cuda, "Event"):
+            # If it didn't exist before, remove our monkey-patched attribute.
+            delattr(torch.cuda, "Event")
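
For reference, a hedged sketch of how the patched wrapper would be applied at the call site; the runner constructor signature is an assumption for illustration, not copied from this PR's diff:

    from contextlib import contextmanager

    import torch

    @contextmanager
    def _torch_cuda_wrapper():
        # Per the suggestion above: patch torch.cuda.Event to the XPU
        # implementation only for the duration of the with-block, then
        # restore whatever was there before.
        original_event = getattr(torch.cuda, "Event", None)
        try:
            torch.cuda.Event = torch.xpu.Event
            yield
        finally:
            if original_event is not None:
                torch.cuda.Event = original_event
            elif hasattr(torch.cuda, "Event"):
                delattr(torch.cuda, "Event")

    class XPUModelRunner(GPUModelRunner):
        def __init__(self, vllm_config, device) -> None:  # signature assumed
            # Any torch.cuda.Event created inside the parent initializer
            # resolves to torch.xpu.Event while the patch is active.
            with _torch_cuda_wrapper():
                super().__init__(vllm_config, device)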

@bigPYJ1151 bigPYJ1151 enabled auto-merge (squash) August 27, 2025 05:01
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 27, 2025
@bigPYJ1151 bigPYJ1151 merged commit 6446677 into vllm-project:main Aug 27, 2025
47 checks passed
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1

2 participants