Skip to content

Conversation

@qthequartermasterman
Copy link
Contributor

@qthequartermasterman qthequartermasterman commented Sep 5, 2025

Purpose

As requested by @DarkLight1337 in #24278 (comment), this PR refactors GPUModelRunner.inputs_embeds to use CpuGpuBuffer to reduce the total diff in #24278.

Test Plan

All relevant unit tests are passing locally for me. Pending CI, it should be fine. This change works in the full context of #24278.

Test Result

Pending CI.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors GPUModelRunner.inputs_embeds to use the CpuGpuBuffer class. This is achieved by extending CpuGpuBuffer to optionally disable the creation of a NumPy view, which is necessary for bfloat16 tensors. The changes are consistent and correctly adapt the usage of inputs_embeds throughout GPUModelRunner. My main feedback is on improving the robustness of the CpuGpuBuffer class to prevent potential runtime errors.

@qthequartermasterman
Copy link
Contributor Author

@DarkLight1337

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 6, 2025 02:01
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 6, 2025
@vllm-bot vllm-bot merged commit 305a1cc into vllm-project:main Sep 6, 2025
46 of 48 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…roject#24345)

Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…roject#24345)

Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants