[Feature][P/D]: Optimize NIXL Connector xfer Launch #23887

david6666666 · 2025-08-29T02:09:33Z

Purpose

Test Plan

prefill instance:

CUDA_VISIBLE_DEVICES=0 VLLM_NIXL_SIDE_CHANNEL_PORT=5567 vllm serve /workspace/models/qwen2.5_7B \
  --port 20001 \
  --tensor-parallel-size 1 \
  --enforce-eager \
  --block-size 16 \
  --enable-log-requests \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'

decode instance:

CUDA_VISIBLE_DEVICES=1 VLLM_NIXL_SIDE_CHANNEL_PORT=5667 viztracer -o decode.json vllm serve /workspace/models/qwen2.5_7B \
  --port 20002 \
  --tensor-parallel-size 1 \
  --enforce-eager \
  --block-size 16 \
  --enable-log-requests \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'

proxy:

python ./tests/v1/kv_connector/nixl_integration/toy_proxy_server.py \
      --port 40000 \
      --prefiller-port 20001 \
      --decoder-port 20002

benchmark:

python benchmarks/benchmark_serving.py \
    --backend vllm \
    --endpoint /v1/completions \
    --model /workspace/models/qwen2.5_7B \
    --dataset-name random \
    --random-input 800 \
    --random-output 100 \
    --num-prompts 100 \
    --request-rate 5 \
    --host localhost \
    --port 40000

Test Result

origin:

after this pr:

The time spent in _get_block_descs_ids dropped from ~2 ms to ~0.05 ms (roughly a 40x reduction).

gemini-code-assist

Code Review

This pull request introduces a performance optimization in the NixlConnector by vectorizing the computation in the _get_block_descs_ids method. The change replaces nested Python loops with NumPy broadcasting operations, which significantly reduces the execution time for generating block descriptor IDs, as demonstrated by the performance results in the description. The implementation is correct and effectively leverages NumPy for better performance. The addition of the numpy import is necessary and appropriate for this change.
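
The kind of change described in this review can be illustrated with a minimal sketch. It is not the code from nixl_connector.py: the function names, the argument names, and the formula region_id * num_blocks + block_id are assumptions chosen only to show how a nested Python loop can be replaced by NumPy broadcasting.

```python
import numpy as np


def descs_ids_loop(region_ids, num_blocks, block_ids):
    """Nested-loop baseline: one Python-level iteration per (region, block) pair."""
    ids = []
    for reg_id in region_ids:
        for block_id in block_ids:
            # Assumed ID layout: regions are laid out back to back,
            # each holding num_blocks descriptors.
            ids.append(reg_id * num_blocks + block_id)
    return ids


def descs_ids_vectorized(region_ids, num_blocks, block_ids):
    """Same IDs via broadcasting: (R, 1) * num_blocks + (1, B) gives an (R, B) grid."""
    regions = np.asarray(region_ids, dtype=np.int64)[:, None]
    blocks = np.asarray(block_ids, dtype=np.int64)[None, :]
    # flatten() is row-major, so the order matches the nested loop above.
    return (regions * num_blocks + blocks).flatten()
```

Both functions return the IDs in the same order, so under these assumptions the vectorized version could stand in for the loop wherever the caller accepts a NumPy array instead of a list.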

robertgshaw2-redhat · 2025-08-29T02:35:35Z

nice job

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Signed-off-by: ycyaw66 <497410282@qq.com>

david6666666 · 2025-09-03T07:18:02Z

hi @robertgshaw2-redhat, PTAL thanks.

robertgshaw2-redhat · 2025-09-03T17:18:12Z

really nice work

Signed-off-by: ycyaw66 <497410282@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>

gemini-code-assist bot reviewed Aug 29, 2025

robertgshaw2-redhat requested changes Aug 29, 2025

Review thread on vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py (outdated, resolved)

CarrotShoo added 3 commits September 1, 2025 09:31

replace list with numpy array

c0fc5ef

Signed-off-by: ycyaw66 <497410282@qq.com>

fix import

b340390

Signed-off-by: ycyaw66 <497410282@qq.com>

change outputs to numpy

db73beb

Signed-off-by: ycyaw66 <497410282@qq.com>

CarrotShoo force-pushed the issue-23780 branch from 590edbc to db73beb Compare September 1, 2025 01:31

david6666666 requested a review from robertgshaw2-redhat September 3, 2025 01:11

robertgshaw2-redhat approved these changes Sep 3, 2025

robertgshaw2-redhat enabled auto-merge (squash) September 3, 2025 17:19

github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Sep 3, 2025

robertgshaw2-redhat merged commit 6adaed4 into vllm-project:main Sep 3, 2025
52 checks passed

eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025

[Feature][P/D]: Optimize NIXL Connector xfer Launch (vllm-project#23887)

607c6fd

Signed-off-by: ycyaw66 <497410282@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Feature][P/D]: Optimize NIXL Connector xfer Launch (vllm-project#23887)

d30333f

Signed-off-by: ycyaw66 <497410282@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>

MerHS mentioned this pull request Oct 17, 2025

[RFC]: Remove redundant multi-modal input preprocessing during disaggregated inference #27094

Open

1 task
