[ROCm][CI/Build] Sync ROCm dockerfiles with the ROCm fork #24279
Conversation
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Code Review
This pull request syncs the ROCm Dockerfiles with the ROCm fork. The changes include updating the base image to ROCm 6.4.1, updating various dependency versions, and refining the build process for components like Triton. These changes generally improve the build's stability and robustness. I have identified one issue where a new Docker build stage is defined but not utilized in the final image construction, which should be addressed to improve efficiency and clarity.
…ct#24279) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
…ct#24279) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
RUN if [ -d triton/python/triton_kernels ]; then pip install build && cd triton/python/triton_kernels \
    && python3 -m build --wheel && cp dist/*.whl /app/install; fi
We don't expect compatibility with pytorch-triton-rocm==3.5.0?
Seems like triton_kernels / https://github.com/ROCm/triton/tree/57c693b627fe058878ade4163a0a8df95d9fefa1/python/triton_kernels is not shipped with it.
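One quick way to confirm this locally (a minimal sketch, not tied to any vLLM or Triton tooling) is to probe the installed environment for the `triton_kernels` package:

```python
import importlib.util

# Probe whether the active environment provides an importable
# triton_kernels package. find_spec returns None when nothing
# installed exposes that module name.
def triton_kernels_available() -> bool:
    return importlib.util.find_spec("triton_kernels") is not None

print(triton_kernels_available())
```

If this prints `False` in an environment with only `pytorch-triton-rocm==3.5.0` installed, that would confirm the package is not bundled with it.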
Yes, it does not work with the upstream Triton from https://download.pytorch.org/whl/rocm6.4:
(base) root@felix-marty-job-torch-vllm-1-rmfqs:~# vllm serve /models/openai_gpt-oss-20b --tensor-parallel-size 1 --enforce-eager
Traceback (most recent call last):
File "/root/miniforge3/bin/vllm", line 33, in <module>
sys.exit(load_entry_point('vllm', 'console_scripts', 'vllm')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniforge3/bin/vllm", line 25, in importlib_load_entry_point
return next(matches).load()
^^^^^^^^^^^^^^^^^^^^
File "/root/miniforge3/lib/python3.12/importlib/metadata/__init__.py", line 205, in load
module = import_module(match.group('module'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniforge3/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 999, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/shared_volume/repos/vllm/vllm/entrypoints/cli/__init__.py", line 4, in <module>
from vllm.entrypoints.cli.benchmark.serve import BenchmarkServingSubcommand
File "/shared_volume/repos/vllm/vllm/entrypoints/cli/benchmark/serve.py", line 5, in <module>
from vllm.benchmarks.serve import add_cli_args, main
File "/shared_volume/repos/vllm/vllm/benchmarks/serve.py", line 41, in <module>
from vllm.benchmarks.datasets import SampleRequest, add_dataset_parser, get_samples
File "/shared_volume/repos/vllm/vllm/benchmarks/datasets.py", line 39, in <module>
from vllm.lora.utils import get_adapter_absolute_path
File "/shared_volume/repos/vllm/vllm/lora/utils.py", line 22, in <module>
from vllm.lora.layers import (
File "/shared_volume/repos/vllm/vllm/lora/layers/__init__.py", line 14, in <module>
from vllm.lora.layers.fused_moe import FusedMoEWithLoRA
File "/shared_volume/repos/vllm/vllm/lora/layers/fused_moe.py", line 17, in <module>
from vllm.model_executor.layers.fused_moe import FusedMoE
File "/shared_volume/repos/vllm/vllm/model_executor/layers/fused_moe/__init__.py", line 7, in <module>
from vllm.model_executor.layers.fused_moe.config import FusedMoEConfig
File "/shared_volume/repos/vllm/vllm/model_executor/layers/fused_moe/config.py", line 26, in <module>
from triton_kernels.matmul_ogs import PrecisionConfig
File "/shared_volume/repos/triton/python/triton_kernels/triton_kernels/matmul_ogs.py", line 11, in <module>
from .matmul_ogs_details._matmul_ogs import _compute_writeback_idx
File "/shared_volume/repos/triton/python/triton_kernels/triton_kernels/matmul_ogs_details/_matmul_ogs.py", line 7, in <module>
from triton_kernels.numerics_details.flexpoint import float_to_flex, load_scale
File "/shared_volume/repos/triton/python/triton_kernels/triton_kernels/numerics_details/flexpoint.py", line 55, in <module>
@tl.constexpr_function
^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'triton.language' has no attribute 'constexpr_function'
(base) root@felix-marty-job-torch-vllm-1-rmfqs:~# pip list | grep triton
conch-triton-kernels 1.2.1
pytorch-triton-rocm 3.5.0
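The failing import can be reproduced with a small guard (a sketch, assuming only that Triton may or may not be installed), which checks for the `tl.constexpr_function` attribute whose absence raises the `AttributeError` above:

```python
# Check whether the installed Triton exposes triton.language.constexpr_function,
# the attribute triton_kernels' flexpoint module decorates with. Returns False
# both when Triton is not installed and when the attribute is missing (as with
# upstream pytorch-triton-rocm 3.5.0 per the traceback above).
def has_constexpr_function() -> bool:
    try:
        import triton.language as tl
    except ImportError:
        return False
    return hasattr(tl, "constexpr_function")

print(has_constexpr_function())
```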
Bringing the dockerfiles in sync with the ROCm fork, to match what is used to build the rocm/vllm-dev:base, rocm/vllm-dev:nightly, and rocm/vllm images.