
feat(benchmarks): Add torch profiler support via ENABLE_PROFILE environment variable #611

Open

haofrank wants to merge 1 commit into InferenceMAX:main from haofrank:main

Conversation

@haofrank commented Jan 31, 2026

Summary

Add torch profiler support to benchmark scripts, enabling collection of detailed GPU performance data during benchmark runs.

Resolves #610

Changes

benchmarks/benchmark_lib.sh

  • Added automatic profiler setup when ENABLE_PROFILE=true
  • Auto-configure VLLM_TORCH_PROFILER_DIR to /workspace/profiling if not explicitly set
  • Auto-create profiling output directory
  • Added the --profile flag to the benchmark_serving.py invocation in run_benchmark_serving() (sketched below)
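
A minimal sketch of what this setup logic amounts to, assuming only the behavior described above (the actual diff may differ in details):

```bash
# Illustrative sketch of the ENABLE_PROFILE setup in benchmark_lib.sh
if [[ "${ENABLE_PROFILE:-false}" == "true" ]]; then
    # Default the profiler output directory when the caller has not set one
    export VLLM_TORCH_PROFILER_DIR="${VLLM_TORCH_PROFILER_DIR:-/workspace/profiling}"
    mkdir -p "${VLLM_TORCH_PROFILER_DIR}"
    PROFILE_FLAG="--profile"   # appended to the benchmark_serving.py invocation
else
    PROFILE_FLAG=""
fi
```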

Runner Scripts

Added ENABLE_PROFILE and VLLM_TORCH_PROFILER_DIR environment variable passthrough to docker containers:

  • runners/launch_b200-dgxc.sh
  • runners/launch_h100-cr.sh
  • runners/launch_mi300x-amd.sh
  • runners/launch_mi300x-cr.sh

Note: SLURM-based runners using --export=ALL already pass through all environment variables.
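
For the Docker-based runners, the passthrough amounts to a pair of -e flags on docker run. A hedged sketch; the image name and the remaining flags are placeholders, not the scripts' actual invocation:

```bash
# Hypothetical excerpt of a launch script; only the two -e lines reflect
# the change described in this PR, the rest is placeholder.
docker run --rm --gpus all \
    -e ENABLE_PROFILE="${ENABLE_PROFILE:-false}" \
    -e VLLM_TORCH_PROFILER_DIR="${VLLM_TORCH_PROFILER_DIR:-/workspace/profiling}" \
    benchmark-image ./benchmarks/dsr1_fp8_b200.sh
```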

Usage

```bash
# Enable profiling with the default output directory (/workspace/profiling)
ENABLE_PROFILE=true ./benchmarks/dsr1_fp8_b200.sh

# Enable profiling with a custom output directory
ENABLE_PROFILE=true VLLM_TORCH_PROFILER_DIR=/custom/path ./benchmarks/dsr1_fp8_b200.sh
```

How It Works

  1. When benchmark_lib.sh is sourced, it checks ENABLE_PROFILE
  2. If enabled, it sets VLLM_TORCH_PROFILER_DIR (the server uses this to enable its /start_profile and /stop_profile endpoints)
  3. During the benchmark, benchmark_serving.py calls these endpoints to collect torch profiling data (see the curl sketch below)
  4. Profiling results are saved to VLLM_TORCH_PROFILER_DIR
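
The endpoint dance in steps 2–3 can also be reproduced by hand against a running server. A sketch assuming the server listens on localhost:8000 (host and port are assumptions; the endpoint paths are the ones vLLM exposes when VLLM_TORCH_PROFILER_DIR is set):

```bash
# Manually bracket a profiled window against a running vLLM server
curl -X POST http://localhost:8000/start_profile   # begin trace collection
# ... issue the requests to be captured ...
curl -X POST http://localhost:8000/stop_profile    # write trace to VLLM_TORCH_PROFILER_DIR
```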

Testing

  • Tested with ENABLE_PROFILE=true on AMD GPU node
  • Verified profiling data is generated in the output directory

@haofrank haofrank requested a review from a team as a code owner January 31, 2026 00:43
@functionstackx (Contributor)

@haofrank thanks! Any chance you or your AI assistant can add the ability to profile only a couple of forward passes? Profiling the whole run would create a large trace.

@Oseltamivir (Collaborator)

Thanks @haofrank, we already have a profiling branch at https://github.com/InferenceMAX/InferenceMAX/tree/profiling, where we tried to do automatic profiler analysis of the trace in order to give InferenceMAX another axis of information: How to improve.

However, it is currently stale, as only Hopper had usable information (kernel input dims, shapes, etc.).

Nevertheless, having the ability to run the profiler may be useful. We will consider merging the profiler choice with the perfetto relay into main.

@Oseltamivir (Collaborator)

@haofrank If you would like to contribute to the repo, we would appreciate it if you could help update the branch at https://github.com/InferenceMAX/InferenceMAX/tree/profiling and open a PR. Otherwise, I will probably do that this coming weekend.

@haofrank (Author) commented Feb 3, 2026

Hi @Oseltamivir, thanks for the note! I’d be happy to help with this.

From a quick look, it seems the profiling branch already supports enabling profiling via the PROFILE env variable, but there are also quite a few other diffs in the branch. I’m not sure yet whether some of them were experimental or intended to be merged.

I’ll need a bit of time to go through and understand the intent. If I’m not able to make meaningful progress in time, it probably makes sense for you to proceed with it.

@Oseltamivir
Copy link
Collaborator

Hi @haofrank, yep, that branch is very different in its intentions. If you could help:

  1. Update the branch with main (container images, etc.)
  2. Add the profiling option to the benchmark (like this PR)
  3. Profile decode & prefill
  4. Enable a way to run with the profiler by adding a profiler arg to .github/workflows/e2e-tests.yml and utils/matrix_logic/generate_sweep_configs.py
  5. Enable the artifact and relay, like how https://github.com/InferenceMAX/InferenceMAX/actions/runs/20731533868 has a perfetto relay link
  6. Remove the Python file that analyzes the results (the original intent of profiling, which went stale)

We value open source contributions and can definitely merge it in. The profiling branch works, so you can assume the env vars/relays are working.

However, I agree it might be difficult, as there are quite a few parts involved. Let me know what you think; I'll work on this next Wednesday if you're unable to make progress.
