
feat(benchmarks): Add torch profiler support via ENABLE_PROFILE environment variable #611

Open

haofrank wants to merge 1 commit into InferenceMAX:main from haofrank:main

Conversation

@haofrank commented Jan 31, 2026

Summary

Add torch profiler support to benchmark scripts, enabling collection of detailed GPU performance data during benchmark runs.

Resolves #610

Changes

benchmarks/benchmark_lib.sh

  • Added automatic profiler setup when ENABLE_PROFILE=true
  • Auto-configure VLLM_TORCH_PROFILER_DIR to /workspace/profiling if not explicitly set
  • Auto-create profiling output directory
  • Added the --profile flag to the benchmark_serving.py invocation in run_benchmark_serving() (sketched below)
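
A minimal sketch of what this setup logic amounts to, assuming only the behavior described above (the actual diff may differ in details):

```bash
# Illustrative sketch of the ENABLE_PROFILE setup in benchmark_lib.sh
if [[ "${ENABLE_PROFILE:-false}" == "true" ]]; then
    # Default the profiler output directory when the caller has not set one
    export VLLM_TORCH_PROFILER_DIR="${VLLM_TORCH_PROFILER_DIR:-/workspace/profiling}"
    mkdir -p "${VLLM_TORCH_PROFILER_DIR}"
    PROFILE_FLAG="--profile"   # appended to the benchmark_serving.py invocation
else
    PROFILE_FLAG=""
fi
```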

Runner Scripts

Added ENABLE_PROFILE and VLLM_TORCH_PROFILER_DIR environment variable passthrough to docker containers:

  • runners/launch_b200-dgxc.sh
  • runners/launch_h100-cr.sh
  • runners/launch_mi300x-amd.sh
  • runners/launch_mi300x-cr.sh

Note: SLURM-based runners using --export=ALL already pass through all environment variables.
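
For the Docker-based runners, the passthrough amounts to a pair of -e flags on docker run. A hedged sketch; the image name and the remaining flags are placeholders, not the scripts' actual invocation:

```bash
# Hypothetical excerpt of a launch script; only the two -e lines reflect
# the change described in this PR, the rest is placeholder.
docker run --rm --gpus all \
    -e ENABLE_PROFILE="${ENABLE_PROFILE:-false}" \
    -e VLLM_TORCH_PROFILER_DIR="${VLLM_TORCH_PROFILER_DIR:-/workspace/profiling}" \
    benchmark-image ./benchmarks/dsr1_fp8_b200.sh
```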

Usage

```bash
# Enable profiling with the default output directory (/workspace/profiling)
ENABLE_PROFILE=true ./benchmarks/dsr1_fp8_b200.sh

# Enable profiling with a custom output directory
ENABLE_PROFILE=true VLLM_TORCH_PROFILER_DIR=/custom/path ./benchmarks/dsr1_fp8_b200.sh
```

How It Works

  1. When benchmark_lib.sh is sourced, it checks ENABLE_PROFILE
  2. If enabled, it sets VLLM_TORCH_PROFILER_DIR (the server uses this to enable its /start_profile and /stop_profile endpoints)
  3. During the benchmark, benchmark_serving.py calls these endpoints to collect torch profiling data (see the curl sketch below)
  4. Profiling results are saved to VLLM_TORCH_PROFILER_DIR
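
The endpoint dance in steps 2–3 can also be reproduced by hand against a running server. A sketch assuming the server listens on localhost:8000 (host and port are assumptions; the endpoint paths are the ones vLLM exposes when VLLM_TORCH_PROFILER_DIR is set):

```bash
# Manually bracket a profiled window against a running vLLM server
curl -X POST http://localhost:8000/start_profile   # begin trace collection
# ... issue the requests to be captured ...
curl -X POST http://localhost:8000/stop_profile    # write trace to VLLM_TORCH_PROFILER_DIR
```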

Testing

  • Tested with ENABLE_PROFILE=true on AMD GPU node
  • Verified profiling data is generated in the output directory

@haofrank haofrank requested a review from a team as a code owner January 31, 2026 00:43
@functionstackx (Contributor)

@haofrank thanks! Any chance you or your AI assistant can add the ability to profile only a couple of forward passes? Profiling the whole run would create a large trace.

@Oseltamivir (Collaborator)

Thanks @haofrank, we already have a profiling branch at https://github.com/InferenceMAX/InferenceMAX/tree/profiling, where we tried to do automatic profiler analysis of the trace in order to give InferenceMAX another axis of information: How to improve.

However, it is currently stale, as only Hopper had usable information (kernel input dims, shapes, etc.).

Nevertheless, having the ability to run the profiler may be useful. We will consider merging the profiler choice with the perfetto relay into main.

@Oseltamivir (Collaborator)

@haofrank If you would like to contribute to the repo, we would appreciate it if you could help update the branch at https://github.com/InferenceMAX/InferenceMAX/tree/profiling and open a PR. Otherwise, I will probably do that this coming weekend.

@haofrank (Author) commented Feb 3, 2026

Hi @Oseltamivir, thanks for the note! I’d be happy to help with this.

From a quick look, it seems the profiling branch already supports enabling profiling via the PROFILE env variable, but there are also quite a few other diffs in the branch. I’m not sure yet whether some of them were experimental or intended to be merged.

I’ll need a bit of time to go through and understand the intent. If I’m not able to make meaningful progress in time, it probably makes sense for you to proceed with it.

@Oseltamivir
Copy link
Collaborator

Hi @haofrank, yep, that branch is very different in its intentions. If you could help:

  1. Update the branch with main (container images, etc.)
  2. Add the profiling option to the benchmark (like this PR)
  3. Profile decode & prefill
  4. Enable a way to run with the profiler by adding a profiler arg to .github/workflows/e2e-tests.yml and utils/matrix_logic/generate_sweep_configs.py
  5. Enable the artifact and relay, like how https://github.com/InferenceMAX/InferenceMAX/actions/runs/20731533868 has a perfetto relay link
  6. Remove the Python file that analyzes the results (the original intent of profiling, which went stale)

We value open source contributions and can definitely merge it in. The profiling branch works, so you can assume the env vars/relays are working.

However, I agree it might be difficult, as there are quite a few parts involved. Let me know what you think; I'll work on this next Wednesday if you're unable to make progress.
