Conversation

@ZJY0516 (Contributor) commented Aug 29, 2025

Purpose

Add a benchmark for custom activation ops.

#19817

Test Plan

python kernels/benchmark_activation.py
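
For context, a minimal sketch of what the script compares, assuming vLLM's SiluAndMul CustomOp layer; the timing helper here is illustrative, not the script's actual harness. The hand-written CUDA kernel (forward_cuda, the "Custom OP" column below) is timed against the pure-PyTorch reference (forward_native) wrapped in torch.compile (the "Compiled" column).

```python
import torch
from vllm.model_executor.layers.activation import SiluAndMul

def bench_ms(fn, x, warmup=10, iters=100):
    # Simple CUDA-event timing; the real script's harness may differ.
    for _ in range(warmup):
        fn(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

layer = SiluAndMul()
compiled_native = torch.compile(layer.forward_native)

# SiluAndMul expects the last dimension to be 2 * intermediate_size.
x = torch.randn(16, 512, 2 * 3072, dtype=torch.bfloat16, device="cuda")
print("Custom OP:", bench_ms(layer.forward_cuda, x))
print("Compiled :", bench_ms(compiled_native, x))
```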

Test Result

activation-op-performance:
     batch_size  seq_len  intermediate_size  Custom OP   Compiled
0           1.0      1.0             3072.0   0.001223   0.000941
1           1.0      1.0             9728.0   0.001954   0.000952
2           1.0      1.0            12288.0   0.002208   0.000953
3           1.0     16.0             3072.0   0.001285   0.001258
4           1.0     16.0             9728.0   0.002078   0.001373
5           1.0     16.0            12288.0   0.002375   0.001431
6           1.0     64.0             3072.0   0.001365   0.001426
7           1.0     64.0             9728.0   0.002169   0.001929
8           1.0     64.0            12288.0   0.002459   0.002236
9           1.0    128.0             3072.0   0.001483   0.001714
10          1.0    128.0             9728.0   0.002291   0.002802
11          1.0    128.0            12288.0   0.002595   0.003191
12          1.0    256.0             3072.0   0.002156   0.002235
13          1.0    256.0             9728.0   0.003737   0.004379
14          1.0    256.0            12288.0   0.004304   0.005216
15          1.0    512.0             3072.0   0.003489   0.003187
16          1.0    512.0             9728.0   0.006649   0.007542
17          1.0    512.0            12288.0   0.007775   0.009209
18          1.0   1024.0             3072.0   0.006092   0.005217
19          1.0   1024.0             9728.0   0.012118   0.013985
20          1.0   1024.0            12288.0   0.015247   0.017411
21          1.0   2048.0             3072.0   0.011028   0.009220
22          1.0   2048.0             9728.0   0.024933   0.030249
23          1.0   2048.0            12288.0   0.037146   0.041409
24          1.0   4096.0             3072.0   0.021270   0.017677
25          1.0   4096.0             9728.0   0.141096   0.169027
26          1.0   4096.0            12288.0   0.218071   0.215046
27         16.0      1.0             3072.0   0.001293   0.001254
28         16.0      1.0             9728.0   0.002082   0.001370
29         16.0      1.0            12288.0   0.002364   0.001427
30         16.0     16.0             3072.0   0.002161   0.002237
31         16.0     16.0             9728.0   0.003745   0.004413
32         16.0     16.0            12288.0   0.004322   0.005217
33         16.0     64.0             3072.0   0.006092   0.005190
34         16.0     64.0             9728.0   0.012809   0.014001
35         16.0     64.0            12288.0   0.015609   0.017554
36         16.0    128.0             3072.0   0.010975   0.009221
37         16.0    128.0             9728.0   0.024521   0.029338
38         16.0    128.0            12288.0   0.037956   0.041443
39         16.0    256.0             3072.0   0.021108   0.017428
40         16.0    256.0             9728.0   0.144411   0.168987
41         16.0    256.0            12288.0   0.218100   0.215015
42         16.0    512.0             3072.0   0.043201   0.041063
43         16.0    512.0             9728.0   0.348653   0.342984
44         16.0    512.0            12288.0   0.436873   0.432006
45         16.0   1024.0             3072.0   0.211183   0.216593
46         16.0   1024.0             9728.0   0.698254   0.682798
47         16.0   1024.0            12288.0   0.872119   0.865484
48         16.0   2048.0             3072.0   0.420624   0.429078
49         16.0   2048.0             9728.0   1.393591   1.359306
50         16.0   2048.0            12288.0   1.741905   1.717667
51         16.0   4096.0             3072.0   0.841009   0.884876
52         16.0   4096.0             9728.0   2.790091   2.720251
53         16.0   4096.0            12288.0   3.486080   3.435315
54         32.0      1.0             3072.0   0.001294   0.001306
55         32.0      1.0             9728.0   0.002102   0.001641
56         32.0      1.0            12288.0   0.002380   0.001705
57         32.0     16.0             3072.0   0.003488   0.003196
58         32.0     16.0             9728.0   0.006630   0.007558
59         32.0     16.0            12288.0   0.007915   0.009234
60         32.0     64.0             3072.0   0.010994   0.009234
61         32.0     64.0             9728.0   0.024935   0.030219
62         32.0     64.0            12288.0   0.035615   0.035514
63         32.0    128.0             3072.0   0.021196   0.017534
64         32.0    128.0             9728.0   0.166938   0.169127
65         32.0    128.0            12288.0   0.217841   0.218477
66         32.0    256.0             3072.0   0.043824   0.042190
67         32.0    256.0             9728.0   0.308808   0.341875
68         32.0    256.0            12288.0   0.436702   0.433015
69         32.0    512.0             3072.0   0.211054   0.219288
70         32.0    512.0             9728.0   0.697179   0.682196
71         32.0    512.0            12288.0   0.869955   0.854082
72         32.0   1024.0             3072.0   0.421191   0.427895
73         32.0   1024.0             9728.0   1.396846   1.361014
74         32.0   1024.0            12288.0   1.737914   1.715146
75         32.0   2048.0             3072.0   0.840322   0.868718
76         32.0   2048.0             9728.0   2.792302   2.734885
77         32.0   2048.0            12288.0   3.481395   3.409680
78         32.0   4096.0             3072.0   1.681036   1.706253
79         32.0   4096.0             9728.0   5.583680   5.441024
80         32.0   4096.0            12288.0   7.623680   7.002856
81         64.0      1.0             3072.0   0.001371   0.001428
82         64.0      1.0             9728.0   0.002180   0.001931
83         64.0      1.0            12288.0   0.002467   0.002238
84         64.0     16.0             3072.0   0.006083   0.005209
85         64.0     16.0             9728.0   0.012633   0.014167
86         64.0     16.0            12288.0   0.014455   0.018034
87         64.0     64.0             3072.0   0.021487   0.017593
88         64.0     64.0             9728.0   0.167601   0.171055
89         64.0     64.0            12288.0   0.217877   0.220154
90         64.0    128.0             3072.0   0.042184   0.035615
91         64.0    128.0             9728.0   0.348718   0.342977
92         64.0    128.0            12288.0   0.436804   0.430604
93         64.0    256.0             3072.0   0.211211   0.216612
94         64.0    256.0             9728.0   0.697051   0.685659
95         64.0    256.0            12288.0   0.872751   0.865636
96         64.0    512.0             3072.0   0.420603   0.430231
97         64.0    512.0             9728.0   1.394651   1.359608
98         64.0    512.0            12288.0   1.742057   1.715660
99         64.0   1024.0             3072.0   0.841033   0.882463
100        64.0   1024.0             9728.0   2.789961   2.734782
101        64.0   1024.0            12288.0   3.487130   3.457926
102        64.0   2048.0             3072.0   1.679639   1.762630
103        64.0   2048.0             9728.0   5.584016   5.459968
104        64.0   2048.0            12288.0   7.622144   6.920448
105        64.0   4096.0             3072.0   3.359232   3.462656
106        64.0   4096.0             9728.0  12.243840  11.890656
107        64.0   4096.0            12288.0  15.257728  14.903296
108       128.0      1.0             3072.0   0.001480   0.001710
109       128.0      1.0             9728.0   0.002308   0.002806
110       128.0      1.0            12288.0   0.002595   0.003193
111       128.0     16.0             3072.0   0.010948   0.009278
112       128.0     16.0             9728.0   0.026788   0.030849
113       128.0     16.0            12288.0   0.036312   0.039285
114       128.0     64.0             3072.0   0.041743   0.043185
115       128.0     64.0             9728.0   0.348416   0.341254
116       128.0     64.0            12288.0   0.435245   0.433262
117       128.0    128.0             3072.0   0.211766   0.219153
118       128.0    128.0             9728.0   0.697600   0.681964
119       128.0    128.0            12288.0   0.872715   0.857085
120       128.0    256.0             3072.0   0.421191   0.428043
121       128.0    256.0             9728.0   1.396809   1.362605
122       128.0    256.0            12288.0   1.741124   1.709103
123       128.0    512.0             3072.0   0.840234   0.872982
124       128.0    512.0             9728.0   2.790027   2.720907
125       128.0    512.0            12288.0   3.483750   3.408896
126       128.0   1024.0             3072.0   1.681036   1.712727
127       128.0   1024.0             9728.0   5.587115   5.440320
128       128.0   1024.0            12288.0   7.623160   6.855168
129       128.0   2048.0             3072.0   3.359725   3.459062
130       128.0   2048.0             9728.0  12.259840  11.898816
131       128.0   2048.0            12288.0  15.249408  14.933952
132       128.0   4096.0             3072.0   7.207928   6.905600
133       128.0   4096.0             9728.0  24.509953  23.771648
134       128.0   4096.0            12288.0  30.573055  29.791744

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
mergify bot added the performance (Performance-related issues) label Aug 29, 2025
@ZJY0516 (Contributor, Author) commented Aug 29, 2025

@ProExpertProg Hi, I added some benchmarks for custom ops. Could you please take a look?

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new benchmark script for custom activation operations. The script is well-structured, but I've found a couple of issues that should be addressed. There's a minor logic error in a conditional block that could lead to unexpected behavior, and some unreachable code due to a misunderstanding of the argparse library's error handling. Addressing these points will improve the script's correctness and maintainability.
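
For reference, the argparse behavior the bot is alluding to (a hypothetical example, not the script's actual code): parser.error() prints the message and raises SystemExit, so any statement placed after it on the same path is dead code.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=int, default=1)
args = parser.parse_args()

if args.batch_size <= 0:
    parser.error("--batch-size must be a positive integer")
    # Dead code: parser.error() raises SystemExit (exit code 2),
    # so execution never reaches this point.
```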

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ProExpertProg (Collaborator) left a comment

Could you also add the ability to compare to the torch.compiled forward_native numbers?

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ZJY0516 (Contributor, Author) commented Aug 29, 2025

> Could you also add the ability to compare to the torch.compiled forward_native numbers?

Sorry, I'm not sure what that means. I compiled the forward_native function using torch.compile(layer.forward_native).

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
ZJY0516 requested a review from ProExpertProg August 29, 2025 15:23
@ZJY0516 (Contributor, Author) commented Aug 30, 2025

@mgoin Hi, could you please review this when you get a chance?

@ProExpertProg (Collaborator) left a comment

Instead of manually specifying the dimensions, I would just always do a sweep of popular sizes (and you can let users override via a comma-separated CLI flag).
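
One way the suggestion could look (an illustrative sketch; the merged script's actual flag names and defaults may differ): sweep a default list of common sizes and accept a comma-separated override.

```python
import argparse

# Common intermediate sizes swept by default (illustrative values,
# matching the sizes in the results table above).
DEFAULT_INTERMEDIATE_SIZES = [3072, 9728, 12288]

def int_list(value: str) -> list[int]:
    """Parse a comma-separated string like "3072,12288" into ints."""
    return [int(v) for v in value.split(",")]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--intermediate-sizes",
    type=int_list,
    default=DEFAULT_INTERMEDIATE_SIZES,
    help="comma-separated override, e.g. --intermediate-sizes 3072,12288",
)
args = parser.parse_args()

for size in args.intermediate_sizes:
    print(f"benchmarking intermediate_size={size}")  # run the sweep here
```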

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ProExpertProg (Collaborator) left a comment

Another nit!

ZJY0516 and others added 2 commits September 6, 2025 21:50
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
ProExpertProg enabled auto-merge (squash) September 6, 2025 19:10
github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 6, 2025
vllm-bot merged commit 77aec83 into vllm-project:main Sep 7, 2025
13 of 14 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
ZJY0516 deleted the bench-activation branch September 30, 2025 16:14
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>