Conversation

@ZJY0516 (Contributor) commented Aug 29, 2025

Purpose

Add a benchmark for custom activation ops.

#19817

Test Plan

python kernels/benchmark_activation.py
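
For context, a minimal sketch of what the script compares, assuming vLLM's SiluAndMul CustomOp layer; the timing helper here is illustrative, not the script's actual harness. The hand-written CUDA kernel (forward_cuda, the "Custom OP" column below) is timed against the pure-PyTorch reference (forward_native) wrapped in torch.compile (the "Compiled" column).

```python
import torch
from vllm.model_executor.layers.activation import SiluAndMul

def bench_ms(fn, x, warmup=10, iters=100):
    # Simple CUDA-event timing; the real script's harness may differ.
    for _ in range(warmup):
        fn(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

layer = SiluAndMul()
compiled_native = torch.compile(layer.forward_native)

# SiluAndMul expects the last dimension to be 2 * intermediate_size.
x = torch.randn(16, 512, 2 * 3072, dtype=torch.bfloat16, device="cuda")
print("Custom OP:", bench_ms(layer.forward_cuda, x))
print("Compiled :", bench_ms(compiled_native, x))
```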

Test Result

activation-op-performance:
     batch_size  seq_len  intermediate_size  Custom OP   Compiled
0           1.0      1.0             3072.0   0.001223   0.000941
1           1.0      1.0             9728.0   0.001954   0.000952
2           1.0      1.0            12288.0   0.002208   0.000953
3           1.0     16.0             3072.0   0.001285   0.001258
4           1.0     16.0             9728.0   0.002078   0.001373
5           1.0     16.0            12288.0   0.002375   0.001431
6           1.0     64.0             3072.0   0.001365   0.001426
7           1.0     64.0             9728.0   0.002169   0.001929
8           1.0     64.0            12288.0   0.002459   0.002236
9           1.0    128.0             3072.0   0.001483   0.001714
10          1.0    128.0             9728.0   0.002291   0.002802
11          1.0    128.0            12288.0   0.002595   0.003191
12          1.0    256.0             3072.0   0.002156   0.002235
13          1.0    256.0             9728.0   0.003737   0.004379
14          1.0    256.0            12288.0   0.004304   0.005216
15          1.0    512.0             3072.0   0.003489   0.003187
16          1.0    512.0             9728.0   0.006649   0.007542
17          1.0    512.0            12288.0   0.007775   0.009209
18          1.0   1024.0             3072.0   0.006092   0.005217
19          1.0   1024.0             9728.0   0.012118   0.013985
20          1.0   1024.0            12288.0   0.015247   0.017411
21          1.0   2048.0             3072.0   0.011028   0.009220
22          1.0   2048.0             9728.0   0.024933   0.030249
23          1.0   2048.0            12288.0   0.037146   0.041409
24          1.0   4096.0             3072.0   0.021270   0.017677
25          1.0   4096.0             9728.0   0.141096   0.169027
26          1.0   4096.0            12288.0   0.218071   0.215046
27         16.0      1.0             3072.0   0.001293   0.001254
28         16.0      1.0             9728.0   0.002082   0.001370
29         16.0      1.0            12288.0   0.002364   0.001427
30         16.0     16.0             3072.0   0.002161   0.002237
31         16.0     16.0             9728.0   0.003745   0.004413
32         16.0     16.0            12288.0   0.004322   0.005217
33         16.0     64.0             3072.0   0.006092   0.005190
34         16.0     64.0             9728.0   0.012809   0.014001
35         16.0     64.0            12288.0   0.015609   0.017554
36         16.0    128.0             3072.0   0.010975   0.009221
37         16.0    128.0             9728.0   0.024521   0.029338
38         16.0    128.0            12288.0   0.037956   0.041443
39         16.0    256.0             3072.0   0.021108   0.017428
40         16.0    256.0             9728.0   0.144411   0.168987
41         16.0    256.0            12288.0   0.218100   0.215015
42         16.0    512.0             3072.0   0.043201   0.041063
43         16.0    512.0             9728.0   0.348653   0.342984
44         16.0    512.0            12288.0   0.436873   0.432006
45         16.0   1024.0             3072.0   0.211183   0.216593
46         16.0   1024.0             9728.0   0.698254   0.682798
47         16.0   1024.0            12288.0   0.872119   0.865484
48         16.0   2048.0             3072.0   0.420624   0.429078
49         16.0   2048.0             9728.0   1.393591   1.359306
50         16.0   2048.0            12288.0   1.741905   1.717667
51         16.0   4096.0             3072.0   0.841009   0.884876
52         16.0   4096.0             9728.0   2.790091   2.720251
53         16.0   4096.0            12288.0   3.486080   3.435315
54         32.0      1.0             3072.0   0.001294   0.001306
55         32.0      1.0             9728.0   0.002102   0.001641
56         32.0      1.0            12288.0   0.002380   0.001705
57         32.0     16.0             3072.0   0.003488   0.003196
58         32.0     16.0             9728.0   0.006630   0.007558
59         32.0     16.0            12288.0   0.007915   0.009234
60         32.0     64.0             3072.0   0.010994   0.009234
61         32.0     64.0             9728.0   0.024935   0.030219
62         32.0     64.0            12288.0   0.035615   0.035514
63         32.0    128.0             3072.0   0.021196   0.017534
64         32.0    128.0             9728.0   0.166938   0.169127
65         32.0    128.0            12288.0   0.217841   0.218477
66         32.0    256.0             3072.0   0.043824   0.042190
67         32.0    256.0             9728.0   0.308808   0.341875
68         32.0    256.0            12288.0   0.436702   0.433015
69         32.0    512.0             3072.0   0.211054   0.219288
70         32.0    512.0             9728.0   0.697179   0.682196
71         32.0    512.0            12288.0   0.869955   0.854082
72         32.0   1024.0             3072.0   0.421191   0.427895
73         32.0   1024.0             9728.0   1.396846   1.361014
74         32.0   1024.0            12288.0   1.737914   1.715146
75         32.0   2048.0             3072.0   0.840322   0.868718
76         32.0   2048.0             9728.0   2.792302   2.734885
77         32.0   2048.0            12288.0   3.481395   3.409680
78         32.0   4096.0             3072.0   1.681036   1.706253
79         32.0   4096.0             9728.0   5.583680   5.441024
80         32.0   4096.0            12288.0   7.623680   7.002856
81         64.0      1.0             3072.0   0.001371   0.001428
82         64.0      1.0             9728.0   0.002180   0.001931
83         64.0      1.0            12288.0   0.002467   0.002238
84         64.0     16.0             3072.0   0.006083   0.005209
85         64.0     16.0             9728.0   0.012633   0.014167
86         64.0     16.0            12288.0   0.014455   0.018034
87         64.0     64.0             3072.0   0.021487   0.017593
88         64.0     64.0             9728.0   0.167601   0.171055
89         64.0     64.0            12288.0   0.217877   0.220154
90         64.0    128.0             3072.0   0.042184   0.035615
91         64.0    128.0             9728.0   0.348718   0.342977
92         64.0    128.0            12288.0   0.436804   0.430604
93         64.0    256.0             3072.0   0.211211   0.216612
94         64.0    256.0             9728.0   0.697051   0.685659
95         64.0    256.0            12288.0   0.872751   0.865636
96         64.0    512.0             3072.0   0.420603   0.430231
97         64.0    512.0             9728.0   1.394651   1.359608
98         64.0    512.0            12288.0   1.742057   1.715660
99         64.0   1024.0             3072.0   0.841033   0.882463
100        64.0   1024.0             9728.0   2.789961   2.734782
101        64.0   1024.0            12288.0   3.487130   3.457926
102        64.0   2048.0             3072.0   1.679639   1.762630
103        64.0   2048.0             9728.0   5.584016   5.459968
104        64.0   2048.0            12288.0   7.622144   6.920448
105        64.0   4096.0             3072.0   3.359232   3.462656
106        64.0   4096.0             9728.0  12.243840  11.890656
107        64.0   4096.0            12288.0  15.257728  14.903296
108       128.0      1.0             3072.0   0.001480   0.001710
109       128.0      1.0             9728.0   0.002308   0.002806
110       128.0      1.0            12288.0   0.002595   0.003193
111       128.0     16.0             3072.0   0.010948   0.009278
112       128.0     16.0             9728.0   0.026788   0.030849
113       128.0     16.0            12288.0   0.036312   0.039285
114       128.0     64.0             3072.0   0.041743   0.043185
115       128.0     64.0             9728.0   0.348416   0.341254
116       128.0     64.0            12288.0   0.435245   0.433262
117       128.0    128.0             3072.0   0.211766   0.219153
118       128.0    128.0             9728.0   0.697600   0.681964
119       128.0    128.0            12288.0   0.872715   0.857085
120       128.0    256.0             3072.0   0.421191   0.428043
121       128.0    256.0             9728.0   1.396809   1.362605
122       128.0    256.0            12288.0   1.741124   1.709103
123       128.0    512.0             3072.0   0.840234   0.872982
124       128.0    512.0             9728.0   2.790027   2.720907
125       128.0    512.0            12288.0   3.483750   3.408896
126       128.0   1024.0             3072.0   1.681036   1.712727
127       128.0   1024.0             9728.0   5.587115   5.440320
128       128.0   1024.0            12288.0   7.623160   6.855168
129       128.0   2048.0             3072.0   3.359725   3.459062
130       128.0   2048.0             9728.0  12.259840  11.898816
131       128.0   2048.0            12288.0  15.249408  14.933952
132       128.0   4096.0             3072.0   7.207928   6.905600
133       128.0   4096.0             9728.0  24.509953  23.771648
134       128.0   4096.0            12288.0  30.573055  29.791744

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
mergify bot added the performance (Performance-related issues) label Aug 29, 2025
@ZJY0516 (Contributor, Author) commented Aug 29, 2025

@ProExpertProg Hi, I added some benchmarks for custom ops. Could you please take a look?

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new benchmark script for custom activation operations. The script is well-structured, but I've found a couple of issues that should be addressed. There's a minor logic error in a conditional block that could lead to unexpected behavior, and some unreachable code due to a misunderstanding of the argparse library's error handling. Addressing these points will improve the script's correctness and maintainability.
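
For reference, the argparse behavior the bot is alluding to (a hypothetical example, not the script's actual code): parser.error() prints the message and raises SystemExit, so any statement placed after it on the same path is dead code.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=int, default=1)
args = parser.parse_args()

if args.batch_size <= 0:
    parser.error("--batch-size must be a positive integer")
    # Dead code: parser.error() raises SystemExit (exit code 2),
    # so execution never reaches this point.
```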

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ProExpertProg (Collaborator) left a comment

Could you also add the ability to compare to the torch.compiled forward_native numbers?

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ZJY0516 (Contributor, Author) commented Aug 29, 2025

> Could you also add the ability to compare to the torch.compiled forward_native numbers?

Sorry, I'm not sure what that means. I compiled the forward_native function using torch.compile(layer.forward_native).

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
ZJY0516 requested a review from ProExpertProg August 29, 2025 15:23
@ZJY0516 (Contributor, Author) commented Aug 30, 2025

@mgoin Hi, could you please review this when you get a chance?

@ProExpertProg (Collaborator) left a comment

Instead of manually specifying the dimensions, I would just always do a sweep of popular sizes (and you can let users override via a comma-separated CLI flag).
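
One way the suggestion could look (an illustrative sketch; the merged script's actual flag names and defaults may differ): sweep a default list of common sizes and accept a comma-separated override.

```python
import argparse

# Common intermediate sizes swept by default (illustrative values,
# matching the sizes in the results table above).
DEFAULT_INTERMEDIATE_SIZES = [3072, 9728, 12288]

def int_list(value: str) -> list[int]:
    """Parse a comma-separated string like "3072,12288" into ints."""
    return [int(v) for v in value.split(",")]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--intermediate-sizes",
    type=int_list,
    default=DEFAULT_INTERMEDIATE_SIZES,
    help="comma-separated override, e.g. --intermediate-sizes 3072,12288",
)
args = parser.parse_args()

for size in args.intermediate_sizes:
    print(f"benchmarking intermediate_size={size}")  # run the sweep here
```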

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ProExpertProg (Collaborator) left a comment

Another nit!

ZJY0516 and others added 2 commits September 6, 2025 21:50
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
ProExpertProg enabled auto-merge (squash) September 6, 2025 19:10
github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 6, 2025
vllm-bot merged commit 77aec83 into vllm-project:main Sep 7, 2025
13 of 14 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
ZJY0516 deleted the bench-activation branch September 30, 2025 16:14
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>