[Profiler] Pay for what you use (v2) #74484


Closed

robieta wants to merge 1 commit into pytorch:master from robieta:export-D34779994

Contributor

robieta commented Mar 21, 2022

Summary:
In my first attempt at this in December, I stamped out specializations using variadic templates. However, I'm able to get comparable performance using simple conditionals, since the branch is very predictable and AppendOnlyList::emplace_back has low enough overhead that multiple calls don't cause an issue.
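
For readers unfamiliar with the pattern, here is a minimal C++17 sketch of the "simple conditionals" approach described above. It is not the PR's code: `AppendOnlyListSketch`, `RecorderConfig`, `Recorder`, and the flag names are hypothetical, and a `std::vector` stands in for the real `AppendOnlyList`. The idea it illustrates is that the always-needed payload costs one append, while each optional payload sits behind a branch whose outcome is fixed for the whole profiling session.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the profiler's append-only storage.
// std::vector is used for brevity; the real container is not reproduced here.
template <typename T>
class AppendOnlyListSketch {
 public:
  template <typename... Args>
  T& emplace_back(Args&&... args) {
    return items_.emplace_back(std::forward<Args>(args)...);
  }
  std::size_t size() const { return items_.size(); }

 private:
  std::vector<T> items_;
};

// Hypothetical option flags; the names are illustrative only.
struct RecorderConfig {
  bool report_input_shapes = false;
  bool with_stack = false;
};

// Payload that every recorded op needs.
struct BasicOpFields {
  BasicOpFields(std::string n, std::int64_t t)
      : name(std::move(n)), start_ns(t) {}
  std::string name;
  std::int64_t start_ns;
};

struct Recorder {
  explicit Recorder(RecorderConfig c) : config(c) {}

  void record_op(const std::string& name, std::int64_t start_ns,
                 const std::vector<std::int64_t>& shape,
                 const std::string& callsite) {
    // Always-on payload: a single append.
    op_events.emplace_back(name, start_ns);
    // Optional payloads: each behind a branch that never changes during a
    // session, so the disabled case costs little more than the comparison.
    if (config.report_input_shapes) {
      shapes.emplace_back(shape);
    }
    if (config.with_stack) {
      callsites.emplace_back(callsite);
    }
  }

  RecorderConfig config;
  AppendOnlyListSketch<BasicOpFields> op_events;
  AppendOnlyListSketch<std::vector<std::int64_t>> shapes;
  AppendOnlyListSketch<std::string> callsites;
};

// Small usage check: with shapes enabled and stacks disabled, only the
// basic fields and the shape are stored.
inline void recorder_usage_example() {
  RecorderConfig cfg;
  cfg.report_input_shapes = true;
  Recorder recorder(cfg);
  recorder.record_op("aten::add", 100, {2, 3}, "model.py:12");
  assert(recorder.op_events.size() == 1);
  assert(recorder.shapes.size() == 1);
  assert(recorder.callsites.size() == 0);
}
```

Compared with stamping out one template specialization per option combination, this keeps a single code path at the price of a few well-predicted branches per recorded op.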

This is also a chance to do some BE (better engineering): rather than forcing ops and backend events to use the same fields (which in practice means setting a bunch of default values when reporting backend events), I split them and use a variant.
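
As a rough illustration of splitting the two event kinds, the C++17 sketch below gives each kind its own struct and stores them in one sequence through a variant. Everything here is an assumption made for illustration: the struct names, their fields, and the use of `std::variant` (the PR may use a different variant type and different fields).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical field sets; the PR's actual structs may differ.
struct OpEventSketch {
  std::string name;
  std::int64_t start_ns;
  std::uint64_t sequence_number;
};

struct BackendEventSketch {
  std::string name;
  std::int64_t start_ns;
  std::int64_t end_ns;
  std::string backend;
};

// Each event carries only the fields meaningful for its kind, so backend
// events no longer need placeholder defaults for op-only fields.
using EventSketch = std::variant<OpEventSketch, BackendEventSketch>;

// Dispatch on the stored alternative with std::visit.
inline std::string describe(const EventSketch& event) {
  return std::visit(
      [](const auto& ev) -> std::string {
        using T = std::decay_t<decltype(ev)>;
        if constexpr (std::is_same_v<T, OpEventSketch>) {
          return "op:" + ev.name;
        } else {
          return "backend:" + ev.backend + ":" + ev.name;
        }
      },
      event);
}

// Small usage check: both kinds live in the same container.
inline void variant_usage_example() {
  std::vector<EventSketch> events;
  events.emplace_back(OpEventSketch{"aten::mul", 10, 7});
  events.emplace_back(
      BackendEventSketch{"delegate_run", 20, 35, "hypothetical_backend"});
  assert(std::holds_alternative<OpEventSketch>(events[0]));
  assert(std::holds_alternative<BackendEventSketch>(events[1]));
}
```

The design trade-off is that consumers must handle each alternative explicitly, but no event type pays for, or has to fake, fields that belong to the other.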

Test Plan: The single-threaded benchmark (with no extra options set) improved considerably, from ~0.88 us to ~0.62 us. The stress test benchmark improved modestly, from ~6.1 us to ~5.8 us. So the bottleneck for multi-threading is somewhere else, but doing less wasted work still moves the needle a little.

Reviewed By: swolchok

Differential Revision: D34779994

robieta requested review from albanD and soulitzer as code owners

March 21, 2022 19:02

facebook-github-bot added the cla signed label

Contributor

facebook-github-bot commented Mar 21, 2022 •


🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/74484

💊 CI failures summary and remediations

As of commit 9af2568 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


facebook-github-bot added the fb-exported label

Contributor

facebook-github-bot commented Mar 21, 2022

This pull request was exported from Phabricator. Differential Revision: D34779994

robieta force-pushed the export-D34779994 branch from 1de8e0a to 9df93e2 Compare

March 22, 2022 00:18


albanD removed their request for review

March 22, 2022 19:45


robieta force-pushed the export-D34779994 branch from 9df93e2 to a99e473 Compare

March 22, 2022 20:46

robieta force-pushed the export-D34779994 branch from a99e473 to 610bae5 Compare

March 23, 2022 06:44



          [Profiler] Pay for what you use (v2) (pytorch#74484)

9af2568

Summary:
Pull Request resolved: pytorch#74484


fbshipit-source-id: 0b0503678929b2d61e9ddb1b27887e64fdcb7fe4


robieta force-pushed the export-D34779994 branch from 610bae5 to 9af2568 Compare

March 23, 2022 20:52

aaronenyeshi approved these changes


Member

aaronenyeshi left a comment

LGTM, reviewed internally! Thanks a lot for this patch!

facebook-github-bot pushed a commit that referenced this pull request


          [Profiler] Pay for what you use (v2) (#74484)

f0a49ff

Summary:
Pull Request resolved: #74484


fbshipit-source-id: 392bc7c6f12797fa5e18777063aa21210d9d2067

pytorchmergebot closed this in

2ecf743

Contributor

github-actions bot commented Mar 24, 2022

Hey @robieta.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

robieta added the release notes: profiler label

shahofblah pushed a commit that referenced this pull request


          [Profiler] Pay for what you use (v2) (#74484)

ff51434

Summary:
Pull Request resolved: #74484


fbshipit-source-id: 392bc7c6f12797fa5e18777063aa21210d9d2067
(cherry picked from commit f0a49ff)

WBobby mentioned this pull request

Add ROCm5.2.3/AMDGPU support for PyTorch WBobby/pytorch#2

Closed


Labels

cla signed fb-exported release notes: profiler