[Profiler] Switch to thread local subqueues to reduce lock contention. #74151
This pull request was exported from Phabricator. Differential Revision: D34720171
(pytorch#74151)
Summary: Pull Request resolved: pytorch#74151
The first of several changes to move to an optimized recording data structure to back the profiler. This PR keeps the existing monolithic `OpEventData` struct, but splits storage into thread-local subqueues so we don't have to lock to insert.
Test Plan: Unit tests and benchmarks. The single-threaded benchmark is unchanged, and the multithreaded stress test dropped from ~21 us to ~6 us.
Reviewed By: chaekit
Differential Revision: D34720171
fbshipit-source-id: e68589551ff4b05afef4d4040c0e64b2f21e7c27
Summary: The first of several changes to move to an optimized recording data structure to back the profiler. This PR keeps the existing monolithic `OpEventData` struct, but splits storage into thread-local subqueues so we don't have to lock to insert.
Test Plan: Unit tests and benchmarks. The single-threaded benchmark is unchanged, and the multithreaded stress test dropped from ~21 us to ~6 us.
Reviewed By: chaekit
Differential Revision: D34720171
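
To illustrate the general idea of per-thread subqueues, here is a minimal sketch. It is not the code from this PR: `Event`, `RecordQueue`, `SubQueue`, `record`, `drain`, and `record_from_threads` are hypothetical names chosen for the example, and `Event` is only a stand-in for `OpEventData`. The assumption is that each thread takes the lock once to register its own subqueue and afterwards appends without locking, and that events are collected only after the recording threads have been joined.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the profiler's monolithic event struct.
struct Event {
  std::thread::id tid;
  std::uint64_t seq;
};

class RecordQueue {
 public:
  RecordQueue() : id_(new_id()) {}

  // Hot path: no lock is taken once this thread has a registered subqueue.
  void record(std::uint64_t seq) {
    subqueue().events.push_back(Event{std::this_thread::get_id(), seq});
  }

  // Cold path: assumed to run only after all recording threads were joined,
  // so no thread is appending while the subqueues are read.
  std::vector<Event> drain() {
    std::lock_guard<std::mutex> guard(mutex_);
    std::vector<Event> out;
    for (auto& kv : subqueues_) {
      out.insert(out.end(), kv.second->events.begin(), kv.second->events.end());
      kv.second->events.clear();
    }
    return out;
  }

 private:
  struct SubQueue {
    std::deque<Event> events;
  };

  // Unique id per queue, so a stale per-thread cache entry is never reused
  // for a different queue that happens to live at the same address.
  static std::uint64_t new_id() {
    static std::atomic<std::uint64_t> next{1};
    return next.fetch_add(1);
  }

  SubQueue& subqueue() {
    thread_local std::uint64_t cached_id = 0;
    thread_local SubQueue* cached = nullptr;
    if (cached_id != id_) {
      // Slow path: taken on a thread's first event for this queue.
      std::lock_guard<std::mutex> guard(mutex_);
      auto& slot = subqueues_[std::this_thread::get_id()];
      if (!slot) {
        slot.reset(new SubQueue());
      }
      cached = slot.get();
      cached_id = id_;
    }
    return *cached;
  }

  const std::uint64_t id_;
  std::mutex mutex_;
  std::unordered_map<std::thread::id, std::unique_ptr<SubQueue>> subqueues_;
};

// Spawns n_threads writers, joins them, and returns the number of events seen.
std::size_t record_from_threads(RecordQueue& q, int n_threads, int per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&q, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        q.record(static_cast<std::uint64_t>(i));
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return q.drain().size();
}
```

In this sketch the mutex guards only the map of subqueues, so contention is limited to each thread's first event rather than every insert. Whether the actual change registers and drains subqueues this way is not stated in the summary above.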