[RPC profiling] Don't wrap toHere() calls with profiling #44655
Since `toHere()` does not execute operations over RPC and simply transfers the value to the local node, we don't need to enable the profiler remotely for this message; doing so only adds unnecessary overhead. Since `toHere()` is a blocking call, we already profile it on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass).

Differential Revision: [D23641466](https://our.internmc.facebook.com/intern/diff/D23641466/)
ghstack-source-id: 112012912
Pull Request resolved: #44655
Codecov Report
@@ Coverage Diff @@
## gh/rohan-varma/172/base #44655 +/- ##
==========================================================
Coverage ? 67.85%
==========================================================
Files ? 384
Lines ? 50020
Branches ? 0
==========================================================
Hits ? 33940
Misses ? 16080
Partials ? 0
  // If profiler is enabled, wrap this message with profiling metadata that will
  // tell the remote end to process this request with the profiler enabled.
- if (torch::autograd::profiler::profilerEnabled()) {
+ if (!forceDisableProfiling && torch::autograd::profiler::profilerEnabled()) {
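The effect of the new check can be sketched in a few lines. This is an illustrative Python model, not the C++ code in the PR: `maybe_wrap_with_profiling`, the dict used as a "wrapped" message, and the parameter names are hypothetical stand-ins for the condition in the diff above.

```python
def maybe_wrap_with_profiling(message, profiler_enabled, force_disable_profiling=False):
    """Return the message to send, wrapped with profiling metadata only when allowed."""
    # Mirrors `!forceDisableProfiling && profilerEnabled()` from the diff.
    if not force_disable_profiling and profiler_enabled:
        # Hypothetical wrapper: asks the remote end to run with the profiler on.
        return {"profile_remotely": True, "payload": message}
    # Otherwise the message is sent as-is and the remote profiler stays off.
    return message


# A regular RPC is wrapped while the profiler is on ...
wrapped = maybe_wrap_with_profiling("rpc_call", profiler_enabled=True)
# ... but a toHere()-style fetch opts out explicitly.
plain = maybe_wrap_with_profiling(
    "rref_fetch", profiler_enabled=True, force_disable_profiling=True
)
```

The opt-out is an explicit argument at the call site, which is the trade-off discussed in the review comments: it adds a parameter, but leaves the profiler's own state untouched.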
This looks OK to me. But I wonder if we can avoid this new arg by making torch::autograd::profiler::profilerEnabled() return false? E.g., is it possible to use a guard in rref_impl.cpp to toggle the status of profilerEnabled?
I guess one thing we could do is set a thread_local that allows this override, and have a guard that sets/restores it. If it's true, then we would override `profilerEnabled()` to always return false. That said, I don't think this should live in `profiler::profilerEnabled()` itself, since that may add too much complexity to the profiler. Maybe we can have a separate function, specific to RPC profiling, that wraps `profilerEnabled()` and checks this guard value.
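For comparison, the guard idea floated here could look roughly like the following. It is a Python sketch of the pattern only (the real code would be C++ with a `thread_local` and an RAII guard); `disable_rpc_profiling`, `rpc_profiling_enabled`, and the module-level `_profiler_on` flag are hypothetical names, and this alternative was not what the PR ended up using.

```python
import threading
from contextlib import contextmanager

_profiler_on = True        # stand-in for the profiler's global enabled state
_tls = threading.local()   # per-thread override flag


def profiler_enabled():
    """Stand-in for profilerEnabled(); knows nothing about the override."""
    return _profiler_on


@contextmanager
def disable_rpc_profiling():
    """Guard: set the thread-local override, restore the old value on exit."""
    previous = getattr(_tls, "force_disabled", False)
    _tls.force_disabled = True
    try:
        yield
    finally:
        _tls.force_disabled = previous


def rpc_profiling_enabled():
    """RPC-specific wrapper that also checks the guard value."""
    return profiler_enabled() and not getattr(_tls, "force_disabled", False)
```

Because the override is thread-local and restored on exit, nested guards and other threads are unaffected, which is what keeps the extra state out of the profiler itself.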
I see. If it's not that easy to toggle the profiler on/off, the current version LGTM! Thanks!
* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* **#44655 [RPC profiling] Don't wrap toHere() calls with profiling**
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

Since `toHere()` does not execute operations (torch operators) over RPC and simply transfers the value to the local node, we don't need to enable the profiler remotely for this message. This causes unnecessary overhead and is not needed. Since `toHere` is a blocking call, we already profile the call on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass).

Differential Revision: [D23641466](https://our.internmc.facebook.com/intern/diff/D23641466/)
**#44646 Remove thread_local RecordFunctionGuard from profiler.**

Per a discussion with @ilia-cher, the thread_local `RecordFunctionGuard` is not needed anymore, and removing it would make some future changes to support async RPC profiling easier. Tested by ensuring profiling tests in `test_autograd.py` still pass.

Differential Revision: [D23683998](https://our.internmc.facebook.com/intern/diff/D23683998/)
**#44653 [RPC profiling] Allow disableProfiler() to be called from another thread.**

This changes the profiler, per an offline discussion with @ilia-cher, so that the `disableProfiler()` event consolidation logic can be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387, where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling.

This is done by introducing two flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events, respectively. Backwards compatibility is ensured since both options are true by default.

Added a test in `test_misc.cpp` to test this.

Differential Revision: [D23638499](https://our.internmc.facebook.com/intern/diff/D23638499/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23638499/)!
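To make the roles of the two flags concrete, here is a toy Python model of the description above. It is a sketch under stated assumptions, not the profiler's API: `ToyProfilerState` and its methods are hypothetical, "thread-local state" is modeled as a set of thread ids, and returning an empty list when consolidation is skipped is a choice made for this toy.

```python
import threading


class ToyProfilerState:
    """Hypothetical stand-in for shared profiler state; not the real profiler."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}         # thread id -> list of recorded event names
        self.tls_enabled = set()  # thread ids whose "thread-local" flag is set

    def enable(self):
        self.tls_enabled.add(threading.get_ident())

    def record(self, name):
        with self._lock:
            self._events.setdefault(threading.get_ident(), []).append(name)

    def disable(self, cleanup_tls_state=True, consolidate=True):
        # Only touch this thread's settings when asked to (both default to True).
        if cleanup_tls_state:
            self.tls_enabled.discard(threading.get_ident())
        if not consolidate:
            return []
        # Consolidate events recorded on every thread, not just the caller's.
        with self._lock:
            return [e for tid in sorted(self._events) for e in self._events[tid]]


state = ToyProfilerState()
state.enable()                      # profiler enabled on the main thread
main_tid = threading.get_ident()
state.record("main_op")

collected = []


def callback():
    # Runs on a thread that never enabled the profiler: collect events,
    # but leave the main thread's settings alone.
    state.record("callback_op")
    collected.extend(state.disable(cleanup_tls_state=False))


t = threading.Thread(target=callback)
t.start()
t.join()
```

In this model the callback thread gathers events from both threads while the enabling thread's flag stays set, which is the situation the description gives for non-main threads.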
**#44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.**

Closes #39971. This PR adds support for functions decorated with `@rpc.functions.async_execution` to be profiled over RPC, as builtins, jit functions, and blocking python UDFs currently can be. The reasoning for this is to provide complete feature support in terms of RPC profiling and the various types of functions users can run.

To enable this, the PR below this one enables calling `disableProfiler()` safely from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only the blocking `processRPC` call, as was done previously). Since by the time the future completes we've kicked off the async function and the future corresponding to it has completed, we are able to capture any RPCs the function would have called and the actual work done on the other node.

For example, if the following async function is run on a server over RPC:

```
import time

import torch
import torch.distributed.rpc as rpc


def slow_add(x, y):
    time.sleep(1)
    return torch.add(x, y)


@rpc.functions.async_execution
def slow_async_add(to, x, y):
    # Returns a future; the nested RPC does the actual work on `to`.
    return rpc.rpc_async(to, slow_add, args=(x, y))
```

we expect to see the original RPC profiled, the nested RPC profiled, and the actual `torch.add()` work. All of these events should be recorded with the correct node id. Here is an example profiling output:

| Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | Number of Calls | Node ID |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `rpc_async#slow_async_add(worker1 -> worker2)` | 0.00% | 0.000us | 0 | 1.012s | 1.012s | 1 | 1 |
| `aten::empty` | 7.02% | 11.519us | 7.02% | 11.519us | 11.519us | 1 | 1 |
| `rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)` | 0.00% | 0.000us | 0 | 1.006s | 1.006s | 1 | 2 |
| `rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty` | 7.21% | 11.843us | 7.21% | 11.843us | 11.843us | 1 | 2 |
| `rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add` | 71.94% | 118.107us | 85.77% | 140.802us | 140.802us | 1 | 3 |
| `rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty` | 13.82% | 22.695us | 13.82% | 22.695us | 22.695us | 1 | 3 |

Self CPU time total: 164.164us

This PR also moves a bunch of the profiling logic to `rpc/utils.cpp` to declutter `request_callback` code.

Differential Revision: [D23638387](https://our.internmc.facebook.com/intern/diff/D23638387/)
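The "defer disabling until the future completes" idea from #44664 can be illustrated without any RPC machinery. The following is a minimal sketch using only the standard library; `process_request`, the `log` list, and the pretend enable/disable steps are hypothetical and stand in for the server-side logic described above.

```python
import threading
from concurrent.futures import Future


def process_request(handler, log):
    """Run `handler` with a (pretend) profiler enabled until its future completes."""
    log.append("enable_profiler")
    future = handler()  # async-style handler: returns a Future immediately

    def _on_complete(done_future):
        # Runs on whichever thread completes the future, possibly not the
        # thread that enabled the profiler.
        log.append("disable_profiler")

    future.add_done_callback(_on_complete)
    log.append("process_request_returned")
    return future


log = []
pending = Future()
result_future = process_request(lambda: pending, log)
snapshot_before = list(log)  # the request handler has returned, profiler still on


def finish():
    log.append("remote_work_done")
    pending.set_result(3)


worker = threading.Thread(target=finish)
worker.start()
worker.join()
```

Disabling from the completion callback rather than right after the handler returns is what lets the nested work fall inside the profiled window, and it is also why the disable step must be safe to call from a different thread.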
… single threaded server"

This ensures that RPC profiling works in single-threaded server scenarios and that we won't make the assumption that we'll have multiple threads when working on this code. For example, this assumption resulted in a bug in the previous diff (which was fixed).

Differential Revision: [D23691304](https://our.internmc.facebook.com/intern/diff/D23691304/)
CI failures summary and remediations (Dr. CI), as of commit 5382e89:
XLA failure: job pytorch_xla_linux_bionic_py3_6_clang9_build is failing.
…filing"

When enabling the profiler on the server, if it is a different machine it may not have CUDA while the caller does. In this case we would previously crash; now we fall back to CPU and log a warning. For testing, I forced it to return the CUDA profiler state and validated that it falls back. Not sure how to add a unit test given that we have single-machine tests and the machine either has or doesn't have CUDA.

Differential Revision: [D23790729](https://our.internmc.facebook.com/intern/diff/D23790729/)
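The fallback described here amounts to a small decision function. Below is a hedged Python sketch of that behavior; `resolve_profiler_state`, the string states, and the warning text are hypothetical, and `cuda_available` is passed in explicitly so that both branches can be exercised on a single machine.

```python
import warnings


def resolve_profiler_state(requested_state, cuda_available):
    """Pick the profiler state to use on the server for a caller's request."""
    if requested_state == "CUDA" and not cuda_available:
        # Previously this situation would crash; fall back and warn instead.
        warnings.warn(
            "CUDA profiling requested but CUDA is unavailable; "
            "falling back to CPU profiling."
        )
        return "CPU"
    return requested_state


# Simulate a caller with CUDA talking to a server without it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    state = resolve_profiler_state("CUDA", cuda_available=False)
```

Injecting the availability check as a parameter is one way to get around the single-machine testing concern raised above, though whether that fits the actual code is an open question.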
server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * **#44655 [RPC profiling] Don't wrap toHere() calls with profiling** * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * **#44655 [RPC profiling] Don't wrap toHere() calls with profiling** * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * **#44655 [RPC profiling] Don't wrap toHere() calls with profiling** * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * **#44655 [RPC profiling] Don't wrap toHere() calls with profiling** * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. Since `toHere()` does not execute operations (torch operators) over RPC and simply transfers the value to the local node, we don't need to enable the profiler remotely for this message. This causes unnecessary overhead and is not needed. Since `toHere` is a blocking call, we already profile the call on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass). Differential Revision: [D23641466](https://our.internmc.facebook.com/intern/diff/D23641466/) [ghstack-poisoned]
server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * **#44646 Remove thread_local RecordFunctionGuard from profiler.** server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * **#44646 Remove thread_local RecordFunctionGuard from profiler.** server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * **#44646 Remove thread_local RecordFunctionGuard from profiler.** server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * **#44646 Remove thread_local RecordFunctionGuard from profiler.** Per a discussion with @ilia-cher, this is not needed anymore and removing it would make some future changes to support async RPC profiling easier. Tested by ensuring profiling tests in `test_autograd.py` still pass. Differential Revision: [D23683998](https://our.internmc.facebook.com/intern/diff/D23683998/) [ghstack-poisoned]
…another thread. " server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * **#44653 [RPC profiling] Allow disableProfiler() to be called from another thread.** * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * **#44653 [RPC profiling] Allow disableProfiler() to be called from another thread.** * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * **#44653 [RPC profiling] Allow disableProfiler() to be called from another thread.** * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * **#44653 [RPC profiling] Allow disableProfiler() to be called from another thread.** * #44646 Remove thread_local RecordFunctionGuard from profiler. This changes the profiler per a discussion with @ilia-cher offline that enables `disableProfiler()` event consolidation logic to be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387 where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling. This is done by introducing 2 flags `cleanupTLSState` and `consolidate` which controls whether we should clean up thread local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. 
Backwards compatiblity is ensured since both options are true by default. Added a test in `test_misc.cpp` to test this. Differential Revision: [D23638499](https://our.internmc.facebook.com/intern/diff/D23638499/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23638499/)! [ghstack-poisoned]
…tion execution over RPC." server * **#44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.** * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * **#44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.** * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. Closes #39971. This PR adds support for functions decorated with `@rpc.functions.async_execution` to be profiled over RPC as builtins, jit functions, and blocking python UDFs currently can be. The reasoning for this is to provide complete feature support in terms of RPC profiling and the various types of functions users can run. To enable this, the PR below this enables calling `disableProfiler()` safely from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only the blocking `processRPC` call as was done previously). Since when the future completes we've kicked off the async function and the future corresponding to it has completed, we are able to capture any RPCs the function would have called and the actual work done on the other node. For example, if the following async function is ran on a server over RPC: ``` def slow_add(x, y): time.sleep(1) return torch.add(x, y) @rpc.functions.async_execution def slow_async_add(to, x, y): return rpc.rpc_async(to, slow_add, args=(x, y)) ``` we expect to see the original RPC profiled, the nested RPC profiled, and the actual torch.add() work. All of these events should be recorded with the correct node id. 
Here is an example profiling output: ``` ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls Node ID ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- rpc_async#slow_async_add(worker1 -> worker2) 0.00% 0.000us 0 1.012s 1.012s 1 1 aten::empty 7.02% 11.519us 7.02% 11.519us 11.519us 1 1 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3) 0.00% 0.000us 0 1.006s 1.006s 1 2 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty 7.21% 11.843us 7.21% 11.843us 11.843us 1 2 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add 71.94% 118.107us 85.77% 140.802us 140.802us 1 3 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty 13.82% 22.695us 13.82% 22.695us 22.695us 1 3 ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- Self CPU time total: 164.164us ``` This PR also moves a bunch of the profiling logic to `rpc/utils.cpp` to declutter `request_callback` code. Differential Revision: [D23638387](https://our.internmc.facebook.com/intern/diff/D23638387/) [ghstack-poisoned]
…tion execution over RPC."

* … server
* **#44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.**
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

Closes #39971.

This PR adds support for profiling functions decorated with `@rpc.functions.async_execution` over RPC, as builtins, JIT functions, and blocking Python UDFs already can be. The goal is complete RPC profiling coverage across the types of functions users can run.

To enable this, the PR below this one makes it safe to call `disableProfiler()` from another thread. We use that to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only until the blocking `processRPC` call returns, as was done previously). By the time that future completes, the async function has been kicked off and the future it returned has also completed, so we capture any RPCs the function issued and the actual work done on the other node.

For example, if the following async function is run on a server over RPC:

```
import time

import torch
import torch.distributed.rpc as rpc


def slow_add(x, y):
    # Simulate slow work before the actual add.
    time.sleep(1)
    return torch.add(x, y)


@rpc.functions.async_execution
def slow_async_add(to, x, y):
    # Returns a future; the nested RPC does the real work on `to`.
    return rpc.rpc_async(to, slow_add, args=(x, y))
```

we expect to see the original RPC profiled, the nested RPC profiled, and the actual `torch.add()` work. All of these events should be recorded with the correct node ID. Here is an example profiling output:

```
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Name                                                                                                                   Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls  Node ID
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
rpc_async#slow_async_add(worker1 -> worker2)                                                                           0.00%             0.000us         0            1.012s     1.012s        1                1
aten::empty                                                                                                            7.02%             11.519us        7.02%        11.519us   11.519us      1                1
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)                         0.00%             0.000us         0            1.006s     1.006s        1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty                                                    7.21%             11.843us        7.21%        11.843us   11.843us      1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add    71.94%            118.107us       85.77%       140.802us  140.802us     1                3
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty  13.82%            22.695us        13.82%       22.695us   22.695us      1                3
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Self CPU time total: 164.164us
```

This PR also moves much of the profiling logic to `rpc/utils.cpp` to declutter the `request_callback` code.

Differential Revision: [D23638387](https://our.internmc.facebook.com/intern/diff/D23638387/)
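The deferral described above (keep the profiler enabled until the request's future completes, then disable it from whichever thread completes that future) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: `ToyProfiler`, `process_request`, and `demo` are hypothetical names, and none of this is the PyTorch implementation or its API.

```python
import threading
from concurrent.futures import Future


class ToyProfiler:
    """Hypothetical stand-in for a profiler; not PyTorch's profiler API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.enabled = False
        self.events = []

    def enable(self):
        with self._lock:
            self.enabled = True

    def record(self, name):
        with self._lock:
            if self.enabled:
                self.events.append(name)

    def disable(self):
        # State is guarded by a lock, so this may be called from any thread.
        with self._lock:
            self.enabled = False
            return list(self.events)


def process_request(profiler, start_async_work):
    """Enable profiling, start async work, and disable only when it completes."""
    profiler.enable()
    profiler.record("request_received")
    work_future = start_async_work()
    result_future = Future()

    def on_done(fut):
        # Runs on the thread that completes work_future, not the request thread.
        events = profiler.disable()
        result_future.set_result((fut.result(), events))

    work_future.add_done_callback(on_done)
    return result_future


def demo():
    profiler = ToyProfiler()
    work_future = Future()
    result_future = process_request(profiler, lambda: work_future)

    def finish():
        # Work that happens after process_request has already returned.
        profiler.record("nested_work")
        work_future.set_result(3)

    worker = threading.Thread(target=finish)
    worker.start()
    worker.join()
    return result_future.result(timeout=5)
```

In this sketch, `nested_work` is still captured because the profiler stays enabled after `process_request` returns; disabling it at the end of `process_request` instead would drop that event.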
… single threaded server"

* **… server**
* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

This ensures that RPC profiling works in single-threaded server scenarios and guards against assuming that multiple threads are available when working on this code. That assumption caused a bug in the previous diff (since fixed).

Differential Revision: [D23691304](https://our.internmc.facebook.com/intern/diff/D23691304/)
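As a rough illustration of why a multiple-threads assumption is risky (this is not necessarily the bug mentioned above, which the description does not detail), here is a sketch using a one-worker `ThreadPoolExecutor` as a hypothetical stand-in for a single-threaded server. The handler schedules follow-up work on the same pool and chains it with a callback; blocking on the inner future instead would wait forever, because the only worker thread is the one running the handler. The names `handle_request` and `run_on_single_thread` are made up for this example.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def handle_request(pool, x, y):
    """Runs on the pool and schedules follow-up work on the same pool."""
    result = Future()
    inner = pool.submit(lambda: x + y)
    # Chain with a callback instead of calling inner.result() here: with a
    # single worker, blocking would wait on work queued behind this handler.
    inner.add_done_callback(lambda fut: result.set_result(fut.result()))
    return result


def run_on_single_thread(x, y):
    # max_workers=1 models a server with exactly one request thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        outer = pool.submit(handle_request, pool, x, y)
        return outer.result(timeout=5).result(timeout=5)
```

A test that runs the same request path with one worker is one way to catch code that only works when a second thread happens to be free.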
…filing"

* … server
* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

A comment from @mrshenli on #44664 raised the following concern: when the profiler is enabled on the server, the server may be a different machine that does not have CUDA even though the caller does. Previously this would crash; now we fall back to CPU profiling and log a warning. For testing, I forced it to return the CUDA profiler state and validated that it falls back. I'm not sure how to add a unit test, given that our tests run on a single machine that either has or doesn't have CUDA.

Differential Revision: [D23790729](https://our.internmc.facebook.com/intern/diff/D23790729/)
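The fallback described above can be sketched as a small pure function. This is only an illustration under assumptions: `ProfilerState` and `resolve_profiler_state` are hypothetical names, not the identifiers used in the PR, and CUDA availability is passed in as a plain boolean so both branches can be exercised on any machine.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class ProfilerState(Enum):
    """Hypothetical profiler modes, for illustration only."""

    CPU = "cpu"
    CUDA = "cuda"


def resolve_profiler_state(requested, cuda_available):
    """Return the state the server should use for a caller's requested state."""
    if requested is ProfilerState.CUDA and not cuda_available:
        # The caller asked for CUDA profiling but this machine cannot do it:
        # warn and degrade to CPU profiling instead of failing the request.
        logger.warning("CUDA profiling requested but CUDA is unavailable; "
                       "falling back to CPU profiling.")
        return ProfilerState.CPU
    return requested
```

Taking availability as a parameter sidesteps the testing problem for this toy version: a test can pass `False` regardless of the hardware, while real code could pass something like `torch.cuda.is_available()`.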
This pull request has been merged in d4a634c.
… single threaded server" server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. 
server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server** * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server This ensures that RPC profiling works in single-threaded server scenarios and that we won't make the assumption that we'll have multiple threads when working on this code. For example, this assumption resulted in a bug in the previous diff (which was fixed). Differential Revision: [D23691304](https://our.internmc.facebook.com/intern/diff/D23691304/) [ghstack-poisoned]
…filing" server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. 
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. server * #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC. * #44655 [RPC profiling] Don't wrap toHere() calls with profiling * #44653 [RPC profiling] Allow disableProfiler() to be called from another thread. * #44646 Remove thread_local RecordFunctionGuard from profiler. A comment from @mrshenli on #44664 led us to the following concern: when enabling profiler on server, if it is a different machine it may not have CUDA while caller does. In this case, we would crash but now we fallback to CPU and log a warning. For testing, I forced it to return CUDA profiler state, and validated that it falls back. Not sure how to add a unittest given that we have single machine tests and the machine either has or doesn't have cuda. Differential Revision: [D23790729](https://our.internmc.facebook.com/intern/diff/D23790729/) [ghstack-poisoned]
… single threaded server"

* **… server**
* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

This ensures that RPC profiling works in single-threaded server scenarios and that we don't assume multiple threads are available when working on this code. For example, this assumption resulted in a bug in the previous diff (which was fixed).

Differential Revision: [D23691304](https://our.internmc.facebook.com/intern/diff/D23691304/)

[ghstack-poisoned]
…filing"

* … server
* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

A comment from @mrshenli on #44664 raised the following concern: when the profiler is enabled on the server, the server may be a different machine that does not have CUDA even though the caller does. Previously we would crash in this case; now we fall back to CPU profiling and log a warning.

For testing, I forced it to return the CUDA profiler state and validated that it falls back. I'm not sure how to add a unit test, given that our tests run on a single machine, which either has or doesn't have CUDA.

Differential Revision: [D23790729](https://our.internmc.facebook.com/intern/diff/D23790729/)

[ghstack-poisoned]
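The fallback described above can be illustrated with a small, self-contained sketch. This is not the code from the PR: `ProfilerState` here is a hypothetical stand-in enum, and `resolve_server_profiler_state` and its `cuda_available` parameter are names invented for illustration. In a real setting the flag might come from something like `torch.cuda.is_available()` on the server.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class ProfilerState(Enum):
    """Hypothetical stand-in for the profiler state sent with an RPC request."""
    DISABLED = 0
    CPU = 1
    CUDA = 2


def resolve_server_profiler_state(requested, cuda_available):
    """Pick the state the server should use for a profiled request.

    If the caller asked for CUDA profiling but this machine has no CUDA,
    fall back to CPU profiling and log a warning instead of failing.
    """
    if requested is ProfilerState.CUDA and not cuda_available:
        logger.warning(
            "Caller requested CUDA profiling, but CUDA is not available "
            "on this node; falling back to CPU profiling."
        )
        return ProfilerState.CPU
    return requested


# Caller has CUDA, server does not: fall back to CPU.
fallback = resolve_server_profiler_state(ProfilerState.CUDA, cuda_available=False)
# Both sides have CUDA: keep the requested state.
unchanged = resolve_server_profiler_state(ProfilerState.CUDA, cuda_available=True)
```

Passing the availability flag in as a parameter also suggests one way to exercise the fallback path on a single machine, since a test can pass `False` regardless of the hardware it runs on.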
…pport async function execution over RPC."

* … server
* **#44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.**
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

Closes #39971. This PR adds support for profiling functions decorated with `@rpc.functions.async_execution` over RPC, as builtins, JIT functions, and blocking Python UDFs already can be. The goal is complete RPC profiling coverage across the kinds of functions users can run.

To enable this, the PR below this one makes it safe to call `disableProfiler()` from another thread. We use that to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than keeping it enabled only for the blocking `processRPC` call, as was done previously). By the time that future completes, the async function has been kicked off and the future it returned has also completed, so we capture any RPCs the function issued as well as the actual work done on the other node.

For example, if the following async function is run on a server over RPC:

```
import time

import torch
import torch.distributed.rpc as rpc


def slow_add(x, y):
    # Simulate a slow operation before doing the actual add.
    time.sleep(1)
    return torch.add(x, y)


@rpc.functions.async_execution
def slow_async_add(to, x, y):
    # Returns a future; the nested RPC runs slow_add on worker `to`.
    return rpc.rpc_async(to, slow_add, args=(x, y))
```

we expect to see the original RPC profiled, the nested RPC profiled, and the actual `torch.add()` work. All of these events should be recorded with the correct node ID. Here is an example profiling output:

```
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Name                                                                                                                   Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls  Node ID
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
rpc_async#slow_async_add(worker1 -> worker2)                                                                           0.00%             0.000us         0            1.012s     1.012s        1                1
aten::empty                                                                                                            7.02%             11.519us        7.02%        11.519us   11.519us      1                1
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)                         0.00%             0.000us         0            1.006s     1.006s        1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty                                                    7.21%             11.843us        7.21%        11.843us   11.843us      1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add    71.94%            118.107us       85.77%       140.802us  140.802us     1                3
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty  13.82%            22.695us        13.82%       22.695us   22.695us      1                3
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Self CPU time total: 164.164us
```

This PR also moves much of the profiling logic to `rpc/utils.cpp` to declutter the `request_callback` code.

Differential Revision: [D23638387](https://our.internmc.facebook.com/intern/diff/D23638387/)

[ghstack-poisoned]
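The idea of deferring the profiler shutdown until the request's future completes can be sketched in plain Python. This is only an illustration of the pattern under stated assumptions, not the C++ implementation in the PR: `FakeProfiler`, `process_request`, and `async_handler` are hypothetical names, and `concurrent.futures.Future` stands in for the RPC future.

```python
from concurrent.futures import Future


class FakeProfiler:
    """Hypothetical stand-in for server-side profiler state."""

    def __init__(self):
        self.enabled = False
        self.events = []

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

    def record(self, name):
        # Events are only captured while the profiler is enabled.
        if self.enabled:
            self.events.append(name)


def process_request(profiler, handler):
    """Enable profiling, run the handler, and disable the profiler only
    when the handler's future completes (not when the handler returns)."""
    profiler.enable()
    response_future = handler()  # may return before the work is finished
    # The callback may run on whichever thread completes the future,
    # which is why disabling must be safe to call from another thread.
    response_future.add_done_callback(lambda _fut: profiler.disable())
    return response_future


profiler = FakeProfiler()
pending = Future()


def async_handler():
    profiler.record("handler_started")
    return pending  # work continues after the handler returns


fut = process_request(profiler, async_handler)
enabled_after_handler_returned = profiler.enabled

# Work done after the handler returned is still captured.
profiler.record("nested_work")
pending.set_result(42)  # completing the future triggers disable()
```

If the profiler were disabled as soon as the handler returned, the `nested_work` event would be lost, which mirrors why an async function's nested RPCs were not captured before this change.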
…tion execution over RPC."

* **#44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.**
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

Closes #39971. This PR adds support for functions decorated with `@rpc.functions.async_execution` to be profiled over RPC, as builtins, JIT functions, and blocking Python UDFs currently can be. The goal is complete RPC profiling support across the types of functions users can run.

To enable this, the PR below this one enables calling `disableProfiler()` safely from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only the blocking `processRPC` call, as was done previously). By the time that future completes, the async function has been kicked off and the future corresponding to it has completed, so we are able to capture any RPCs the function would have called and the actual work done on the other node.

For example, if the following async function is run on a server over RPC:

```
import time

import torch
import torch.distributed.rpc as rpc

def slow_add(x, y):
    time.sleep(1)
    return torch.add(x, y)

@rpc.functions.async_execution
def slow_async_add(to, x, y):
    # Returns a future; the nested RPC does the actual work on `to`.
    return rpc.rpc_async(to, slow_add, args=(x, y))
```

we expect to see the original RPC profiled, the nested RPC profiled, and the actual `torch.add()` work. All of these events should be recorded with the correct node id. Here is an example profiling output:

```
Name                                                                                                                   Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls  Node ID
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
rpc_async#slow_async_add(worker1 -> worker2)                                                                           0.00%             0.000us         0            1.012s     1.012s        1                1
aten::empty                                                                                                            7.02%             11.519us        7.02%        11.519us   11.519us      1                1
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)                         0.00%             0.000us         0            1.006s     1.006s        1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty                                                    7.21%             11.843us        7.21%        11.843us   11.843us      1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add    71.94%            118.107us       85.77%       140.802us  140.802us     1                3
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty  13.82%            22.695us        13.82%       22.695us   22.695us      1                3
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Self CPU time total: 164.164us
```

This PR also moves much of the profiling logic to `rpc/utils.cpp` to declutter the `request_callback` code.

Differential Revision: [D23638387](https://our.internmc.facebook.com/intern/diff/D23638387/)

[ghstack-poisoned]
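The deferred-disable idea described above can be sketched in plain Python. This is only a toy model under stated assumptions: `ToyProfiler` and `process_request` are hypothetical names invented for illustration, not the PyTorch profiler or `processRPC`, and `concurrent.futures.Future` stands in for the RPC future.

```python
from concurrent.futures import Future


class ToyProfiler:
    """Hypothetical stand-in for the server-side profiler (not the PyTorch API)."""

    def __init__(self):
        self.enabled = False
        self.events = []

    def enable(self):
        self.enabled = True

    def disable(self):
        # Stop recording and hand back everything captured so far.
        self.enabled = False
        return list(self.events)

    def record(self, name):
        if self.enabled:
            self.events.append(name)


def process_request(profiler, start_async_work):
    """Enable profiling, start the work, and disable only when its future completes."""
    profiler.enable()
    work_future = start_async_work()  # returns right away with a pending future
    response = Future()

    def on_done(fut):
        # Runs on whichever thread completes the future, which is why
        # disabling must be safe to call from another thread.
        events = profiler.disable()
        response.set_result((fut.result(), events))

    work_future.add_done_callback(on_done)
    return response


profiler = ToyProfiler()
pending = Future()


def start():
    profiler.record("dispatch")
    return pending


response = process_request(profiler, start)
# The blocking part has returned, but the profiler is still enabled,
# so work that happens later is still captured.
profiler.record("nested_rpc")
pending.set_result(3)
result, events = response.result()
```

If the profiler were disabled as soon as `process_request` returned, the `nested_rpc` event in this toy would be lost, which mirrors the gap the PR describes for async functions.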
… single threaded server"

* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

This ensures that RPC profiling works in single-threaded server scenarios and that we won't make the assumption that we'll have multiple threads when working on this code. For example, this assumption resulted in a bug in the previous diff (which was fixed).

Differential Revision: [D23691304](https://our.internmc.facebook.com/intern/diff/D23691304/)

[ghstack-poisoned]
…filing"

* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

A comment from @mrshenli on #44664 led us to the following concern: when enabling the profiler on the server, the server may be a different machine that does not have CUDA while the caller does. Previously we would crash in this case; now we fall back to CPU profiling and log a warning.

For testing, I forced it to return the CUDA profiler state and validated that it falls back. Not sure how to add a unit test, given that we have single-machine tests and the machine either has or doesn't have CUDA.

Differential Revision: [D23790729](https://our.internmc.facebook.com/intern/diff/D23790729/)

[ghstack-poisoned]
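A minimal sketch of the fallback described above, assuming a hypothetical `ProfilerState` enum and a hypothetical `resolve_server_state` helper (neither is the actual C++ code in this stack). CUDA availability is passed in as a flag so both branches can be exercised on a single machine; in a real Python setting it could come from `torch.cuda.is_available()`.

```python
import logging
from enum import Enum

logger = logging.getLogger("rpc_profiling_sketch")


class ProfilerState(Enum):
    """Hypothetical mirror of the profiler states discussed in this PR."""
    DISABLED = 0
    CPU = 1
    CUDA = 2


def resolve_server_state(requested, cuda_available):
    """Pick the state the server should profile with.

    If the caller asked for CUDA profiling but this server has no CUDA,
    fall back to CPU profiling and log a warning instead of crashing.
    """
    if requested is ProfilerState.CUDA and not cuda_available:
        logger.warning(
            "Caller requested CUDA profiling but CUDA is unavailable here; "
            "falling back to CPU profiling."
        )
        return ProfilerState.CPU
    return requested


# Caller has CUDA, server does not: fall back to CPU.
fallback = resolve_server_state(ProfilerState.CUDA, cuda_available=False)
# Server has CUDA: honor the request.
honored = resolve_server_state(ProfilerState.CUDA, cuda_available=True)
```

Taking availability as a parameter is one way around the single-machine testing problem mentioned above, since the check can then be faked in a test.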
Since `toHere()` does not execute operations (torch operators) over RPC and simply transfers the value to the local node, we don't need to enable the profiler remotely for this message. Enabling it only adds unnecessary overhead.

Since `toHere` is a blocking call, we already profile the call on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass).

Differential Revision: D23641466
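As a rough illustration of the decision this PR makes, here is a toy model in Python. `MessageKind` and `wrap_with_remote_profiling` are hypothetical names made up for this sketch; the real RPC layer has its own message types and C++ code paths.

```python
from enum import Enum, auto


class MessageKind(Enum):
    """Hypothetical message kinds used only for this sketch."""
    BUILTIN_CALL = auto()
    PYTHON_UDF_CALL = auto()
    RREF_FETCH = auto()  # stands in for the request that toHere() sends


def wrap_with_remote_profiling(kind, profiler_enabled):
    """Return True if the outgoing message should enable the profiler remotely.

    A fetch only transfers a value and runs no operators on the remote
    node, so there is nothing to profile there; the blocking call is
    already covered by a local profiling scope on the caller.
    """
    if not profiler_enabled:
        return False
    return kind is not MessageKind.RREF_FETCH


decisions = {
    kind.name: wrap_with_remote_profiling(kind, profiler_enabled=True)
    for kind in MessageKind
}
```

With the profiler on, only the fetch message skips remote profiling; with it off, nothing is wrapped.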