Commit d92fd2d

Taylor Robie authored and pytorchmergebot committed
[Profiler] Limit calls to recordThreadInfo (#74888)
Summary:
Pull Request resolved: #74888

So far as I can tell, `recordThreadInfo` only needs to be called once per thread. Once we have thread-local subqueues we can easily manage this by simply calling it in the subqueue constructor.

Test Plan: The effect on single-threaded overhead is pretty minimal, but it improves stress test overhead from ~6.1 us to ~1.4 us since we're no longer contending over the lock in Kineto.

Reviewed By: chaekit

Differential Revision: D34811694

fbshipit-source-id: da1047f7ae43af048773610a0f250fa514c67989

(cherry picked from commit 9a5b926)
1 parent f17ad06 commit d92fd2d

2 files changed, +3 -4 lines changed


torch/csrc/autograd/profiler_kineto.cpp

Lines changed: 0 additions & 3 deletions
@@ -631,7 +631,6 @@ void pushProfilingCallbacks(const std::unordered_set<at::RecordScope>& scopes) {
         }
 
         torch::profiler::impl::kineto::popCorrelationId();
-        torch::profiler::impl::kineto::recordThreadInfo();
       })
       .needsInputs(registration_state_ptr->config().report_input_shapes)
       .scopes(scopes));
@@ -667,8 +666,6 @@ void reportBackendEventToActiveKinetoProfiler(
     ctx_ptr->dtypes = inputTypes(fn);
   }
   */
-
-  torch::profiler::impl::kineto::recordThreadInfo();
 }
 
 void prepareProfiler(

torch/csrc/profiler/collection.cpp

Lines changed: 3 additions & 1 deletion
@@ -141,7 +141,9 @@ uint64_t Result::correlation_id() const {
 ThreadLocalSubqueue::ThreadLocalSubqueue(
     const uint64_t tid,
     const ProfilerConfig& config)
-    : tid_{tid}, config_{config}, kineto_info_{kineto::kineto_ids()} {}
+    : tid_{tid}, config_{config}, kineto_info_{kineto::kineto_ids()} {
+  torch::profiler::impl::kineto::recordThreadInfo();
+}
 
 std::unique_ptr<KinetoObserverContext> ThreadLocalSubqueue::begin_op(
     const at::RecordFunction& fn,
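
The pattern in this commit (register a thread once, when its per-thread queue is constructed, rather than on every recorded event) can be illustrated with a small standalone sketch. This is not PyTorch or Kineto code: `ThreadRegistry`, `PerThreadQueue`, and `run_workers` are hypothetical names invented here, and the mutex-guarded counter only stands in for whatever lock Kineto takes internally. The queue here is a plain local object owned by each worker thread, which is a simplification of a real thread-local subqueue.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared registry guarded by a global lock.
// It counts how many times the lock was taken to register a thread.
class ThreadRegistry {
 public:
  void record_thread_info() {
    std::lock_guard<std::mutex> guard(mutex_);
    ++registrations_;
  }

  std::uint64_t registrations() {
    std::lock_guard<std::mutex> guard(mutex_);
    return registrations_;
  }

 private:
  std::mutex mutex_;
  std::uint64_t registrations_ = 0;
};

// Hypothetical per-thread queue. The owning thread is registered exactly
// once, in the constructor, so the hot path never touches the shared lock.
class PerThreadQueue {
 public:
  explicit PerThreadQueue(ThreadRegistry& registry) {
    registry.record_thread_info();
  }

  // Hot path: only thread-private state is modified.
  void record_event() { ++events_; }

  std::uint64_t events() const { return events_; }

 private:
  std::uint64_t events_ = 0;
};

// Starts `num_threads` workers, each recording `events_per_thread` events
// into its own queue, and returns how many times the registry lock was
// taken for registration.
std::uint64_t run_workers(int num_threads, int events_per_thread) {
  ThreadRegistry registry;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&registry, events_per_thread] {
      PerThreadQueue queue(registry);  // one registration per thread
      for (int i = 0; i < events_per_thread; ++i) {
        queue.record_event();
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return registry.registrations();
}
```

In this sketch the number of lock acquisitions for registration scales with the number of threads rather than the number of events, which is the kind of contention reduction the test plan above describes for the stress test.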
