Source code level attribution in profiler #43898
Conversation
Summary: Adding with_source parameter to enable tracking source code (filename and line) in profiler for eager, torchscript and autograd modes

Test Plan: python test/test_profiler.py

    Name           Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls  Source Location
    -------------  ----------------  --------------  -----------  ---------  ------------  ---------------  --------------------------------------
    ts_method_1    10.43%            235.364us       36.46%       822.920us  822.920us     1                test/test_profiler.py(70): test_source
    aten::add      7.52%             169.833us       8.88%        200.439us  200.439us     1                test/test_profiler.py(69): test_source
    aten::normal_  6.26%             141.380us       6.26%        141.380us  141.380us     1                test/test_profiler.py(67): test_source
    aten::add      5.80%             130.830us       8.41%        189.800us  63.267us      3                test/test_profiler.py(72): test_source
    aten::sum      5.02%             113.340us       8.39%        189.475us  189.475us     1                test/test_profiler.py(64): ts_method_1
    aten::add      4.58%             103.346us       6.33%        142.847us  142.847us     1                test/test_profiler.py(62): ts_method_1
    aten::mul      4.05%             91.498us        9.62%        217.113us  217.113us     1                test/test_profiler.py(71): test_source
    aten::add      4.03%             90.880us        5.60%        126.405us  126.405us     1                test/test_profiler.py(58): ts_method_2
    aten::empty    3.49%             78.735us        3.49%        78.735us   19.684us      4                test/test_profiler.py(72): test_source

Reviewers:
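The "Source Location" column above uses the format `path(line): function`. As a minimal sketch (the helper name and regex are hypothetical, not part of the PR), entries in that format can be split back into their parts:

```python
import re

# Format observed in the profiler table, e.g. "test/test_profiler.py(70): test_source"
LOCATION_RE = re.compile(r"^(?P<file>.+)\((?P<line>\d+)\): (?P<func>\w+)$")

def parse_source_location(entry):
    """Split a 'file(line): function' string into (file, line, function)."""
    m = LOCATION_RE.match(entry)
    if m is None:
        raise ValueError(f"unrecognized source location: {entry!r}")
    return m.group("file"), int(m.group("line")), m.group("func")

# -> ("test/test_profiler.py", 70, "test_source")
parts = parse_source_location("test/test_profiler.py(70): test_source")
```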
ngimel
left a comment
Please benchmark the added overhead and make sure it's acceptable both with profiling disabled and with profiling enabled without source attribution.
dzhulgakov
left a comment
torch/csrc/autograd/function.h (Outdated)

    uint64_t sequence_nr,
    edge_list&& next_edges = edge_list())
    : sequence_nr_(sequence_nr),
      thread_id_(at::RecordFunction::currentThreadId()),
cc @albanD - is it a concern that we increase the size of Node? Maybe we could pack both indices into a single int64_t? I'm not sure how sequence_nrs work in general - are they thread local?
It's not something I like to see, for sure, though I never actually did a full benchmark of the impact on memory and speed.
Do you actually always need this thread id, or only when the profiler is running?
If it's only needed then, a nice way to guard this is to record it only while the profiler runs and store it in the metadata struct that we also use for anomaly mode (you can make the python dict initialization lazy to improve perf when you just store and read the thread_id).

> I'm not sure how in general sequence_nrs work - are they thread local?

Yes, these are just thread-local variables (not propagated by TLSState).
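The thread-local behavior described here (each thread keeps its own independent sequence counter) can be illustrated with a small Python sketch; this is only an analogy, not PyTorch's C++ implementation:

```python
import threading

# Each thread sees its own counter, mirroring how thread-local
# sequence numbers advance independently per thread.
_local = threading.local()

def next_sequence_nr():
    if not hasattr(_local, "seq"):
        _local.seq = 0
    n = _local.seq
    _local.seq += 1
    return n

results = {}

def worker(name):
    # Every thread observes 0, 1, 2 regardless of the others.
    results[name] = [next_sequence_nr() for _ in range(3)]

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results == {0: [0, 1, 2], 1: [0, 1, 2]}
```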
we should then either set both of them or neither, because a seq. number without the thread is not very useful;
if the extra int64 is too much, I'd rather go with packing - fewer runtime checks
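The packing idea floated here can be sketched in a few lines. The bit split (16 bits for the thread id, 48 for the sequence number) is an arbitrary assumption for illustration, not what the PR settled on:

```python
# Hypothetical packing of a thread id and a sequence number into one
# 64-bit value: high 16 bits for the thread, low 48 for the sequence.
THREAD_BITS = 16
SEQ_BITS = 48

def pack(thread_id, seq_nr):
    assert 0 <= thread_id < (1 << THREAD_BITS)
    assert 0 <= seq_nr < (1 << SEQ_BITS)
    return (thread_id << SEQ_BITS) | seq_nr

def unpack(packed):
    # Reverse of pack(): recover (thread_id, seq_nr).
    return packed >> SEQ_BITS, packed & ((1 << SEQ_BITS) - 1)

assert unpack(pack(5, 123456)) == (5, 123456)
```

The trade-off mentioned above is that packing keeps Node at its current size at the cost of a bounded sequence range, while a separate field costs 8 extra bytes per Node.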
The autograd engine makes heavy use of the seq number (which does not require the thread), so we have to set this one.
addressed most comments, will update with a new round of benchmarking
dzhulgakov
left a comment
These changes to TS look better, but @suo should double check
added profiler benchmark results - as expected, it is slower when using with_stack=True, and the slowdown is smaller when using TorchScript
lint
dzhulgakov
left a comment
Hm, your overhead measurements actually show 6-8% slower perf with no profiling:
eager: 0.895ms -> 0.965ms
script: 0.585ms -> 0.625ms
That actually feels like a lot. Did you bisect where it is coming from? For eager mode there aren't even any interpreter changes - it must be from the RecordFunction itself then?
    size_t source_line = 0;
    std::stringstream stack_trace;

    std::vector<StackEntry> pythonCallstack() {
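What `pythonCallstack()` collects per frame (filename, line, function) can be mirrored in pure Python via the standard `traceback` module. A sketch only; the helper name is hypothetical and this is not the PR's C++ code path:

```python
import traceback

def python_callstack(limit=8):
    """Return (filename, line, function) for the current Python stack,
    most recent call last - analogous in spirit to pythonCallstack()."""
    frames = traceback.extract_stack()[:-1]  # drop this helper's own frame
    return [(f.filename, f.lineno, f.name) for f in frames[-limit:]]

def inner():
    return python_callstack()

stack = inner()
# The innermost recorded frame is the call site inside inner().
```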
nit: it's a bit weird that it lives in python_tracer (kind of unrelated), maybe move to some more generic file? (not critical)
I'll check if there's a better location available; if not, we'd probably better not create a new one.
ngimel
left a comment
I have small docs suggestions; other than that, the 6% overhead does seem like a lot, but it's not obvious where it's coming from.
Re. runtimes - I included them mainly to show the rough difference between with-stack / without-stack profiling, and between python and TS callstacks; for other purposes the measurement is noisy, so I included a valgrind-based comparison for base vs trunk default mode. I also reran the benchmark (1000 times, no profiling, PR vs base, eager/script, different machine): there's no statistically significant runtime difference between base and PR, though callgrind stats do seem reproducible (https://gist.github.com/ilia-cher/8a2861863d3033b56181d23ba4cfbda1), e.g. they show the expected increase in tls accesses (because of the function.h change)
rest of comments addressed + added support for @robieta's Timer in profiler_bench.py
example of benchmarking with
And these are 100-second runs each? If so, that's a noisy system.

yes, it is 100s
robieta
left a comment
Perf looks good; variation seems to just be noise. I'll defer to others about the rest of the changes.
Codecov Report
@@ Coverage Diff @@
## gh/ilia-cher/40/base #43898 +/- ##
=======================================================
Coverage ? 68.60%
=======================================================
Files ? 405
Lines ? 52021
Branches ? 0
=======================================================
Hits ? 35687
Misses ? 16334
Partials ? 0
=======================================================
Continue to review full report at Codecov.
@ilia-cher merged this pull request in f5c95d5. |
Stack from ghstack:
Summary:
Adding with_source parameter to enable tracking source code
(filename and line) in profiler for eager, torchscript and autograd
modes
Test Plan:
python test/test_profiler.py
Output: https://gist.github.com/ilia-cher/bd13a3ab0c4905ce58b0079df66711fa
Benchmark:
updated benchmark: profiler https://gist.github.com/ilia-cher/ff7064a311790590ffa8445c95d4d45b
as expected, there is no change when no stacks are used, and it is slower when stacks are used (in particular, python callstacks are slower than TS callstacks)
overhead: https://gist.github.com/ilia-cher/8a2861863d3033b56181d23ba4cfbda1
overall as expected:
a small increase in cpu instructions (0.7%), and a small increase in tls accesses, expected because of the function.h change
https://gist.github.com/ilia-cher/9c7901670fb48082af92386b1889e2fe
https://gist.github.com/ilia-cher/69a01a6ef294511a797e8cf4a1e6db75
Differential Revision: D23432664