
Conversation

ilia-cher (Contributor) commented Aug 31, 2020

Stack from ghstack:

Summary:
Adds a with_source parameter to enable tracking source code
(filename and line) in the profiler for eager, TorchScript and autograd
modes.

Test Plan:
python test/test_profiler.py

Output: https://gist.github.com/ilia-cher/bd13a3ab0c4905ce58b0079df66711fa

Benchmark:
updated benchmark: profiler https://gist.github.com/ilia-cher/ff7064a311790590ffa8445c95d4d45b
As expected, there is no change when stacks are not used, and profiling is slower when stacks are used (in particular, the Python callstack is slower than the TS callstack).

overhead: https://gist.github.com/ilia-cher/8a2861863d3033b56181d23ba4cfbda1
Overall as expected: a small increase in CPU instructions (0.7%) and a small expected increase in TLS accesses (because of the function.h change).

https://gist.github.com/ilia-cher/9c7901670fb48082af92386b1889e2fe
https://gist.github.com/ilia-cher/69a01a6ef294511a797e8cf4a1e6db75

Differential Revision: D23432664

Sample output:
Name                                 Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Source Location
-----------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  --------------------------------------------
ts_method_1                          10.43%           235.364us        36.46%           822.920us        822.920us        1                test/test_profiler.py(70): test_source
aten::add                            7.52%            169.833us        8.88%            200.439us        200.439us        1                test/test_profiler.py(69): test_source
aten::normal_                        6.26%            141.380us        6.26%            141.380us        141.380us        1                test/test_profiler.py(67): test_source
aten::add                            5.80%            130.830us        8.41%            189.800us        63.267us         3                test/test_profiler.py(72): test_source
aten::sum                            5.02%            113.340us        8.39%            189.475us        189.475us        1                test/test_profiler.py(64): ts_method_1
aten::add                            4.58%            103.346us        6.33%            142.847us        142.847us        1                test/test_profiler.py(62): ts_method_1
aten::mul                            4.05%            91.498us         9.62%            217.113us        217.113us        1                test/test_profiler.py(71): test_source
aten::add                            4.03%            90.880us         5.60%            126.405us        126.405us        1                test/test_profiler.py(58): ts_method_2
aten::empty                          3.49%            78.735us         3.49%            78.735us         19.684us         4                test/test_profiler.py(72): test_source
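For reference, a minimal usage sketch that produces a table like the one above (assuming the flag as it eventually landed, with_stack=True in torch.autograd.profiler; the group_by_stack_n option is illustrative):

import torch
import torch.autograd.profiler as profiler

def payload():
    x = torch.rand(128, 128)
    y = x + x             # aten::add, attributed to this file and line
    return (x * y).sum()  # aten::mul / aten::sum

# with_stack=True asks the profiler to record source locations
with profiler.profile(with_stack=True) as prof:
    payload()

# group_by_stack_n keeps the top stack frames so each row gets a
# "Source Location" entry like test/test_profiler.py(69): test_source
print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cpu_time_total"))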
facebook-github-bot added the oncall: jit label Aug 31, 2020
dr-ci bot commented Aug 31, 2020

💊 CI failures summary and remediations

As of commit 196ba66 (more details on the Dr. CI page):



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Update Homebrew" (full log | diagnosis details | 🔁 rerun) ❄️

fatal: Could not read from remote repository.
Receiving objects: 100% (178/178), 63.90 KiB | 10.65 MiB/s, done.
Resolving deltas: 100% (92/92), completed with 85 local objects.
From ssh://github.com/Homebrew/homebrew-cask-versions 
 + 15f6b44...90ed6b8 master     -> origin/master  (forced update) 
+ git reset --hard origin/master 
HEAD is now at 90ed6b8 Update microsoft-edge-beta from 86.0.622.19 to 86.0.622.28 (#9686) 
+ for path in '$(find /usr/local/Homebrew -type d -name .git)' 
+ cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/.git/.. 
+ git fetch --depth=1 origin 
Connection to github.com closed by remote host.  
fatal: Could not read from remote repository. 
 
Please make sure you have the correct access rights 
and the repository exists. 

ci.pytorch.org: 1 failed



ngimel (Collaborator) left a comment

Please benchmark the added overhead and make sure it's acceptable both with profiling disabled and with profiling without source attribution.

dzhulgakov (Collaborator) left a comment

@suo @zdevito on the TorchScript changes. I left some comments, but better suggestions are welcome.

Perf measurements are absolutely a must, both when source attribution is off and when it's on.

uint64_t sequence_nr,
edge_list&& next_edges = edge_list())
: sequence_nr_(sequence_nr),
thread_id_(at::RecordFunction::currentThreadId()),
Collaborator:

cc @albanD - is it a concern that we increase the size of Node? Maybe we could pack both indices into a single int64_t? I'm not sure how sequence_nrs work in general - are they thread local?

Collaborator:

It's definitely not something I like to see, though I never actually did a full benchmark of the impact on memory and speed.

Do you actually always need this thread id, or just when the profiler is running?
If the latter, a nice way to guard this is to capture it only when the profiler runs and store it in the metadata struct that we also use for anomaly mode (you can make the python dict initialization lazy to improve perf when you just store and read the thread_id).

I'm not sure how sequence_nrs work in general - are they thread local?

Yes, these are just thread-local variables (not propagated by TLSState).

ilia-cher (Author):

We should then either set both of them or neither, because a seq. number without the thread is not very useful; if an extra int64 is too much, I'd rather go with packing - fewer runtime checks.

Collaborator:

The autograd engine makes heavy use of the seq number (which does not require the thread), so we have to set that one.
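For illustration, a minimal sketch of the packing idea floated above (the field widths are hypothetical, not what the PR implements):

SEQ_BITS = 48                   # hypothetical split: 48 bits for sequence_nr,
SEQ_MASK = (1 << SEQ_BITS) - 1  # leaving 16 bits for the recording thread id

def pack(sequence_nr, thread_id):
    # Store both indices in a single 64-bit word.
    assert 0 <= sequence_nr <= SEQ_MASK and 0 <= thread_id < (1 << (64 - SEQ_BITS))
    return (thread_id << SEQ_BITS) | sequence_nr

def unpack(packed):
    # Recover (sequence_nr, thread_id) from the packed word.
    return packed & SEQ_MASK, packed >> SEQ_BITS

assert unpack(pack(12345, 7)) == (12345, 7)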

ilia-cher added 4 commits September 15, 2020 17:12
[wip] Source code level attribution in profiler
[wip] Source code level attribution in profiler
[wip] Source code level attribution in profiler
Source code level attribution in profiler
ilia-cher changed the title from "[wip] Source code level attribution in profiler" to "Source code level attribution in profiler" on Sep 16, 2020
ilia-cher added 2 commits September 16, 2020 02:35
Source code level attribution in profiler
Source code level attribution in profiler
ilia-cher (Author):

Addressed most comments; will update with a new round of benchmarking.

dzhulgakov (Collaborator) left a comment

These changes to TS look better, but @suo to double check

Source code level attribution in profiler
ilia-cher added 2 commits September 18, 2020 01:39
Source code level attribution in profiler
Source code level attribution in profiler
ilia-cher (Author) commented Sep 18, 2020

Added profiler benchmark results - as expected, profiling is slower with with_stack=True, and the slowdown is smaller when using TorchScript.

ilia-cher (Author):

Lint fixed; also cc @ngimel @dzhulgakov @robieta @suo @zdevito @jamesr66a - ready for another look.

dzhulgakov (Collaborator) left a comment

Hm, your overhead measurements actually show 6-8% slower perf with no profiling:
eager: 0.895ms -> 0.965ms
script: 0.585ms -> 0.625ms

That actually feels like a lot. Did you bisect where it is coming from? For eager mode there aren't even interpreter changes - it must be from the RecordFunction itself then?

size_t source_line = 0;
std::stringstream stack_trace;

std::vector<StackEntry> pythonCallstack() {
Collaborator:

nit: it's a bit weird that this lives in python_tracer (kind of unrelated); maybe move it to some more generic file? (not critical)

ilia-cher (Author):

I'll check if there's a better location available; if not, we'd probably better not create a new one.
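For context, a rough Python-level illustration of the information pythonCallstack gathers (the actual implementation walks the frames in C++); each frame yields the filename(line): function format seen in the Source Location column above:

import traceback

def python_callstack():
    # (filename, line, function) for each frame, innermost last;
    # drop the final entry, which is this helper itself.
    return [(f.filename, f.lineno, f.name) for f in traceback.extract_stack()[:-1]]

def caller():
    for filename, lineno, name in python_callstack():
        print(f"{filename}({lineno}): {name}")  # e.g. test/test_profiler.py(70): test_source

caller()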

ngimel (Collaborator) left a comment

I have small docs suggestions; other than that, 6% overhead does seem like a lot, but it's not obvious where it's coming from.

ilia-cher (Author):

Hm, your overhead measurements actually show 6-8% slower perf with no profiling ... it must be from the RecordFunction itself then?

Re. runtimes - I included them mainly to show the rough difference between with-stack / without-stack profiling, and Python vs TS callstacks; for other purposes they are noisy, so I included a valgrind-based comparison for base vs trunk in the default mode.

Also reran benchmark (1000 times, no profiling, PR vs base, eager/script, different machine):

USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 USE_CUDA=0 python setup.py develop install --cmake

pr_eager: (each line - 1000 runs of benchmark)
Overall: avg 0.811 ms, median: 0.773, min: 0.739, stddev: 0.136 ms
Overall: avg 0.774 ms, median: 0.768, min: 0.746, stddev: 0.030 ms
Overall: avg 0.837 ms, median: 0.742, min: 0.716, stddev: 0.226 ms
Overall: avg 0.870 ms, median: 0.736, min: 0.712, stddev: 0.266 ms
Overall: avg 0.786 ms, median: 0.779, min: 0.720, stddev: 0.031 ms

pr_script:
Overall: avg 0.649 ms, median: 0.552, min: 0.527, stddev: 0.156 ms
Overall: avg 0.546 ms, median: 0.540, min: 0.510, stddev: 0.025 ms
Overall: avg 0.575 ms, median: 0.537, min: 0.520, stddev: 0.108 ms
Overall: avg 0.542 ms, median: 0.535, min: 0.521, stddev: 0.028 ms
Overall: avg 0.585 ms, median: 0.543, min: 0.521, stddev: 0.110 ms

base_eager:
Overall: avg 0.764 ms, median: 0.751, min: 0.731, stddev: 0.042 ms
Overall: avg 0.808 ms, median: 0.782, min: 0.765, stddev: 0.115 ms
Overall: avg 0.884 ms, median: 0.844, min: 0.811, stddev: 0.151 ms
Overall: avg 0.802 ms, median: 0.791, min: 0.775, stddev: 0.033 ms
Overall: avg 0.815 ms, median: 0.806, min: 0.776, stddev: 0.031 ms

base_script:
Overall: avg 0.706 ms, median: 0.588, min: 0.571, stddev: 0.184 ms
Overall: avg 0.590 ms, median: 0.582, min: 0.571, stddev: 0.031 ms
Overall: avg 0.587 ms, median: 0.580, min: 0.570, stddev: 0.030 ms
Overall: avg 0.637 ms, median: 0.596, min: 0.583, stddev: 0.109 ms
Overall: avg 0.702 ms, median: 0.612, min: 0.584, stddev: 0.162 ms

https://gist.github.com/ilia-cher/a3974d77664628f0f42d27b616738013

So there's no statistically significant information about runtime for base vs PR, though callgrind seems to give reproducible stats (https://gist.github.com/ilia-cher/8a2861863d3033b56181d23ba4cfbda1): e.g. they show an increase in TLS accesses (expected because of if (profiler::profilerEnabled()) { thread_id_ = at::RecordFunction::currentThreadId(); }) and a relatively small increase in CPU instructions.

ilia-cher mentioned this pull request Sep 29, 2020
ilia-cher (Author):

Addressed the rest of the comments and added support for @robieta's Timer in profiler_bench.py.
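For reference, a minimal sketch of timing a payload with Timer (using today's public torch.utils.benchmark path; the --use_timer output below comes from the then-private torch.utils._benchmark):

import torch
from torch.utils.benchmark import Timer  # torch.utils._benchmark in 2020-era builds

x = torch.rand(1024, 1024)

def payload():
    return (x + x).sum()

timer = Timer(stmt="payload()", globals={"payload": payload})

# Collects measurements for a wall-clock budget and reports median/IQR,
# as in the output shown later in the thread.
print(timer.blocked_autorange(min_run_time=5))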

ilia-cher added 2 commits September 29, 2020 02:43
ilia-cher (Author):

Example of benchmarking with --use_timer (results also vary run to run):

<torch.utils._benchmark.utils.common.Measurement object at 0x7f2cc961ee80>
payload()
  Median: 787.50 us
  IQR:    62.62 us (766.31 to 828.93)
  111 measurements, 1000 runs per measurement, 1 thread
<torch.utils._benchmark.utils.common.Measurement object at 0x7fd6fde9fe80>
payload()
  Median: 913.64 us
  IQR:    33.24 us (903.73 to 936.96)
  105 measurements, 1000 runs per measurement, 1 thread
<torch.utils._benchmark.utils.common.Measurement object at 0x7f721a1e0e80>
payload()
  Median: 768.92 us
  IQR:    22.46 us (759.44 to 781.90)
  127 measurements, 1000 runs per measurement, 1 thread
<torch.utils._benchmark.utils.common.Measurement object at 0x7f6d0490de80>
payload()
  Median: 821.17 us
  IQR:    12.48 us (815.65 to 828.13)
  122 measurements, 1000 runs per measurement, 1 thread
<torch.utils._benchmark.utils.common.Measurement object at 0x7f038ce2de80>
payload()
  Median: 762.76 us
  IQR:    38.57 us (752.31 to 790.88)
  122 measurements, 1000 runs per measurement, 1 thread

robieta (Contributor) commented Sep 29, 2020

Example of benchmarking with --use_timer (results also vary run to run) ...

And these are 100 second runs each? If so, that's a noisy system.

ilia-cher (Author):

And these are 100 second runs each? If so, that's a noisy system.

yes, it is 100s

ilia-cher added 2 commits September 29, 2020 13:54
robieta (Contributor) left a comment

Perf looks good; variation seems to just be noise. I'll defer to others about the rest of the changes.

codecov bot commented Sep 30, 2020

Codecov Report

❗ No coverage uploaded for pull request base (gh/ilia-cher/40/base@eea4a04).
The diff coverage is n/a.


@@                   Coverage Diff                   @@
##             gh/ilia-cher/40/base   #43898   +/-   ##
=======================================================
  Coverage                        ?   68.60%           
=======================================================
  Files                           ?      405           
  Lines                           ?    52021           
  Branches                        ?        0           
=======================================================
  Hits                            ?    35687           
  Misses                          ?    16334           
  Partials                        ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eea4a04...196ba66.

facebook-github-bot (Contributor):
@ilia-cher merged this pull request in f5c95d5.

facebook-github-bot deleted the gh/ilia-cher/40/head branch October 3, 2020 14:18