add self cuda time to avoid double/quadruple counting #45209

ngimel · 2020-09-23T17:17:12Z

The profiler did not report self time for CUDA, so for composite functions there was no way to determine which function was really taking the time. In addition, the reported "CUDA time total" was frequently larger than the total wall-clock time. This PR adds "Self CUDA" time to the profiler and computes the total CUDA time from self CUDA time, similar to how it's done for CPU. It also makes slight formatting changes to make the table more compact. Before:

--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
Name                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
aten::matmul          0.17%            890.805us        99.05%           523.401ms        5.234ms          49.91%           791.184ms        7.912ms          100              
aten::mm              98.09%           518.336ms        98.88%           522.511ms        5.225ms          49.89%           790.885ms        7.909ms          100              
aten::t               0.29%            1.530ms          0.49%            2.588ms          25.882us         0.07%            1.058ms          10.576us         100              
aten::view            0.46%            2.448ms          0.46%            2.448ms          12.238us         0.06%            918.936us        4.595us          200              
aten::transpose       0.13%            707.204us        0.20%            1.058ms          10.581us         0.03%            457.802us        4.578us          100              
aten::empty           0.14%            716.056us        0.14%            716.056us        7.161us          0.01%            185.694us        1.857us          100              
aten::as_strided      0.07%            350.935us        0.07%            350.935us        3.509us          0.01%            156.380us        1.564us          100              
aten::stride          0.65%            3.458ms          0.65%            3.458ms          11.527us         0.03%            441.258us        1.471us          300              
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
Self CPU time total: 528.437ms
CUDA time total: 1.585s

Recorded timeit time:  789.0814 ms

Note that the recorded timeit time (with proper CUDA syncs) is about half the "CUDA time total" reported by the profiler.
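As a rough sanity check (not part of the PR itself), adding up the "CUDA total" column of the "Before" table reproduces the inflated figure. The values below are copied from that table and converted to milliseconds; the variable names are only illustrative.

```python
# "CUDA total" column of the "Before" table, in milliseconds.
cuda_total_ms = [
    791.184,   # aten::matmul (includes the time of aten::mm)
    790.885,   # aten::mm
    1.058,     # aten::t
    0.918936,  # aten::view
    0.457802,  # aten::transpose
    0.185694,  # aten::empty
    0.156380,  # aten::as_strided
    0.441258,  # aten::stride
]

# Summing inclusive totals counts the matmul kernel time twice:
# once under aten::matmul and once under aten::mm.
column_sum_ms = sum(cuda_total_ms)

# Wall-clock reference from the "Recorded timeit time" line.
timeit_ms = 789.0814
ratio = column_sum_ms / timeit_ms

print(f"sum of CUDA total column: {column_sum_ms:.3f} ms")
print(f"ratio to timeit time:     {ratio:.2f}x")
```

The column sum comes out at roughly 1.585 s, matching the reported "CUDA time total" and landing at about twice the timeit measurement.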

After:

--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
        aten::matmul         0.15%     802.716us        99.06%     523.548ms       5.235ms     302.451us         0.04%     791.151ms       7.912ms           100  
            aten::mm        98.20%     519.007ms        98.91%     522.745ms       5.227ms     790.225ms        99.63%     790.848ms       7.908ms           100  
             aten::t         0.27%       1.406ms         0.49%       2.578ms      25.783us     604.964us         0.08%       1.066ms      10.662us           100  
          aten::view         0.45%       2.371ms         0.45%       2.371ms      11.856us     926.281us         0.12%     926.281us       4.631us           200  
     aten::transpose         0.15%     783.462us         0.22%       1.173ms      11.727us     310.016us         0.04%     461.282us       4.613us           100  
         aten::empty         0.11%     591.603us         0.11%     591.603us       5.916us     176.566us         0.02%     176.566us       1.766us           100  
    aten::as_strided         0.07%     389.270us         0.07%     389.270us       3.893us     151.266us         0.02%     151.266us       1.513us           100  
        aten::stride         0.60%       3.147ms         0.60%       3.147ms      10.489us     446.451us         0.06%     446.451us       1.488us           300  
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 528.498ms
CUDA time total: 793.143ms

Recorded timeit time:  788.9832 ms
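The accounting idea described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the actual code in `torch/autograd/profiler.py`: the `Range` class and `walk` helper are hypothetical, and the durations are made-up round numbers in the same ballpark as the per-call averages in the tables.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Range:
    """Hypothetical profiled range with an inclusive CUDA duration."""
    name: str
    cuda_total_us: float
    children: List["Range"] = field(default_factory=list)

    @property
    def self_cuda_us(self) -> float:
        # Self time = inclusive time minus the inclusive time of direct children.
        return self.cuda_total_us - sum(c.cuda_total_us for c in self.children)


def walk(root: Range) -> Iterator[Range]:
    """Yield a range and all of its descendants."""
    yield root
    for child in root.children:
        yield from walk(child)


# aten::matmul is a composite op that dispatches to aten::mm.
mm = Range("aten::mm", 7908.0)
matmul = Range("aten::matmul", 7912.0, [mm])

# Summing inclusive times counts the mm kernel twice.
naive_total = sum(r.cuda_total_us for r in walk(matmul))

# Summing self times counts every microsecond exactly once.
self_total = sum(r.self_cuda_us for r in walk(matmul))

print(naive_total, self_total)
```

With these numbers the naive sum is about twice the self-time sum, which is the same effect visible when comparing "CUDA time total" in the two tables.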

dr-ci · 2020-09-23T17:27:01Z

💊 CI failures summary and remediations

As of commit 2edc11c (more details on the Dr. CI page):

2/2 failures possibly* introduced in this PR
- 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed

Failed: Codecov - codecov/patch

codecov.io: 1 failed

Failed: codecov/patch

codecov · 2020-09-23T21:24:50Z

Codecov Report

Merging #45209 into master will increase coverage by 0.00%.
The diff coverage is 45.00%.

@@           Coverage Diff           @@
##           master   #45209   +/-   ##
=======================================
  Coverage   68.05%   68.05%           
=======================================
  Files         396      396           
  Lines       51235    51242    +7     
=======================================
+ Hits        34867    34873    +6     
- Misses      16368    16369    +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...orch/testing/_internal/distributed/rpc/rpc_test.py | `26.12% <0.00%> (-0.02%)` | ⬇️ |
| torch/autograd/profiler.py | `78.55% <100.00%> (+0.25%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ngimel · 2020-09-24T17:20:24Z

torch/csrc/autograd/profiler.cpp

Sorry about the formatting changes; pretty much the only substantial change is the `record_cuda` argument here and in `popRange`.

ilia-cher

thanks! the change LG, some minor indent comment below

ilia-cher · 2020-09-25T02:53:49Z

torch/csrc/autograd/profiler.cpp

we might then later consolidate lists like this in one place (e.g. another list in ObservedOperators.cpp)

ilia-cher · 2020-09-25T02:59:29Z

torch/csrc/autograd/profiler.cpp

i've been meaning to fix the indent in my current pr, but now it seems here it causes everything below to have an extra indent, could you just shift this and all below back to the left?

ngimel · 2020-09-25

OK, I've undone the changes to indentation; some formatting changes still remain.

facebook-github-bot

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2020-09-29T06:16:37Z

@ngimel merged this pull request in 50b9110.

ngimel requested review from albanD and apaszke as code owners September 23, 2020 17:17

ngimel requested a review from ilia-cher September 23, 2020 17:17

ngimel commented Sep 24, 2020

ilia-cher approved these changes Sep 25, 2020

ngimel force-pushed the self_cuda_time branch from 897e0b7 to 71df32b Compare September 25, 2020 04:16

facebook-github-bot reviewed Sep 25, 2020

Natalia Gimelshein added 6 commits September 28, 2020 11:48

add self cuda time to avoid double/quadruple counting

61bcf84

lint

ebe642c

reduce overhead of cuda profiling by not creating cuda events for purely cpu ops

b15e92a

clang-tidy fix

cdb97c0

indent

82e6d0b

fix rpc test to check only real cuda events

2edc11c

ngimel force-pushed the self_cuda_time branch from 71df32b to 2edc11c Compare September 28, 2020 19:31

ngimel requested review from mrshenli, pritamdamania87, rohan-varma and zhaojuanmao as code owners September 28, 2020 19:31

facebook-github-bot reviewed Sep 28, 2020

facebook-github-bot closed this in 50b9110 Sep 29, 2020

facebook-github-bot added the merged label Sep 29, 2020

mruberry added the Merged label Oct 28, 2020
