Skip to content

Conversation

@mwootton
Copy link
Contributor

Changes in the ROCm runtime have improved hipEventRecord. The events no longer take ~4 usec to execute on the gpu stream, instead they appear instantaneous. If you record two events, with no other activity in between, then they will have the same timestamp and the elapsed duration will be 0.

The profiler uses hip/cuda event pairs to infer gpu execution times. It wraps functions whether they send work to the gpu or not. Functions that send no gpu work will show as having zero duration. Also they will show as running at the same time as neighboring functions. On a trace, all those functions combine into a 'call stack' that can be tens of functions tall (when indeed they should be sequential).

This patch suppresses recording the zero duration 'kernel' events, leaving only the CPU execution part. This means functions that do not use the GPU do not get an entry for how long they were using the GPU, which seams reasonable. This fixes the 'stacking' on traces. It also improves the signal to noise of the GPU trace beyond what was available previously.

This patch will not effect CUDA or legacy ROCm as those are not able to 'execute' eventRecord markers instantaneously.

@mwootton mwootton requested review from albanD and apaszke as code owners July 16, 2020 17:04
@smessmer smessmer added oncall: profiler profiler-related issues (cpu, gpu, kineto) triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Jul 16, 2020
@mwootton
Copy link
Contributor Author

@sunway513
@jeffdaily

@jeffdaily jeffdaily added the module: rocm AMD GPU support for Pytorch label Jul 16, 2020
@jeffdaily
Copy link
Collaborator

@apaszke, @albanD, as owners of the file we're touching, we'd appreciate a review. The change is quite small.

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the PR!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@albanD merged this pull request in 7eb71b4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: rocm AMD GPU support for Pytorch oncall: profiler profiler-related issues (cpu, gpu, kineto) open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants