Skip to content

Conversation

@louisfeng
Copy link
Contributor

Summary: Add tracing to DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end in a different thread. RecordFunction and LibgpumonObserver previously assume each trace event starts and finishes in the same thread. So they use a thread local context to track enter and exit call backs. Async events breaks this assumption. This change attaches the event context to the RecordFunction object so we do not need to use thread local context.

Test Plan:
Tested with dpp perf test and able to collect trace.

{F307824044}

Differential Revision: D23323486

Summary: Add tracing to DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end in a different thread. RecordFunction and LibgpumonObserver previously assume each trace event starts and finishes in the same thread. So they use a thread local context to track enter and exit call backs. Async events breaks this assumption. This change attaches the event context to the RecordFunction object so we do not need to use thread local context.

Test Plan:
Tested with dpp perf test and able to collect trace.

{F307824044}

Differential Revision: D23323486

fbshipit-source-id: 7700fddee73c0e6e89d2e8b3a863cf56316999d6
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D23323486

@dr-ci
Copy link

dr-ci bot commented Sep 5, 2020

💊 CI failures summary and remediations

As of commit 78bce48 (more details on the Dr. CI page):


  • 1/3 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)
  • 2/3 broken upstream at merge base 15a7368 since Sep 04

🚧 2 ongoing upstream failures:

These were probably caused by upstream breakages that are not fixed yet:


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 2 times.

@louisfeng louisfeng changed the title DPP Async Tracing RecordFunction Async Tracing Sep 6, 2020
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 71673b3.

@mruberry
Copy link
Collaborator

Unlanding because I suspect this broke several Windows builds.

..\test\cpp\jit\test_misc.cpp(1051): error C3493: 'test_val' cannot be implicitly captured because no default capture mode has been specified
..\test\cpp\jit\test_misc.cpp(1060): error C3493: 'test_val' cannot be implicitly captured because no default capture mode has been specified
..\test\cpp\jit\test_misc.cpp(1048): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'at::RecordFunctionCallback'
..\test\cpp\jit\test_misc.cpp(1062): note: No constructor could take the source type, or constructor overload resolution was ambiguous
..\test\cpp\jit\test_misc.cpp(1048): error C2064: term does not evaluate to a function taking 2 arguments
..\test\cpp\jit\test_misc.cpp(1077): error C3493: 'test_val' cannot be implicitly captured because no default capture mode has been specified
..\test\cpp\jit\test_misc.cpp(1086): error C3493: 'test_val' cannot be implicitly captured because no default capture mode has been specified
..\test\cpp\jit\test_misc.cpp(1074): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'at::RecordFunctionCallback'
..\test\cpp\jit\test_misc.cpp(1088): note: No constructor could take the source type, or constructor overload resolution was ambiguous
..\test\cpp\jit\test_misc.cpp(1074): error C2064: term does not evaluate to a function taking 2 arguments

The PR doesn't have the changes to test_misc, but it looks like the diff does. Maybe all the diff changes weren't exported properly to Github for testing?

xuzhao9 pushed a commit that referenced this pull request Sep 18, 2020
Summary:
Pull Request resolved: #44252

Add tracing to DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end in a different thread. RecordFunction and LibgpumonObserver previously assume each trace event starts and finishes in the same thread. So they use a thread local context to track enter and exit call backs. Async events breaks this assumption. This change attaches the event context to the RecordFunction object so we do not need to use thread local context.

Test Plan:
Tested with dpp perf test and able to collect trace.

{F307824044}

Reviewed By: ilia-cher

Differential Revision: D23323486

fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants