track number of cpp->python exceptions thrown in torch.compile benchmark suite #131481
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131481
Note: Links to docs will display an error until the docs builds have been completed.
❌ 42 New Failures, 1 Cancelled Job, 14 Unrelated Failures
As of commit 09b09fb with merge base 932ae13:
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…pile benchmark suite" cpp->python exceptions can be slow - *especially* if you are running pytorch with infra that tries to symbolize C++ stacktraces. This PR attempts to: (1) add a counter that is incremented every time pybind catches a C++ exception that enters python (2) track the value of this counter in our benchmark suite, so we can hopefully drive it to zero. It might be the case that we have very few (or even zero) of these cases showing up in torchbench. Empirically (from internal usage), it also appears that this can show up when torch.compiling custom ops. So if few-or-zero exceptions are encountered in torchbench, we should also add some tests for avoiding C++ exceptions in the common path of torch.compile usage with custom ops. [ghstack-poisoned]
TORCH_API void runJITCPPTests();
#endif

static thread_local uint64_t cpp_to_python_translated_exception_count{0};
My thought process here was:
(1) I figured this can't be a plain uint64_t, since multiple threads may be raising exceptions and can increment the counter.
(2) I originally made it a std::atomic<uint64_t>, but then I didn't want to risk the increment code being very slow, and exacerbating slowness problems in cases where there are many cpp-to-python exceptions. Alternatively, we could make it slow but gate it behind a config / env var, but this seems like a nice metric to be able to track in an always-on way.
(3) thread_local will under-count if multiple threads are all raising exceptions. My hope is that we are very unlikely to hit a situation where the main thread raises no cpp-to-python exception, but a side thread raises a large number (although if this happens, then... we will not be able to track it in metrics).
Don't worry about (2): you're already in slow pain with exceptions, so the atomic is a rounding error.
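For reference, the atomic variant discussed in (2) could look roughly like the sketch below. This is only an illustration, not the code in this PR: the helper names `note_translated_exception`, `translated_exception_count`, and `count_from_threads` are hypothetical, and the counter is redeclared here as a `std::atomic<uint64_t>` on the assumption that a process-wide total is what we want. A relaxed `fetch_add` is enough because only the total matters, not ordering against other memory operations.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical process-wide counter. The name mirrors the thread_local
// variable in the diff, but this atomic variant is an illustration only.
static std::atomic<uint64_t> cpp_to_python_translated_exception_count{0};

// Hypothetical hook: call this wherever a C++ exception is translated
// into a Python exception.
void note_translated_exception() {
  // Relaxed ordering: we need an accurate total, not synchronization.
  cpp_to_python_translated_exception_count.fetch_add(
      1, std::memory_order_relaxed);
}

// Read the current total across all threads.
uint64_t translated_exception_count() {
  return cpp_to_python_translated_exception_count.load(
      std::memory_order_relaxed);
}

// Demo helper: every thread records `per_thread` events, and the total
// is read after all threads have been joined.
uint64_t count_from_threads(unsigned num_threads, unsigned per_thread) {
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([per_thread] {
      for (unsigned i = 0; i < per_thread; ++i) {
        note_translated_exception();
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return translated_exception_count();
}
```

Unlike the `thread_local` version, this does not under-count when side threads raise exceptions, which addresses the concern in (3). The cost is one atomic increment per translated exception.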
I gave up looking for a way to grab the output from CI, so I just ran a script that adds the extra column to every existing CSV we check in (set to zero; if there are any exceptions, I expect CI to fail).
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
cpp->python exceptions can be slow - *especially* if you are running pytorch with infra that tries to symbolize C++ stacktraces.
This PR attempts to:
(1) add a counter every time pybind catches a C++ exception that enters python
(2) track the value of this counter in our benchmark suite, so we can hopefully drive it to zero.
It might be the case that we have very few (or even zero) of these cases showing up in torchbench. Empirically (from internal usage), it also appears that this can show up when torch.compiling custom ops. So if few-or-zero exceptions are encountered in torchbench, we should also add some tests for avoiding C++ exceptions in the common path of torch.compile usage with custom ops.
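To make the two steps concrete, here is a minimal, self-contained sketch of the idea. It is not the actual pybind or benchmark code: `boundary_exception_count`, `call_at_boundary`, and `exceptions_during` are hypothetical names, the catch block stands in for the place where the binding layer translates a C++ exception into a Python one, and the before/after delta stands in for the column the benchmark suite would record. The counter is `thread_local`, as in the diff, so only exceptions on the calling thread are counted.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the counter this PR adds (thread_local, as in
// the diff, so each thread only sees its own count).
static thread_local uint64_t boundary_exception_count{0};

// Hypothetical stand-in for the binding boundary: run `fn`, and if a C++
// exception escapes, count it and convert it to an error message. A real
// binding layer would set a Python error here instead.
std::string call_at_boundary(const std::function<void()>& fn) {
  try {
    fn();
    return "";
  } catch (const std::exception& e) {
    ++boundary_exception_count;  // step (1): count every translation
    return e.what();
  }
}

// Step (2): report how many exceptions were translated on this thread
// while `workload` ran, by comparing the counter before and after.
uint64_t exceptions_during(const std::function<void()>& workload) {
  const uint64_t before = boundary_exception_count;
  workload();
  return boundary_exception_count - before;
}
```

A benchmark harness would then treat any nonzero delta as a regression, which matches the goal of driving the count to zero.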
Stack from ghstack (oldest at bottom):
cc @albanD @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames