[JIT] Optionally validate nvfuser outputs after execution #74361
Conversation
This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation.
```python
import torch

def callback(amt, x, y):
    for i in range(len(x) - amt, len(x)):
        print(x[i])
        print(y[i])

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(callback)

    @torch.jit.script
    def g(x, y):
        z = torch.add(x, y)
        return torch.sin(z)

    def f(x, y, a):
        z = torch.add(x, y)
        return g(torch.relu(z), a)

    f_s = torch.jit.script(f)

    x = torch.rand((10, 10), dtype=torch.half).cuda()
    y = torch.rand((10, 10), dtype=torch.half).cuda()
    a = torch.rand((10, 10), dtype=torch.half).cuda()

    f_s(x, y, a)
    f_s(x, y, a)
    f_s(x, y, a)
```
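For validation proper (rather than just printing values), the callback can compare the fused and fallback results numerically. A minimal sketch, assuming `amt` is the number of fused outputs and that the last `amt` entries of `x` (fused) and `y` (unfused fallback) are tensors; the tolerances are illustrative and would likely need loosening for half precision:

```python
import torch

# Hypothetical strict-validation callback (assumed signature: amt, fused stack, fallback stack).
def validate(amt, x, y):
    for i in range(len(x) - amt, len(x)):
        fused, unfused = x[i], y[i]
        # Compare in float32 to avoid half-precision accumulation noise.
        if not torch.allclose(fused.float(), unfused.float(), rtol=1e-2, atol=1e-3):
            max_diff = (fused.float() - unfused.float()).abs().max().item()
            raise RuntimeError(f"NVFuser output mismatch (max abs diff = {max_diff})")

# Registered the same way as the printing callback above.
torch._C._jit_nvfuser_set_comparison_callback(validate)
```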
@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.
```python
import torch

def callback(x, y, graph):
    for i in range(len(x)):
        print(x[i])
        print(y[i])
    print(graph)

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(True, callback)

    @torch.jit.script
    def g(x, y):
        z = torch.add(x, y)
        return torch.sin(z)

    def f(x, y, a):
        z = torch.add(x, y)
        return g(torch.relu(z), a)

    f_s = torch.jit.script(f)

    x = torch.rand((10, 10), dtype=torch.half).cuda()
    y = torch.rand((10, 10), dtype=torch.half).cuda()
    a = torch.rand((10, 10), dtype=torch.half).cuda()

    f_s(x, y, a)
    f_s(x, y, a)
    f_s(x, y, a)
```
Differential Revision: [D34975310](https://our.internmc.facebook.com/intern/diff/D34975310)
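As a variation on the example above — a sketch only, assuming `x` and `y` are the stacks of fused and fallback output tensors and `graph` is the textual IR of the fused subgraph — the callback can record which fused subgraphs diverge from the fallback:

```python
import torch

mismatched_graphs = []

# Hypothetical mismatch logger for the (x, y, graph) callback signature.
def log_mismatches(x, y, graph):
    for fused, unfused in zip(x, y):
        if not torch.allclose(fused.float(), unfused.float(), rtol=1e-2, atol=1e-3):
            # Remember the subgraph IR so the offending fusion can be inspected later.
            mismatched_graphs.append(graph)
            break

# True requests that the unfused fallback also run, so outputs can be compared.
torch._C._jit_nvfuser_set_comparison_callback(True, log_mismatches)
```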
eellison
left a comment
Sweet!! LGTM - just a couple small comments. CC @jjsjann123 @kevinstephano may want to review as well
```cpp
auto copied_graph = fusion_node->g(attr::Subgraph)->copy();
EraseShapeInformation(copied_graph);
enableAliasCopyNodes(copied_graph, copied_graph->block());
InterpreterState{Code(copied_graph, "fallback_cuda_fuser")}.run(stack);
```
We should really cache and save the InterpreterState code here. This fallback actually does get run in real networks, and this whole process happens on each invocation.
That probably shouldn't be in this PR - but something to do soon.
```cpp
struct CudaFuserComparisonCallback {
  using callback_type =
      std::function<void(const Stack&, const Stack&, const std::string&)>;
  bool run_fallback;
```
When would this ever be false?
@eellison if you just want to collect the graph (e.g. for logging fusion groups), without running the fallback
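A minimal sketch of that logging-only use — assuming the first argument is the run_fallback flag and that, with it set to False, the fallback is skipped and only the graph argument is of interest:

```python
import torch

fusion_graphs = []

# Hypothetical graph collector: only record each fused subgraph's IR.
# With run_fallback=False (an assumption based on the comment above),
# the unfused fallback is not run, so the output stacks are not compared.
def collect_graphs(x, y, graph):
    fusion_graphs.append(graph)

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(False, collect_graphs)
```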
```cpp
TORCH_INTERNAL_ASSERT(stack.size() >= inputs_size);
stack_copy = Stack();
stack_copy->insert(
    stack_copy->end(), stack.begin(), stack.end() - inputs_size);
```
Should we assert on !stack_copy.empty()? Since we later (L297) only use it to determine whether we re-run the fallback.
stack_copy is a `c10::optional<>` so conversion to bool will return true as long as it has a value - I can change the later if statement to `if (stack_copy.has_value())` for clarity
sorry I missed the optional part... code looks fine as-is.
jjsjann123
left a comment
LGTM
Summary: Pull Request resolved: #74361

This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34975310

Pulled By: davidberard98

fbshipit-source-id: 2379c9a6f371cd58da6a187c1f16882f3923ab24
Stack from ghstack:
This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.
Differential Revision: D34975310