[JIT] Optionally validate nvfuser outputs after execution #74361
Conversation
This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation.
```python
import torch

def callback(amt, x, y):
    for i in range(len(x) - amt, len(x)):
        print(x[i])
        print(y[i])

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(callback)

    @torch.jit.script
    def g(x, y):
        z = torch.add(x, y)
        return torch.sin(z)

    def f(x, y, a):
        z = torch.add(x, y)
        return g(torch.relu(z), a)

    f_s = torch.jit.script(f)

    x = torch.rand((10, 10), dtype=torch.half).cuda()
    y = torch.rand((10, 10), dtype=torch.half).cuda()
    a = torch.rand((10, 10), dtype=torch.half).cuda()

    f_s(x, y, a)
    f_s(x, y, a)
    f_s(x, y, a)
```
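For validation proper (rather than just printing values), the callback can compare the fused and fallback results numerically. A minimal sketch, assuming `amt` is the number of fused outputs and that the last `amt` entries of `x` (fused) and `y` (unfused fallback) are tensors; the tolerances are illustrative and would likely need loosening for half precision:

```python
import torch

# Hypothetical strict-validation callback (assumed signature: amt, fused stack, fallback stack).
def validate(amt, x, y):
    for i in range(len(x) - amt, len(x)):
        fused, unfused = x[i], y[i]
        # Compare in float32 to avoid half-precision accumulation noise.
        if not torch.allclose(fused.float(), unfused.float(), rtol=1e-2, atol=1e-3):
            max_diff = (fused.float() - unfused.float()).abs().max().item()
            raise RuntimeError(f"NVFuser output mismatch (max abs diff = {max_diff})")

# Registered the same way as the printing callback above.
torch._C._jit_nvfuser_set_comparison_callback(validate)
```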
@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.
```python
import torch

def callback(x, y, graph):
    for i in range(len(x)):
        print(x[i])
        print(y[i])
    print(graph)

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(True, callback)

    @torch.jit.script
    def g(x, y):
        z = torch.add(x, y)
        return torch.sin(z)

    def f(x, y, a):
        z = torch.add(x, y)
        return g(torch.relu(z), a)

    f_s = torch.jit.script(f)

    x = torch.rand((10, 10), dtype=torch.half).cuda()
    y = torch.rand((10, 10), dtype=torch.half).cuda()
    a = torch.rand((10, 10), dtype=torch.half).cuda()

    f_s(x, y, a)
    f_s(x, y, a)
    f_s(x, y, a)
```
Differential Revision: [D34975310](https://our.internmc.facebook.com/intern/diff/D34975310)
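As a variation on the example above — a sketch only, assuming `x` and `y` are the stacks of fused and fallback output tensors and `graph` is the textual IR of the fused subgraph — the callback can record which fused subgraphs diverge from the fallback:

```python
import torch

mismatched_graphs = []

# Hypothetical mismatch logger for the (x, y, graph) callback signature.
def log_mismatches(x, y, graph):
    for fused, unfused in zip(x, y):
        if not torch.allclose(fused.float(), unfused.float(), rtol=1e-2, atol=1e-3):
            # Remember the subgraph IR so the offending fusion can be inspected later.
            mismatched_graphs.append(graph)
            break

# True requests that the unfused fallback also run, so outputs can be compared.
torch._C._jit_nvfuser_set_comparison_callback(True, log_mismatches)
```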
eellison
left a comment
Sweet!! LGTM - just a couple small comments. CC @jjsjann123 @kevinstephano may want to review as well
```cpp
auto copied_graph = fusion_node->g(attr::Subgraph)->copy();
EraseShapeInformation(copied_graph);
enableAliasCopyNodes(copied_graph, copied_graph->block());
InterpreterState{Code(copied_graph, "fallback_cuda_fuser")}.run(stack);
```
We should really cache and save the InterpreterState code here. This fallback actually does get run in real networks, and this whole process happens on each invocation.
That probably shouldn't be in this PR - but something to do soon.
```cpp
struct CudaFuserComparisonCallback {
  using callback_type =
      std::function<void(const Stack&, const Stack&, const std::string&)>;
  bool run_fallback;
```
When would this ever be false?
@eellison if you just want to collect the graph (e.g. for logging fusion groups), without running the fallback
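A minimal sketch of that logging-only use — assuming the first argument is the run_fallback flag and that, with it set to False, the fallback is skipped and only the graph argument is of interest:

```python
import torch

fusion_graphs = []

# Hypothetical graph collector: only record each fused subgraph's IR.
# With run_fallback=False (an assumption based on the comment above),
# the unfused fallback is not run, so the output stacks are not compared.
def collect_graphs(x, y, graph):
    fusion_graphs.append(graph)

with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(False, collect_graphs)
```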
```cpp
TORCH_INTERNAL_ASSERT(stack.size() >= inputs_size);
stack_copy = Stack();
stack_copy->insert(
    stack_copy->end(), stack.begin(), stack.end() - inputs_size);
```
Should we assert on !stack_copy.empty()? Since we later (L297) only use it to determine whether we re-run the fallback.
stack_copy is a `c10::optional<>` so conversion to bool will return true as long as it has a value - I can change the later if statement to `if (stack_copy.has_value())` for clarity
sorry I missed the optional part... code looks fine as-is.
jjsjann123
left a comment
LGTM
Summary: Pull Request resolved: #74361

This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34975310

Pulled By: davidberard98

fbshipit-source-id: 2379c9a6f371cd58da6a187c1f16882f3923ab24
Stack from ghstack:
This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.
Differential Revision: D34975310