
Conversation

@IvanKobzarev (Contributor) commented Apr 13, 2020

This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.

CMake

Introducing three CMake options:
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl, with every function call wrapped in vulkan_wrapper.h (a minimal sketch of this approach follows below).
OFF - link with libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - the shader compilation library is linked, and shaders are compiled at runtime.
OFF - shaders are precompiled, and the shader compilation library is not included.
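For context, a minimal sketch of what the USE_VULKAN_WRAPPER=ON path does, in the usual NDK wrapper style (illustrative, not the literal wrapper code; the real wrapper resolves every Vulkan entry point, not just one):

// Illustrative sketch of a dlopen-based Vulkan wrapper: resolve entry points
// at runtime so the binary has no link-time dependency on libvulkan.so.
#include <dlfcn.h>
#include <vulkan/vulkan.h>

static PFN_vkCreateInstance vkCreateInstance_ = nullptr;

bool InitVulkan() {
  void* lib = dlopen("libvulkan.so", RTLD_NOW | RTLD_LOCAL);
  if (lib == nullptr) {
    return false;  // no Vulkan driver on this device
  }
  vkCreateInstance_ =
      reinterpret_cast<PFN_vkCreateInstance>(dlsym(lib, "vkCreateInstance"));
  return vkCreateInstance_ != nullptr;
}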

Codegen

Shader codegen is driven by cmake/VulkanCodegen.cmake, which calls aten/src/ATen/native/vulkan/gen_glsl.py or aten/src/ATen/native/vulkan/gen_spv.py to embed the shader source or SPIR-V bytecode in the binary.
If USE_VULKAN_SHADERC_RUNTIME is OFF:
Shaders are precompiled, and the SPIR-V bytecode is included as uint32_t arrays in spv.h,spv.cpp.
If USE_VULKAN_SHADERC_RUNTIME is ON:
The shader source is included as glsl.h,glsl.cpp and compiled at runtime.

All codegen output is placed in the build directory.
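For illustration, the embedded SPIR-V might look roughly like this (hypothetical names; the actual generated headers may differ):

// Hypothetical shape of the generated spv.h: each compiled shader is embedded
// as an array of SPIR-V words plus its byte size, ready to be passed to
// vkCreateShaderModule at runtime.
#include <cstdint>

extern const uint32_t conv2d_spv[];    // SPIR-V words for the conv2d shader
extern const uint32_t conv2d_spv_len;  // size in bytes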

Build dependencies

cmake/Dependencies.cmake
If the target platform is Android, the Vulkan library, headers, and Vulkan wrapper are taken from the ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable; all Vulkan dependencies are taken from it.
(The desktop build was tested only on Linux.)

PyTorch integration

Adding 'Vulkan' as a new Backend, DispatchKey, and DeviceType.
We use the Strided layout without supporting strides at the moment, but we plan to support them in the future.
We use OpaqueTensorImpl, where the OpaqueHandle is a copyable VulkanTensor;
more details are in the comments in aten/src/ATen/native/vulkan/Vulkan.h.
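Roughly how the wrapping works (an approximate sketch; the function name, dispatch key spelling, and VulkanTensor members here are illustrative, not the exact code in this PR):

// Approximate sketch: wrap a copyable VulkanTensor handle into an at::Tensor
// via OpaqueTensorImpl, dispatching on the Vulkan key.
#include <ATen/ATen.h>
#include <ATen/OpaqueTensorImpl.h>

at::Tensor new_with_vtensor(VulkanTensor vt, const at::TensorOptions& options) {
  auto sizes = vt.sizes();  // assumes VulkanTensor exposes its sizes
  return at::detail::make_tensor<at::OpaqueTensorImpl<VulkanTensor>>(
      at::DispatchKeySet(at::DispatchKey::Vulkan),
      options.dtype(),
      at::Device(at::DeviceType::Vulkan),
      std::move(vt),  // the copyable handle stored as the OpaqueHandle
      std::vector<int64_t>(sizes.begin(), sizes.end()));
}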

Main code location: aten/src/ATen/native/vulkan
aten/src/ATen/native/vulkan/VulkanAten.cpp - the bridge between ATen and the Vulkan API (Vulkan.h); it converts at::Tensor to VulkanTensor.

aten/src/ATen/native/vulkan/Vulkan.h - the Vulkan API that contains the VulkanTensor representation and functions to work with it. We plan to expose it so clients can write their own Vulkan ops.

aten/src/ATen/native/vulkan/VulkanOps.cpp - implementations of the Vulkan operations, using the Vulkan.h API.

GLSL shaders

Located in aten/src/ATen/native/vulkan/glsl as *.glsl files.
All shaders use Vulkan specialization constants with ids 1, 2, 3 for the workgroup sizes (a sketch of how these are set follows below).
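For reference, a sketch of how such specialization constants are typically supplied on the C++ side when creating the compute pipeline (values and helper name illustrative):

// Illustrative: supply workgroup sizes via specialization constants 1, 2, 3.
#include <vulkan/vulkan.h>
#include <cstddef>
#include <cstdint>

struct WorkGroupSize { uint32_t x, y, z; };

VkSpecializationInfo makeWorkGroupSpecInfo(const WorkGroupSize& wgs) {
  static const VkSpecializationMapEntry entries[3] = {
      {1, offsetof(WorkGroupSize, x), sizeof(uint32_t)},
      {2, offsetof(WorkGroupSize, y), sizeof(uint32_t)},
      {3, offsetof(WorkGroupSize, z), sizeof(uint32_t)},
  };
  // pData points at the caller-owned wgs; keep it alive until pipeline
  // creation. Passed via VkComputePipelineCreateInfo::stage.pSpecializationInfo.
  return VkSpecializationInfo{3, entries, sizeof(WorkGroupSize), &wgs};
}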

Supported operations

Implemented at this point (a usage sketch follows the list):
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh
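A hypothetical end-to-end usage sketch, assuming the backend is compiled in (the device-selection spelling may differ):

// Hypothetical usage: move a CPU tensor to Vulkan, run a supported op, copy back.
#include <ATen/ATen.h>

at::Tensor clamp_on_vulkan(const at::Tensor& cpu_input) {
  at::Tensor v = cpu_input.to(at::Device(at::DeviceType::Vulkan));  // CPU -> Vulkan
  at::Tensor out = v.clamp(0.0, 6.0);  // runs the Vulkan clamp kernel
  return out.to(at::kCPU);             // Vulkan -> CPU
}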

Testing

aten/src/ATen/test/vulkan_test.cpp - contains tests for:
copy from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU, or with an installed software implementation of Vulkan such as https://github.com/google/swiftshader.
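The roundtrip checks are along these lines - a sketch, not the literal test code:

// Sketch of a CPU -> Vulkan -> CPU roundtrip check, gtest-style.
#include <gtest/gtest.h>
#include <ATen/ATen.h>

TEST(VulkanTest, CopyRoundtrip) {
  auto t = at::rand({1, 3, 8, 8});
  auto tv = t.to(at::Device(at::DeviceType::Vulkan));  // upload
  auto t2 = tv.to(at::kCPU);                           // download
  ASSERT_TRUE(at::allclose(t, t2));
}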

Vulkan execution

The initial implementation is trivial: it waits for every operator's execution to complete before proceeding.
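In Vulkan terms, this synchronous model amounts to submitting each operator's command buffer and then blocking on a fence - an illustrative sketch, not the exact code in this PR:

// Illustrative: submit one operator's command buffer and wait for completion.
#include <vulkan/vulkan.h>
#include <cstdint>

void submitAndWait(VkDevice device, VkQueue queue, VkCommandBuffer cmd) {
  VkSubmitInfo submit{};
  submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
  submit.commandBufferCount = 1;
  submit.pCommandBuffers = &cmd;

  VkFenceCreateInfo fenceInfo{};
  fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
  VkFence fence = VK_NULL_HANDLE;
  vkCreateFence(device, &fenceInfo, nullptr, &fence);

  vkQueueSubmit(queue, 1, &submit, fence);
  vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);  // block per op
  vkDestroyFence(device, fence, nullptr);
}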

@dr-ci bot commented Apr 13, 2020

💊 CI failures summary and remediations

As of commit b2d5f0e (more details on the Dr. CI page):



🕵️ 9 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_bazel_test (1/9)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

May 23 01:32:05 FAILED: Build did NOT complete successfully
May 23 01:32:05 //:ordered_dict_test                                                  NO STATUS 
May 23 01:32:05 //:rnn_test                                                           NO STATUS 
May 23 01:32:05 //:sequential_test                                                    NO STATUS 
May 23 01:32:05 //:serialize_test                                                     NO STATUS 
May 23 01:32:05 //:static_test                                                        NO STATUS 
May 23 01:32:05 //:tensor_options_test                                                NO STATUS 
May 23 01:32:05 //:tensor_test                                                        NO STATUS 
May 23 01:32:05 //:torch_include_test                                                 NO STATUS 
May 23 01:32:05  
May 23 01:32:05 Executed 0 out of 25 tests: 25 were skipped. 
May 23 01:32:05 FAILED: Build did NOT complete successfully 
May 23 01:32:05 + cleanup 
May 23 01:32:05 + retcode=1 
May 23 01:32:05 + set +x 
May 23 01:32:05 =================== sccache compilation log =================== 
May 23 01:32:05 =========== If your build fails, please take a look at the log above for possible reasons =========== 
May 23 01:32:05 Compile requests                 92 
May 23 01:32:05 Compile requests executed        66 
May 23 01:32:05 Cache hits                       55 
May 23 01:32:05 Cache misses                      8 
May 23 01:32:05 Cache timeouts                    0 

See CircleCI build pytorch_windows_vs2019_py36_cpu_test2 (2/9)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions.
OK 
 
Generating XML reports... 
Generated XML report: test-reports\python-unittest\TEST-TestFunctionSchema-20200523014711.xml 
Running test_overrides ... [2020-05-23 01:47:12.485073] 
Traceback (most recent call last): 
  File "test_overrides.py", line 307, in <module> 
    generate_tensor_like_torch_implementations() 
  File "test_overrides.py", line 302, in generate_tensor_like_torch_implementations 
    assert len(untested_funcs) == 0, msg.format(pprint.pformat(untested_funcs)) 
AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions. 
 
["<module 'torch' from " 
 "'C:\\\\Users\\\\circleci\\\\project\\\\build\\\\win_tmp\\\\build\\\\torch\\\\__init__.py'>.is_vulkan_available", 
 "<module 'torch' from " 
 "'C:\\\\Users\\\\circleci\\\\project\\\\build\\\\win_tmp\\\\build\\\\torch\\\\__init__.py'>.is_vulkan_available"] 
Traceback (most recent call last): 
  File "run_test.py", line 691, in <module> 
    main() 
  File "run_test.py", line 684, in main 
    raise RuntimeError(message) 

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_on_cpu_test2 (3/9)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions.
OK 
 
Generating XML reports... 
Generated XML report: test-reports\python-unittest\TEST-TestFunctionSchema-20200523023301.xml 
Running test_overrides ... [2020-05-23 02:33:03.340731] 
Traceback (most recent call last): 
  File "test_overrides.py", line 307, in <module> 
    generate_tensor_like_torch_implementations() 
  File "test_overrides.py", line 302, in generate_tensor_like_torch_implementations 
    assert len(untested_funcs) == 0, msg.format(pprint.pformat(untested_funcs)) 
AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions. 
 
["<module 'torch' from " 
 "'C:\\\\Users\\\\circleci\\\\project\\\\build\\\\win_tmp\\\\build\\\\torch\\\\__init__.py'>.is_vulkan_available", 
 "<module 'torch' from " 
 "'C:\\\\Users\\\\circleci\\\\project\\\\build\\\\win_tmp\\\\build\\\\torch\\\\__init__.py'>.is_vulkan_available"] 
Traceback (most recent call last): 
  File "run_test.py", line 691, in <module> 
    main() 
  File "run_test.py", line 684, in main 
    raise RuntimeError(message) 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test (4/9)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 23 02:47:57 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n ^\n" }
May 23 02:47:57 Traceback (most recent call last): 
May 23 02:47:57   File "test/run_test.py", line 691, in <module> 
May 23 02:47:57     main() 
May 23 02:47:57   File "test/run_test.py", line 684, in main 
May 23 02:47:57     raise RuntimeError(message) 
May 23 02:47:57 RuntimeError: test_overrides failed! 
May 23 02:47:57 + cleanup 
May 23 02:47:57 + retcode=1 
May 23 02:47:57 + set +x 
May 23 02:47:57 =================== sccache compilation log =================== 
May 23 02:47:57 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n                       ^\n" } 
May 23 02:47:57  
May 23 02:47:57 =========== If your build fails, please take a look at the log above for possible reasons =========== 
May 23 02:47:57 Compile requests                 68 
May 23 02:47:57 Compile requests executed        35 
May 23 02:47:57 Cache hits                        2 
May 23 02:47:57 Cache misses                     32 
May 23 02:47:57 Cache timeouts                    0 
May 23 02:47:57 Cache read errors                 0 
May 23 02:47:57 Forced recaches                   0 
May 23 02:47:57 Cache write errors                0 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (5/9)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 23 02:48:53 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n ^\n" }
May 23 02:48:53 Traceback (most recent call last): 
May 23 02:48:53   File "test/run_test.py", line 691, in <module> 
May 23 02:48:53     main() 
May 23 02:48:53   File "test/run_test.py", line 684, in main 
May 23 02:48:53     raise RuntimeError(message) 
May 23 02:48:53 RuntimeError: test_overrides failed! 
May 23 02:48:53 + cleanup 
May 23 02:48:53 + retcode=1 
May 23 02:48:53 + set +x 
May 23 02:48:53 =================== sccache compilation log =================== 
May 23 02:48:53 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n                       ^\n" } 
May 23 02:48:53  
May 23 02:48:53 =========== If your build fails, please take a look at the log above for possible reasons =========== 
May 23 02:48:53 Compile requests                 68 
May 23 02:48:53 Compile requests executed        35 
May 23 02:48:53 Cache hits                       10 
May 23 02:48:53 Cache misses                     24 
May 23 02:48:53 Cache timeouts                    0 
May 23 02:48:53 Cache read errors                 0 
May 23 02:48:53 Forced recaches                   0 
May 23 02:48:53 Cache write errors                0 

See CircleCI build pytorch_linux_bionic_py3_6_clang9_test (6/9)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 23 02:51:36 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n ^\n" }
May 23 02:51:36     raise RuntimeError(message) 
May 23 02:51:36 RuntimeError: test_overrides failed! 
May 23 02:51:36  
May 23 02:51:36 real	29m18.033s 
May 23 02:51:36 user	24m19.985s 
May 23 02:51:36 sys	2m17.707s 
May 23 02:51:36 + cleanup 
May 23 02:51:36 + retcode=1 
May 23 02:51:36 + set +x 
May 23 02:51:36 =================== sccache compilation log =================== 
May 23 02:51:36 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n                       ^\n" } 
May 23 02:51:36  
May 23 02:51:36 =========== If your build fails, please take a look at the log above for possible reasons =========== 
May 23 02:51:36 Compile requests                 68 
May 23 02:51:36 Compile requests executed        35 
May 23 02:51:36 Cache hits                        2 
May 23 02:51:36 Cache misses                     32 
May 23 02:51:36 Cache timeouts                    0 
May 23 02:51:36 Cache read errors                 0 
May 23 02:51:36 Forced recaches                   0 
May 23 02:51:36 Cache write errors                0 

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (7/9)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions.
OK 
 
Generating XML reports... 
Generated XML report: test-reports\python-unittest\TEST-TestFunctionSchema-20200523030224.xml 
Running test_overrides ... [2020-05-23 03:02:25.853883] 
Traceback (most recent call last): 
  File "test_overrides.py", line 307, in <module> 
    generate_tensor_like_torch_implementations() 
  File "test_overrides.py", line 302, in generate_tensor_like_torch_implementations 
    assert len(untested_funcs) == 0, msg.format(pprint.pformat(untested_funcs)) 
AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions. 
 
["<module 'torch' from " 
 "'C:\\\\Users\\\\circleci\\\\project\\\\build\\\\win_tmp\\\\build\\\\torch\\\\__init__.py'>.is_vulkan_available", 
 "<module 'torch' from " 
 "'C:\\\\Users\\\\circleci\\\\project\\\\build\\\\win_tmp\\\\build\\\\torch\\\\__init__.py'>.is_vulkan_available"] 
Traceback (most recent call last): 
  File "run_test.py", line 691, in <module> 
    main() 
  File "run_test.py", line 684, in main 
    raise RuntimeError(message) 

See CircleCI build pytorch_macos_10_13_py3_test (8/9)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

May 22 20:23:27 AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions.
May 22 20:23:26 OK 
May 22 20:23:26  
May 22 20:23:26 Generating XML reports... 
May 22 20:23:26 Generated XML report: test-reports/dist-gloo/TEST-TestFunctionSchema-20200522202325.xml 
May 22 20:23:26 Running test_overrides ... [2020-05-22 20:23:26.293083] 
May 22 20:23:27 Traceback (most recent call last): 
May 22 20:23:27   File "test_overrides.py", line 307, in <module> 
May 22 20:23:27     generate_tensor_like_torch_implementations() 
May 22 20:23:27   File "test_overrides.py", line 302, in generate_tensor_like_torch_implementations 
May 22 20:23:27     assert len(untested_funcs) == 0, msg.format(pprint.pformat(untested_funcs)) 
May 22 20:23:27 AssertionError: The following functions are not tested for __torch_function__ support, please ensure there is an entry in the dict returned by torch._overrides.get_testing_overrides for this function or if a __torch_function__ override does not make sense, add an entry to the tuple returned by torch._overrides.get_ignored_functions. 
May 22 20:23:27  
May 22 20:23:27 ["<module 'torch' from " 
May 22 20:23:27  "'/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/__init__.py'>.is_vulkan_available", 
May 22 20:23:27  "<module 'torch' from " 
May 22 20:23:27  "'/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/__init__.py'>.is_vulkan_available"] 
May 22 20:23:27 Traceback (most recent call last): 
May 22 20:23:27   File "test/run_test.py", line 691, in <module> 
May 22 20:23:27     main() 
May 22 20:23:27   File "test/run_test.py", line 684, in main 
May 22 20:23:27     raise RuntimeError(message) 

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test (9/9)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

May 23 02:58:11 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in
May 23 02:58:11     #7 0x561f7fe3a74b in PyEval_EvalCode /tmp/build/80754af9/python_1585002248360/work/Python/ceval.c:731 
May 23 02:58:11     #8 0x561f7feba633 in run_mod /tmp/build/80754af9/python_1585002248360/work/Python/pythonrun.c:1025 
May 23 02:58:11     #9 0x561f7feba6cc in PyRun_StringFlags /tmp/build/80754af9/python_1585002248360/work/Python/pythonrun.c:949 
May 23 02:58:11     #10 0x561f7feba72e in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1585002248360/work/Python/pythonrun.c:445 
May 23 02:58:11     #11 0x561f7febe532 in run_command /tmp/build/80754af9/python_1585002248360/work/Modules/main.c:301 
May 23 02:58:11     #12 0x561f7febe532 in Py_Main /tmp/build/80754af9/python_1585002248360/work/Modules/main.c:749 
May 23 02:58:11     #13 0x561f7fd891fd in main /tmp/build/80754af9/python_1585002248360/work/Programs/python.c:69 
May 23 02:58:11     #14 0x7f960ff6882f in __libc_start_main /build/glibc-LK5gWL/glibc-2.23/csu/../csu/libc-start.c:291 
May 23 02:58:11     #15 0x561f7fe67c29 in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103 
May 23 02:58:11  
May 23 02:58:11 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in  
May 23 02:58:11 + retcode=1 
May 23 02:58:11 + set -e 
May 23 02:58:11 + return 1 
May 23 02:58:11 + [[ pytorch-linux-xenial-py3-clang5-asan-test == *-NO_AVX-* ]] 
May 23 02:58:11 + [[ pytorch-linux-xenial-py3-clang5-asan-test == *-NO_AVX2-* ]] 
May 23 02:58:11 + '[' -n https://github.com/pytorch/pytorch/pull/36491 ']' 
May 23 02:58:11 ++ mktemp 
May 23 02:58:11 + DETERMINE_FROM=/tmp/tmp.U3nmq0VEsR 
May 23 02:58:11 + file_diff_from_base /tmp/tmp.U3nmq0VEsR 
May 23 02:58:11 + set +e 

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

May 23 05:06:47 ConnectionResetError: [Errno 104] Connection reset by peer
May 23 05:06:47   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 455, in accept 
May 23 05:06:47     deliver_challenge(c, self._authkey) 
May 23 05:06:47   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 722, in deliver_challenge 
May 23 05:06:47     response = connection.recv_bytes(256)        # reject large message 
May 23 05:06:47   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes 
May 23 05:06:47     buf = self._recv_bytes(maxlength) 
May 23 05:06:47   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes 
May 23 05:06:47     buf = self._recv(4) 
May 23 05:06:47   File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 379, in _recv 
May 23 05:06:47     chunk = read(handle, remaining) 
May 23 05:06:47 ConnectionResetError: [Errno 104] Connection reset by peer 
May 23 05:06:47 /opt/conda/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown 
May 23 05:06:47   len(cache)) 
May 23 05:06:49 Process ErrorTrackingProcess-126: 
May 23 05:06:49 Traceback (most recent call last): 
May 23 05:06:49   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap 
May 23 05:06:49     self.run() 
May 23 05:06:49   File "/var/lib/jenkins/workspace/test/test_dataloader.py", line 362, in run 
May 23 05:06:49     super(ErrorTrackingProcess, self).run() 
May 23 05:06:49   File "/opt/conda/lib/python3.6/multiprocessing/process.py", line 93, in run 
May 23 05:06:49     self._target(*self._args, **self._kwargs) 

ci.pytorch.org: 1 failed



@IvanKobzarev force-pushed the mgpu_integration_proto branch 4 times, most recently from 05c36f0 to 1cd824f, on April 18, 2020 04:43
@IvanKobzarev force-pushed the mgpu_integration_proto branch 2 times, most recently from 74d46cd to 563c250, on April 19, 2020 23:38
CMakeLists.txt Outdated
"Use system Eigen instead of the one under third_party" OFF)
option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_VULKAN "Use Vulkan GPU backend" ON)
option(USE_VULKANGL "Use VulkanGL GPU backend" OFF)
Contributor

I think this name is too confusing because it has nothing to do with Vulkan. Can we call this GLES or something like that?

Contributor Author

Yes, I will rename it.

I thought about organizing it as a fallback option for USE_VULKAN, something like USE_VULKAN_FALLBACK_GLES, covering the logic "try Vulkan first; if it does not start, try the GL approach".

At the moment it uses preprocessor macros, so it cannot be used as a fallback, only as a build-time alternative.

Do you think we need a 'fallback'? Or, for now, just either USE_VULKAN or USE_GLES, with USE_VULKAN taking priority (if it is set, USE_GLES is forced off)?

#ifdef USE_VULKAN
using VTensor = at::native::vulkan::details::vulkan::VulkanVulkanTensor;
#endif
#ifdef USE_VULKANGL
using VTensor = at::native::vulkan::details::gl::VulkanGLTensor;
#endif
Contributor

What if these are both true?

Contributor Author

I will add mutually exclusive logic with priority to USE_VULKAN.
I am also wondering: if we keep the GLES option, should it be a separate Backend, or can it be a switch inside USE_VULKAN as a fallback?

Comment on lines 166 to 171
if(UNIX AND NOT APPLE)
IF (USE_VULKANGL)
list(APPEND ATen_VULKANGL_DEPENDENCY_LIBS EGL GLESv3)
ENDIF()

IF (UNIX AND NOT APPLE)
Contributor

Looks like our convention is to use lowercase if and endif.

namespace at {
namespace detail {

C10_REGISTER_GUARD_IMPL(VULKAN, VULKANGuardImpl);
Contributor

I wonder if this should be VulkanGuardImpl. All of the others (CPU, CUDA) are acronyms.

Contributor Author

My capitalization here followed DeviceType.h, where all DeviceTypes are capitalized, so I added it as VULKAN; the guard impl is for the Device, and I used the same capitalization here. I prefer 'Vulkan' everywhere - in Device, DeviceType, and here. Do you think it would be OK to have the DeviceType be 'Vulkan'?

@@ -0,0 +1,64 @@
#pragma once
Contributor

Any reason to put this class in a .h file instead of all in the .cpp?

Contributor Author

I copied the CPUGuardImpl approach. I will try to move it to the .cpp.

Contributor Author

Yes, it can all be moved to the .cpp - there is no implicit codegen usage of the headers :)

namespace vulkan {
namespace debug {

void vk_print(const char* m, const float* t, uint32_t rank, uint32_t* dims) {
Contributor

Don't we already have generic code for printing tensors?

Comment on lines 13 to 22
#ifdef __ANDROID__
#include <android/log.h>
// __android_log_print(ANDROID_LOG_ERROR, "AGPU", format, ##__VA_ARGS__)
#define AGPU_ERROR(format, ...) printf(format, ##__VA_ARGS__)
// __android_log_print(ANDROID_LOG_INFO, "AGPU", format, ##__VA_ARGS__)
#define APRINT(format, ...) printf(format, ##__VA_ARGS__)

#define FUNC_PRINT(x) APRINT(#x "=%d in %s, %d \n", x, __func__, __LINE__);
#define FUNC_PRINT_ALL(x, type) \
APRINT(#x "=" #type " %" #type " in %s, %d \n", x, __func__, __LINE__);
Contributor

Can we use glog rather than android/printf directly?


static const bool enableValidationLayers = true;

class AVKContext;
Contributor

What is the "A" here?

Contributor Author

I wanted some naming separation between internal wrappers and core Vulkan functions, as those all start with vk or Vk.
Probably just using Context in a proper namespace should be enough.

Contributor Author

I have renamed the internal classes: the GLES ones with a 'GL' prefix and the Vulkan ones with a 'V' prefix.


namespace c10 {
enum class Layout : int8_t { Strided, Sparse, Mkldnn };
enum class Layout : int8_t { Strided, Sparse, Mkldnn, Vulkan };
Contributor

What do you think of a more specific name like Texture4C, since we're thinking of having buffer-based Vulkan tensors that don't use this layout?

# add_subdirectory(${CMAKE_CURRENT_LIST_DIR}/../third_party/Serenity)
# list(APPEND Caffe2_DEPENDENCY_LIBS Orion)

if(NOT ANDROID)
Contributor

How hard is it to remove this restriction?

Contributor Author @IvanKobzarev commented Apr 22, 2020

It should not be hard - at the moment it tries to find vulkan, vulkan_wrapper, and shaderc in the Android NDK path.

We just need to support getting vulkan and vulkan_wrapper from VULKAN_SDK, and cloning shaderc from GitHub and building it.

That desktop setup already exists inside Ashkan's library.

@IvanKobzarev force-pushed the mgpu_integration_proto branch 2 times, most recently from 01cf3b7 to b61613d, on April 24, 2020 05:44
Collaborator @dzhulgakov left a comment

Epic! I did a pass on the pytorch-y integrations and have no idea about the actual vulkan/gles bits.

As it matures, we should also do a review with more folks:

@@ -0,0 +1,64 @@
#include <c10/core/impl/DeviceGuardImplInterface.h>
Collaborator

move the entire VulkanGuardImpl.cpp under vulkan/ (or native/vulkan)

dispatch:
CPU: dense_to_mkldnn

- func: to_vulkan(Tensor self) -> Tensor
Collaborator

why do you need it? The reason it exists for mkldnn is that mkldnn is not a device, it's a layout.

For vulkan, given that it is a device, we should just use to('vulkan'). It should be handled somewhere in the copy_ implementation: https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/TensorConversions.cpp.html#44 (which you should implement too)
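In other words, a sketch of the suggested direction (helper names hypothetical, not actual PyTorch internals):

// Sketch: route Vulkan transfers through copy_ so tensor.to('vulkan') just works.
#include <ATen/ATen.h>

// Hypothetical helpers (names illustrative):
at::Tensor& vulkan_copy_(at::Tensor& self, const at::Tensor& src);
at::Tensor& dense_copy_(at::Tensor& self, const at::Tensor& src, bool non_blocking);

at::Tensor& copy_(at::Tensor& self, const at::Tensor& src, bool non_blocking) {
  if (self.device().type() == at::DeviceType::Vulkan ||
      src.device().type() == at::DeviceType::Vulkan) {
    return vulkan_copy_(self, src);  // upload/download between CPU and Vulkan
  }
  return dense_copy_(self, src, non_blocking);  // existing CPU/CUDA path
}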

Comment on lines 21 to 34
template <typename T>
struct CAFFE2_API IntrusivePtrTargetWrapper : c10::intrusive_ptr_target {
private:
T target_;

public:
IntrusivePtrTargetWrapper() = delete;
IntrusivePtrTargetWrapper(const T& target) : target_(target) {}
IntrusivePtrTargetWrapper(T&& target) : target_(std::move(target)) {}

T& get_target() {
return target_;
}
};
Collaborator

yeah, you can just make VTensor inherit from intrusive_ptr_target directly and remove one level of unwrapping (see the sketch below). In the mkldnn case they are wrapping an external library and thus need the indirection.
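Roughly what that simplification looks like, assuming VTensor owns its state directly (a sketch):

// Sketch: VTensor inherits intrusive_ptr_target, removing the wrapper level.
#include <c10/util/intrusive_ptr.h>
#include <cstdint>
#include <vector>

class VTensor : public c10::intrusive_ptr_target {
 public:
  explicit VTensor(std::vector<int64_t> sizes) : sizes_(std::move(sizes)) {}

 private:
  std::vector<int64_t> sizes_;  // plus the GPU buffer/image handles in reality
};

using VTensorPtr = c10::intrusive_ptr<VTensor>;
// Then: using VulkanTensorImpl = at::OpaqueTensorImpl<VTensorPtr>;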

using VTensorWrapper = IntrusivePtrTargetWrapper<VTensor>;
using VTensorWrapperPtr = c10::intrusive_ptr<VTensorWrapper>;
using VulkanTensorImpl = OpaqueTensorImpl<VTensorWrapperPtr>;
using VulkanTensor = at::Tensor;
Collaborator

Maybe not have it at all - it's more confusing than helpful and conflicts with at::native::vulkan::details::vulkan::VulkanTensor


using VTensorWrapper = IntrusivePtrTargetWrapper<VTensor>;
using VTensorWrapperPtr = c10::intrusive_ptr<VTensorWrapper>;
using VulkanTensorImpl = OpaqueTensorImpl<VTensorWrapperPtr>;
Collaborator

If you can make VTensor copyable (because we technically support "shallow copies" of tensors with "shared storage"), then you don't need this indirection. (I don't even recall exactly why we allow shallow copies for opaque tensors, to be honest.)

SparseCPU: sparse_to_dense
SparseCUDA: sparse_to_dense
MkldnnCPU: mkldnn_to_dense
Vulkan: vulkan_to_dense
Collaborator

vulkan_to_dense should be a no-op (and, similarly to regular Dense, it's not even declared here).

Unlike mkldnn, vulkan is a device, not a layout. So you should just put this transfer logic into copy_.

} // namespace native
} // namespace at

#else
Collaborator

maybe you can change the codegen to avoid ifdefs here? E.g. for CUDA I think we just don't even generate the dispatch if compiled without CUDA. We should do the same for vulkan (now I wonder why we didn't do it for mkldnn, lol)


namespace c10 {
enum class Layout : int8_t { Strided, Sparse, Mkldnn };
enum class Layout : int8_t { Strided, Sparse, Mkldnn, Texture4C };
Collaborator

Is Vulkan going to support multiple layouts? If not, I'd suggest just using 'Strided' (aka Dense), which is kind of the default. Our layouts are sadly not super clean, but this way there are fewer modifications.

Contributor Author

In this diff we have only one opaque texture layout.
The next step will be to add a Strided buffer layout.

I think we will expose the 'Strided' buffer layout to users, so they can write custom ops, support all the logic, etc.

I do not know if we want to expose Texture4c once we have 'Strided'; it will probably remain an internal implementation detail. In any case we will need a strided-buffer-to-texture conversion, so we don't have to write stride-iteration logic inside the shaders for ops. But maybe @dreiss and @AshkanAliabadi have a different opinion about it.

Collaborator

Ah, interesting. Do we want to allow users to explicitly convert to/from Texture4c? Can there be a kernel that takes arguments in different formats (one in Texture4c and one in strided)?

Basically we need to decide which option for leveraging the current tensor primitives makes the most sense:

  1. Make the Texture4c choice completely opaque, e.g. some operators return textures instead of buffers, but it's hidden in TensorImpl details, so that if I call contiguous or anything else the texture gets silently converted to a buffer.
  2. Make Texture4c a layout, as you do here. In this case it is user-visible, and the user can explicitly convert back and forth.
  3. Make Texture4c a separate device. It's almost the same as a layout, but we can leverage the dispatcher to model mixed kernels, and new texture allocations would go down a somewhat different path (with their own allocator). You'd also be able to do .to('vulkan-texture') instead of a separate method.

The above is an important decision to make - feel free to ping me offline too for a detailed discussion.

return DispatchKey::MSNPUTensorId;
case DeviceType::XLA:
return DispatchKey::XLATensorId;
// IKTODO? Is it right to have (Dense - Vulkan) here ?
Collaborator

yeah, as mentioned above, we should probably map to strided directly. I recall you mentioning earlier that vulkan actually has two different layouts (texture vs buffer) - do we want to expose that to the user or make it completely implicit?

@IvanKobzarev force-pushed the mgpu_integration_proto branch from b61613d to 3474c63 on April 24, 2020 16:14
@IvanKobzarev (Contributor Author)

Epic! I did a pass on the pytorch-y integrations and have no idea about the actual vulkan/gles bits.

As it matures, we should also do a review with more folks:

Thanks a lot for your comments! I will start stacking commits on this to apply them.

The Vulkan/GLES part is trivial at the moment - just enough for end-to-end testing and starting bits for most of the moving parts.

I was thinking about the landing plan. One idea is to keep the changes smaller: get to a point where we can land the 'opaque' tensor approach first, and after that iteratively add functionality with strided layout and non-trivial Vulkan/GL handling, etc.

What do you think about landing the 'opaque' tensor first with very limited functionality?

@AshkanAliabadi removed their request for review on April 25, 2020 05:10
@dzhulgakov (Collaborator)

The general plan of landing the opaque tensor with minimal support (like allocation/copy) indeed makes sense. But the first PR doesn't have to be too small either.

It's important to first close on how we want to represent textures/buffers in the system so we don't need to change it down the road.

Collaborator @dzhulgakov left a comment

The general C++ API looks pretty reasonable

@@ -0,0 +1,1152 @@
#ifdef USE_VULKAN
Collaborator

I was already commenting on this somewhere, I think - you can make file inclusion conditional in the build, and then there is no need for #ifdefs in every file.

Contributor Author

Thanks, I am splitting what to compile in aten/src/ATen/CMakeLists.txt:

if(USE_VULKAN OR USE_GLES)
  set(all_cpu_cpp ${all_cpu_cpp} ${native_vulkan_cpp} ${vulkan_generated_cpp})
else()
  set(all_cpu_cpp ${all_cpu_cpp} ${native_vulkan_stub_cpp})
endif()

I will configure it there and remove ifdefs.


#endif

using VTensorPtr = c10::intrusive_ptr<VTensor>;
Collaborator

now that you have shared_ptr inside VulkanTensor (i.e. it's copyable) - you don't need an extra indirection here -> just OpaqueTensorImpl (though you might need to do the same for GLTensor)

@@ -0,0 +1,48 @@
#if !defined(USE_VULKAN) && !defined(USE_GLES)
Collaborator

as mentioned earlier - better to just modify the codegen (like we do for cuda) to not generate registrations if not building with vulkan

Contributor Author

I have added a --vulkan arg to aten/gen.py, the same way as --rocm, so that without --vulkan nothing is generated for the Vulkan backend.
But is_vulkan_available, which is registered for the CPU backend, still needs this stub, so I left it here; that can probably be solved in some other place that is enabled only when cmake USE_VULKAN is set.

@IvanKobzarev force-pushed the mgpu_integration_proto branch from f729836 to cfb5c13 on May 19, 2020 23:39
Comment on lines 63 to 70
file(GLOB native_gles_cpp
"native/vulkan/VulkanAten.cpp"
"native/vulkan/VulkanGuardImpl.cpp"
"native/vulkan/gl/*.cpp")
Contributor

Doesn't need to be in this diff, but we should separate Vulkan from GLES if possible.

Contributor Author

Deleting GLES from this PR

// in host visible memory that can be memory mapped to CPU memory.
//
// 1. VImage(TexC4) - (wrapper on vulkan VkImage), optional representation of
// tensors with dimension <= 4 as VkImage, sed in shaders as texture or storage
Contributor

"sed"?

class VContext;
const VContext& context();

// VulkanTensor is a handle that holds shared pointer to VulkanTensor:Impl,
Contributor

I really like this comment. I think it's going to be important to keep it up to date to ensure that we're always clear about how tensors are physically represented. I'm having a bit of a hard time understanding the specifics of the dimension indexing. Let's have a follow-up session to go over the wording below.

Comment on lines +128 to +137
std::shared_ptr<Impl> impl();
std::shared_ptr<const Impl> impl() const;
Contributor

I would consider dropping these.

1/ You have a number of extra lines of code to handle declaration, definition, const and non-const.
2/ They return by value, which probably results in shared_ptr copies, which require atomic ops.
3/ As long as they are private, they aren't really buying you any protection.

uint32_t queueFamilyIndex_;
bool enableValidationLayers_;
VkCommandPool commandPool_;
}; // class VContext
Contributor

I'd drop this comment. Looks like it's not in common use in our codebase.

Comment on lines 122 to 129
if GLSL_DIR_PATH is None:
raise Exception("")

if GLSLC_PATH is None:
raise Exception("")

if OUTPUT_DIR_PATH is None:
raise Exception("")
Contributor

argparse supports required=True so you don't need explicit checks for these.

Comment on lines 97 to 98
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='')
Contributor

One consequence of putting code directly here is that variables like "parser" become visible from every function in this file, which is probably not what you want. I prefer

def main(argv):
    # blah

if __name__ == "__main__":
    sys.exit(main(sys.argv))

Then all of my variables are local to main.

if(INTERN_BUILD_MOBILE)
list(APPEND CUSTOM_BUILD_FLAGS --backend_whitelist CPU QuantizedCPU)
if(USE_VULKAN OR USE_GLES)
message(STATUS "XXX VULKAN")
Contributor

remove

else()
# USE_VULKAN AND NOT ANDROID
if(NOT DEFINED ENV{VULKAN_SDK})
message(FATAL_ERROR "USE_VULKAN requires environment var VULKAN_SDK set")
Contributor

Extra space before VULKAN_SDK. :)

std::vector<std::unique_ptr<BaseOp>> ops;
};

class MobileNetV2 : public OpsList {
Contributor

Doesn't MobileNetV2 have residual connections?

@IvanKobzarev changed the title from "[Mobile GPU][Integration][proto] Vulkan/GL Integration prototype" to "[Mobile GPU][Integration] Vulkan backend integration" on May 21, 2020
Contributor @dreiss left a comment

In the commit message, are the ON/OFF descriptions for USE_VULKAN_SHADERC_RUNTIME swapped?

Any reason the files are called "spv" instead of "spirv"?

Looks good. Lots of work to do, but this should be safe. Let's land it and start making improvements in-tree.

@IvanKobzarev force-pushed the mgpu_integration_proto branch from 7662daf to 7c995d9 on May 21, 2020 22:51
Contributor @facebook-github-bot left a comment

@IvanKobzarev has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Collaborator

why exactly do you need it? To auto-contiguous the tensors? I wonder whether it's better to error out instead and fix it at higher levels of the API.

At the very least - add a comment here.

cc @ezyang fyi

Contributor

Yeah, please don't do this.

Contributor Author

I have prepared a fix in a separate PR: #39019 (it introduces empty_strided_vulkan, after which these 'ifs' can be removed; I added them before the decision to use the Strided layout in Vulkan and did not clean them up after the switch).

Collaborator

why do you need it duplicated here? It's already in to_impl.

Contributor Author

Removing it in #39019

@IvanKobzarev force-pushed the mgpu_integration_proto branch from 7c995d9 to 88d0ee8 on May 22, 2020 01:31
Contributor @facebook-github-bot left a comment

@IvanKobzarev has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@IvanKobzarev force-pushed the mgpu_integration_proto branch from 88d0ee8 to b2d5f0e on May 23, 2020 00:17
Contributor @facebook-github-bot left a comment

@IvanKobzarev has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@IvanKobzarev merged this pull request in b460465.

facebook-github-bot pushed a commit that referenced this pull request May 26, 2020
Summary:
As a follow up for #36491 and last comments on it.

Vulkan uses the Strided layout (at the moment strides are not supported, but they are planned).
empty_strided just forwards to empty_vulkan, ignoring the strides params.

Removing the explicit ifs in TensorConversions that were added before the decision to use the Strided layout and were not cleaned up after that :(
Pull Request resolved: #39019

Differential Revision: D21726480

Pulled By: IvanKobzarev

fbshipit-source-id: d465456df248a118bfef441c85280aa0025860cd
@facebook-github-bot deleted the mgpu_integration_proto branch on July 13, 2020 17:57