[Mobile GPU][Integration] Vulkan backend integration #36491
💊 CI failures summary and remediations
As of commit b2d5f0e (more details on the Dr. CI page):
🕵️ 9 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
05c36f0 to
1cd824f
Compare
74d46cd to
563c250
Compare
CMakeLists.txt
Outdated
| "Use system Eigen instead of the one under third_party" OFF) | ||
| option(USE_TENSORRT "Using Nvidia TensorRT library" OFF) | ||
| option(USE_VULKAN "Use Vulkan GPU backend" ON) | ||
| option(USE_VULKANGL "Use VulkanGL GPU backend" OFF) |
I think this name is too confusing because it has nothing to do with Vulkan. Can we call this GLES or something like that?
Yes, I will rename it.
I thought about organizing it as a fallback option for USE_VULKAN,
something like USE_VULKAN_FALLBACK_GLES, to cover the logic:
"try Vulkan first; if it did not start, try the GL approach".
At the moment it uses preprocessor macros, so it cannot be used as a fallback, only as a build-time alternative.
Do you think we need a 'fallback'? Or, for now, just either USE_VULKAN or USE_GLES, with USE_VULKAN taking priority: if USE_VULKAN is on, USE_GLES is forced off?
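The "try Vulkan first, then GL" idea amounts to a runtime probe in priority order, as opposed to the build-time macro selection used in this diff. A minimal Python sketch of that ordering follows; `select_backend` and the two probe functions are hypothetical stand-ins for illustration and are not part of this PR:

```python
from typing import Callable, Optional, Sequence, Tuple

Probe = Callable[[], bool]


def select_backend(candidates: Sequence[Tuple[str, Probe]]) -> Optional[str]:
    """Return the name of the first backend whose probe succeeds, else None.

    Candidates are tried in order, so listing Vulkan before GLES gives
    Vulkan priority and uses GLES only as a fallback.
    """
    for name, probe in candidates:
        try:
            if probe():
                return name
        except Exception:
            # A probe that raises is treated like one that reports failure.
            continue
    return None


def vulkan_unavailable() -> bool:
    # Hypothetical probe: pretend Vulkan failed to start on this device.
    return False


def gles_available() -> bool:
    # Hypothetical probe: pretend GLES initialized successfully.
    return True


chosen = select_backend([("vulkan", vulkan_unavailable), ("gles", gles_available)])
print(chosen)
```

With build-time macros only one of the two code paths exists in the binary, so a probe like this would require compiling both backends in.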
| #ifdef USE_VULKAN | ||
| using VTensor = at::native::vulkan::details::vulkan::VulkanVulkanTensor; | ||
| #endif | ||
| #ifdef USE_VULKANGL | ||
| using VTensor = at::native::vulkan::details::gl::VulkanGLTensor; | ||
| #endif |
What if these are both true?
I will add mutually exclusive logic, with priority given to USE_VULKAN.
I am wondering, if we keep the GLES option, whether it should be a separate Backend or a switch inside USE_VULKAN as a fallback.
aten/src/ATen/CMakeLists.txt
Outdated
| if(UNIX AND NOT APPLE) | ||
| IF (USE_VULKANGL) | ||
| list(APPEND ATen_VULKANGL_DEPENDENCY_LIBS EGL GLESv3) | ||
| ENDIF() | ||
|
|
||
| IF (UNIX AND NOT APPLE) |
Looks like our convention is to use lowercase if and endif.
| namespace at { | ||
| namespace detail { | ||
|
|
||
| C10_REGISTER_GUARD_IMPL(VULKAN, VULKANGuardImpl); |
I wonder if this should be VulkanGuardImpl. All of the others (CPU, CUDA) are acronyms.
My capitalization here follows DeviceType.h, where all DeviceTypes are capitalized, so I added it as VULKAN, and the Device is VULKAN as well. GUARD_IMPL is for the Device, so I used the same capitalization here. I would prefer 'Vulkan' everywhere: in Device, DeviceType, and here. Do you think it would be OK to have DeviceType 'Vulkan'?
| @@ -0,0 +1,64 @@ | |||
| #pragma once | |||
Any reason to put this class in a .h file instead of all in the .cpp?
I copied CPUGuardImpl approach. I will try to move it to cpp.
Yes, it can be moved all to cpp, no implicit codegen usage of headers :)
| namespace vulkan { | ||
| namespace debug { | ||
|
|
||
| void vk_print(const char* m, const float* t, uint32_t rank, uint32_t* dims) { |
Don't we already have generic code for printing tensors?
| #ifdef __ANDROID__ | ||
| #include <android/log.h> | ||
| // __android_log_print(ANDROID_LOG_ERROR, "AGPU", format, ##__VA_ARGS__) | ||
| #define AGPU_ERROR(format, ...) printf(format, ##__VA_ARGS__) | ||
| // __android_log_print(ANDROID_LOG_INFO, "AGPU", format, ##__VA_ARGS__) | ||
| #define APRINT(format, ...) printf(format, ##__VA_ARGS__) | ||
|
|
||
| #define FUNC_PRINT(x) APRINT(#x "=%d in %s, %d \n", x, __func__, __LINE__); | ||
| #define FUNC_PRINT_ALL(x, type) \ | ||
| APRINT(#x "=" #type " %" #type " in %s, %d \n", x, __func__, __LINE__); |
Can we use glog rather than android/printf directly?
|
|
||
| static const bool enableValidationLayers = true; | ||
|
|
||
| class AVKContext; |
What is the "A" here?
I wanted some naming separation between internal wrappers and core vulkan functions, as they all start from vk or Vk.
Probably just using Context in proper namespace should be enough.
I have renamed internal classes for GLES with 'GL' prefix and for Vulkan with 'V'
c10/core/Layout.h
Outdated
|
|
||
| namespace c10 { | ||
| enum class Layout : int8_t { Strided, Sparse, Mkldnn }; | ||
| enum class Layout : int8_t { Strided, Sparse, Mkldnn, Vulkan }; |
What do you think of a more specific name like Texture4C, since we're thinking of having buffer-based Vulkan tensors that don't use this layout.
cmake/Dependencies.cmake
Outdated
| # add_subdirectory(${CMAKE_CURRENT_LIST_DIR}/../third_party/Serenity) | ||
| # list(APPEND Caffe2_DEPENDENCY_LIBS Orion) | ||
|
|
||
| if(NOT ANDROID) |
How hard is it to remove this restriction?
It should not be hard. At the moment it tries to find vulkan, vulkan_wrapper, and shaderc from the Android NDK path.
We just need to support getting vulkan and vulkan_wrapper from VULKAN_SDK, and cloning shaderc from GitHub and building it.
That desktop setup already exists inside Ashkan's library.
01cf3b7 to
b61613d
Compare
dzhulgakov
left a comment
| @@ -0,0 +1,64 @@ | |||
| #include <c10/core/impl/DeviceGuardImplInterface.h> | |||
move the entire VulkanGuardImpl.cpp under vulkan/ (or native/vulkan)
| dispatch: | ||
| CPU: dense_to_mkldnn | ||
|
|
||
| - func: to_vulkan(Tensor self) -> Tensor |
why do you need it? The reason it exists for mkldnn is because it's not a device, it's a layout.
For vulkan given that it's a device we should just use to('vulkan'). It should be handled somewhere in copy_ implementation: https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/TensorConversions.cpp.html#44 (which you should implement too)
| template <typename T> | ||
| struct CAFFE2_API IntrusivePtrTargetWrapper : c10::intrusive_ptr_target { | ||
| private: | ||
| T target_; | ||
|
|
||
| public: | ||
| IntrusivePtrTargetWrapper() = delete; | ||
| IntrusivePtrTargetWrapper(const T& target) : target_(target) {} | ||
| IntrusivePtrTargetWrapper(T&& target) : target_(std::move(target)) {} | ||
|
|
||
| T& get_target() { | ||
| return target_; | ||
| } | ||
| }; |
yeah, you can just make VTensor inherit from intrusive_ptr_target directly and remove one level of unwrapping. In case of mkldnn they are wrapping the external library and thus need an indirection.
| using VTensorWrapper = IntrusivePtrTargetWrapper<VTensor>; | ||
| using VTensorWrapperPtr = c10::intrusive_ptr<VTensorWrapper>; | ||
| using VulkanTensorImpl = OpaqueTensorImpl<VTensorWrapperPtr>; | ||
| using VulkanTensor = at::Tensor; |
Maybe not have it at all - it's more confusing than helpful and conflicts with at::native::vulkan::details::vulkan::VulkanTensor
|
|
||
| using VTensorWrapper = IntrusivePtrTargetWrapper<VTensor>; | ||
| using VTensorWrapperPtr = c10::intrusive_ptr<VTensorWrapper>; | ||
| using VulkanTensorImpl = OpaqueTensorImpl<VTensorWrapperPtr>; |
If you can make VTensor copyable (because we technically support "shallow copies" of tensors with "shared storage") then you don't need this indirection. (I don't even recall exactly why we allow shallow copies for opaque tensors, to be honest.)
| SparseCPU: sparse_to_dense | ||
| SparseCUDA: sparse_to_dense | ||
| MkldnnCPU: mkldnn_to_dense | ||
| Vulkan: vulkan_to_dense |
vulkan_to_dense should be a noop (and similarly to regular Dense it's not even declared here).
unlike mkldnn, vulkan is a device, not layout. So you should just put this transfer logic into copy_
| } // namespace native | ||
| } // namespace at | ||
|
|
||
| #else |
maybe you can change the codegen to avoid ifdefs here? E.g. for CUDA I think we just don't even generate dispatch if compiled without CUDA. We should do the same for vulkan (now I wonder why we didn't do it for mkldnn, lol)
c10/core/Layout.h
Outdated
|
|
||
| namespace c10 { | ||
| enum class Layout : int8_t { Strided, Sparse, Mkldnn }; | ||
| enum class Layout : int8_t { Strided, Sparse, Mkldnn, Texture4C }; |
Is Vulkan going to support multiple layouts? If not - I'd suggest to just use 'Strided' (aka Dense) which is kind of default. Our layouts are sadly not super clean, but it's less modifications this way
In this diff we have only one opaque texture layout.
The next step will be to add Strided buffer layout.
I think we will expose 'Strided' buffer layout for users, to write custom ops, to support all the logic etc.
I do not know if we want to expose Texture4c, when we have 'Strided', probably that will be our internal implementation details. Anyway we will need conversion strided buffer -> texture to not write stride-iteration logic inside shaders for ops. But maybe @dreiss , @AshkanAliabadi have different opinion about it.
Ah, interesting. Do we want to allow users to explicitly convert to/from Texture4c? Can there be a kernel that takes arguments in different format (one in Texture4c and one in strided)?
Basically we need to decide which option of leveraging current tensor primitives makes most sense:
- make Texture4c choice completely opaque, e.g. some operators return textures instead of buffers but it's hidden in TensorImpl details. So that if I call contiguous or anything else the texture gets silently converted to buffer
- make Texture4c the layout as you do here. In this case it'd be user-visible and the user can manipulate conversion back and forth explicitly
- make Texture4c a separate device. It's almost the same as layout, but we can leverage the dispatcher to model mixed kernels, and new texture allocation would go a bit of a different path (with its own allocator). You'd also be able to do .to('vulkan-texture') instead of a separate method
Above is an important decision to make - feel free to ping me offline too for a detailed discussion
c10/core/TensorOptions.h
Outdated
| return DispatchKey::MSNPUTensorId; | ||
| case DeviceType::XLA: | ||
| return DispatchKey::XLATensorId; | ||
| // IKTODO? Is it right to have (Dense - Vulkan) here ? |
yeah, as mentioned above we probably should map to strided directly. I recall you mentioning earlier that vulkan actually has two different layouts (texture vs buffer) - do we want to expose it to the user or make completely implicit?
b61613d to
3474c63
Compare
Thanks a lot for your comments! I will start stacking commits on this to apply them. The Vulkan/GLES part at the moment is trivial, just to have end-to-end testing and starting bits for most of the moving parts. I was thinking about the landing plan. What do you think about landing the 'opaque' tensor first with very limited functionality?
The general plan of landing an opaque tensor with minimal support (like allocation / copy) indeed makes sense. But the first PR doesn't have to be too small either. It's important to first close on how we want to represent textures/buffers in the system so we don't need to change it down the road.
dzhulgakov
left a comment
The general C++ API looks pretty reasonable
| @@ -0,0 +1,1152 @@ | |||
| #ifdef USE_VULKAN | |||
I was commenting somewhere already I think - you can make file inclusion conditional in the build and then no need to have #ifdefs in every file
Thanks, I am splitting what gets compiled
in aten/src/ATen/CMakeLists.txt:
if(USE_VULKAN OR USE_GLES)
  set(all_cpu_cpp ${all_cpu_cpp} ${native_vulkan_cpp} ${vulkan_generated_cpp})
else()
  set(all_cpu_cpp ${all_cpu_cpp} ${native_vulkan_stub_cpp})
endif()
I will configure it there and remove the ifdefs.
|
|
||
| #endif | ||
|
|
||
| using VTensorPtr = c10::intrusive_ptr<VTensor>; |
now that you have shared_ptr inside VulkanTensor (i.e. it's copyable) - you don't need an extra indirection here -> just OpaqueTensorImpl (though you might need to do the same for GLTensor)
| @@ -0,0 +1,48 @@ | |||
| #if !defined(USE_VULKAN) && !defined(USE_GLES) | |||
as mentioned earlier - better to just modify codegen (like we do for cuda) to not generate registrations if not building with vulkan
I have added a --vulkan arg to aten/gen.py, the same way as --rocm, so that without --vulkan nothing is generated for the Vulkan backend.
But is_vulkan_available, which is registered for the CPU backend, still needs this stub, so I left it here; that can probably be solved somewhere else that is enabled only with cmake USE_VULKAN.
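For readers unfamiliar with this kind of opt-in codegen switch, here is a minimal sketch of a `store_true` flag gating which backends get generated. It only illustrates the pattern: `parse_backends` and the backend list are hypothetical, and the real aten/gen.py has more options and different internals.

```python
import argparse
from typing import List, Optional


def parse_backends(argv: Optional[List[str]] = None) -> List[str]:
    """Return the backends to generate code for (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Backend codegen sketch")
    # Opt-in switch: Vulkan code is generated only when the flag is passed.
    parser.add_argument("--vulkan", action="store_true",
                        help="generate registrations for the Vulkan backend")
    args = parser.parse_args(argv)

    backends = ["CPU"]  # always generated in this sketch
    if args.vulkan:
        backends.append("Vulkan")
    return backends


print(parse_backends([]))
print(parse_backends(["--vulkan"]))
```

A build system would then pass `--vulkan` only when USE_VULKAN is enabled, so a default build contains no Vulkan registrations.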
f729836 to
cfb5c13
Compare
aten/src/ATen/CMakeLists.txt
Outdated
| file(GLOB native_gles_cpp | ||
| "native/vulkan/VulkanAten.cpp" | ||
| "native/vulkan/VulkanGuardImpl.cpp" | ||
| "native/vulkan/gl/*.cpp") |
Doesn't need to be in this diff, but we should separate Vulkan from GLES if possible.
Deleting GLES from this PR
aten/src/ATen/native/vulkan/Vulkan.h
Outdated
| // in host visible memory that can be memory mapped to CPU memory. | ||
| // | ||
| // 1. VImage(TexC4) - (wrapper on vulkan VkImage), optional representation of | ||
| // tensors with dimension <= 4 as VkImage, sed in shaders as texture or storage |
"sed"?
| class VContext; | ||
| const VContext& context(); | ||
|
|
||
| // VulkanTensor is a handle that holds shared pointer to VulkanTensor:Impl, |
I really like this comment. I think it's going to be important to keep this up-to-date to ensure that we're always clear about how tensors are physically represented. I'm having bit of a hard time understanding the specifics of the dimension indexing. Let's have a follow up session to go over the wording below.
| std::shared_ptr<Impl> impl(); | ||
| std::shared_ptr<const Impl> impl() const; |
I would consider dropping these.
1/ You have a number of extra lines of code to handle declaration, definition, const and non-const.
2/ They return by value, which probably results in shared_ptr copies, which require atomic ops.
3/ As long as they are private, they aren't really buying you any protection.
aten/src/ATen/native/vulkan/Vulkan.h
Outdated
| uint32_t queueFamilyIndex_; | ||
| bool enableValidationLayers_; | ||
| VkCommandPool commandPool_; | ||
| }; // class VContext |
I'd drop this comment. Looks like it's not in common use in our codebase.
| if GLSL_DIR_PATH is None: | ||
| raise Exception("") | ||
|
|
||
| if GLSLC_PATH is None: | ||
| raise Exception("") | ||
|
|
||
| if OUTPUT_DIR_PATH is None: | ||
| raise Exception("") |
argparse supports required=True so you don't need explicit checks for these.
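A minimal sketch of what that could look like. The flag names below are assumptions derived from the GLSL_DIR_PATH, GLSLC_PATH, and OUTPUT_DIR_PATH variables in the diff, not the script's actual options:

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Shader codegen sketch")
    # required=True makes argparse exit with a usage error when a flag is
    # missing, which replaces the manual `is None` checks.
    parser.add_argument("--glsl-dir-path", required=True,
                        help="directory containing *.glsl files")
    parser.add_argument("--glslc-path", required=True,
                        help="path to the glslc compiler")
    parser.add_argument("--output-dir-path", required=True,
                        help="directory for generated files")
    return parser.parse_args(argv)


args = parse_args([
    "--glsl-dir-path", "glsl",
    "--glslc-path", "glslc",
    "--output-dir-path", "out",
])
print(args.glsl_dir_path, args.glslc_path, args.output_dir_path)
```

Omitting any of the three flags makes argparse print a usage message naming the missing argument and exit with a non-zero status, which is more informative than `raise Exception("")`.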
| if __name__ == '__main__': | ||
| parser = argparse.ArgumentParser(description='') |
One consequence of putting code directly here is that variables like "parser" become visible from every function in this file, which is probably not what you want. I prefer
def main(argv):
    # blah

if __name__ == "__main__":
    sys.exit(main(sys.argv))
Then all of my variables are local to main.
cmake/Codegen.cmake
Outdated
| if(INTERN_BUILD_MOBILE) | ||
| list(APPEND CUSTOM_BUILD_FLAGS --backend_whitelist CPU QuantizedCPU) | ||
| if(USE_VULKAN OR USE_GLES) | ||
| message(STATUS "XXX VULKAN") |
remove
cmake/Dependencies.cmake
Outdated
| else() | ||
| # USE_VULKAN AND NOT ANDROID | ||
| if(NOT DEFINED ENV{VULKAN_SDK}) | ||
| message(FATAL_ERROR "USE_VULKAN requires environment var VULKAN_SDK set") |
Extra space before VULKAN_SDK. :)
| std::vector<std::unique_ptr<BaseOp>> ops; | ||
| }; | ||
|
|
||
| class MobileNetV2 : public OpsList { |
Doesn't MobileNetV2 have residual connections?
dreiss
left a comment
In the commit message, are the ON/OFF descriptions for USE_VULKAN_SHADERC_RUNTIME swapped?
Any reason the files are called "spv" instead of "spirv"?
Looks good. Lots of work to do, but this should be safe. Let's land it and start making improvements in-tree.
7662daf to
7c995d9
Compare
facebook-github-bot
left a comment
@IvanKobzarev has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
why exactly do you need it? to auto-contiguous the tensors? I wonder whether it's better to error out instead and fix it at higher levels of API.
At the very least - add a comment here.
cc @ezyang fyi
Yeah, please don't do this.
I have prepared fix in separate PR: #39019 (introduced empty_strided_vulkan and after that these 'ifs' can be removed, I added them before decision to use Strided layout in Vulkan and have not cleaned them after switch )
why do you need it here duplicated? it's already in to_impl
Removing it in #39019
7c995d9 to
88d0ee8
Compare
facebook-github-bot
left a comment
@IvanKobzarev has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
88d0ee8 to
b2d5f0e
Compare
facebook-github-bot
left a comment
@IvanKobzarev has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@IvanKobzarev merged this pull request in b460465.
Summary: As a follow-up to #36491 and the last comments on it.
Vulkan uses the Strided layout (strides are not supported at the moment, but are planned); empty_strided just forwards to empty_vulkan, ignoring the strides params.
Removing explicit ifs in TensorConversions that were added before the decision to use the Strided layout and were not cleaned up after that :(
Pull Request resolved: #39019
Differential Revision: D21726480
Pulled By: IvanKobzarev
fbshipit-source-id: d465456df248a118bfef441c85280aa0025860cd
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.
CMake
Introducing three cmake options:
USE_VULKAN:
The main switch; if it is OFF, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - links with libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - The shader compilation library is linked, and shaders are compiled at runtime.
OFF - Shaders are precompiled, and the shader compilation library is not included.
Codegen
Shader codegen starts in cmake/VulkanCodegen.cmake, which calls
aten/src/ATen/native/vulkan/gen_glsl.py or aten/src/ATen/native/vulkan/gen_spv.py to include the shader source or the SPIR-V bytecode in the binary.
If USE_VULKAN_SHADERC_RUNTIME is ON:
The shader source is included as glsl.h, glsl.cpp and compiled at runtime.
If USE_VULKAN_SHADERC_RUNTIME is OFF:
Shaders are precompiled, and the SPIR-V bytecode is included as uint32_t arrays in spv.h, spv.cpp.
All codegen output is written to the build directory.
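To make the "bytecode as a uint32_t array" step concrete, here is a rough sketch of how a generator script could turn a compiled SPIR-V blob into C++ source. It is not the code of gen_spv.py: `spirv_words`, `emit_cpp_array`, the array naming, and the little-endian assumption are all illustrative.

```python
import struct
from typing import List


def spirv_words(blob: bytes) -> List[int]:
    """Split a SPIR-V binary into 32-bit words (assumes a little-endian file)."""
    if len(blob) % 4 != 0:
        raise ValueError("a SPIR-V binary is a whole number of 32-bit words")
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))


def emit_cpp_array(name: str, blob: bytes) -> str:
    """Render the words as a C++ uint32_t array definition (hypothetical layout)."""
    words = spirv_words(blob)
    body = ", ".join(f"0x{w:08x}" for w in words)
    return (
        f"const uint32_t {name}[] = {{{body}}};\n"
        f"const uint32_t {name}_len = {len(words)};\n"
    )


# 0x07230203 is the SPIR-V magic number; the second word is a stand-in
# version word so the demo has more than one element.
demo = struct.pack("<2I", 0x07230203, 0x00010000)
print(emit_cpp_array("add_spv", demo))
```

A real generator would run the shader compiler on each *.glsl file first and write one such array per shader into spv.h / spv.cpp.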
Build dependencies
cmake/Dependencies.cmake
If the target platform is Android, the Vulkan library, headers, and Vulkan wrapper are taken from ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable, and all Vulkan dependencies are taken from it.
(Desktop build was tested only on Linux).
Pytorch integration:
Adding 'Vulkan' as a new Backend, DispatchKey, and DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where OpaqueHandle is a copyable VulkanTensor;
more details are in the comments in aten/src/ATen/native/vulkan/Vulkan.h.
Main code location: aten/src/ATen/native/vulkan
aten/src/ATen/native/vulkan/VulkanAten.cpp - the connection between ATen and the Vulkan API (Vulkan.h); it converts at::Tensor to VulkanTensor.
aten/src/ATen/native/vulkan/Vulkan.h - the Vulkan API that contains the VulkanTensor representation and functions to work with it. The plan is to expose it so that clients can write their own Vulkan ops.
aten/src/ATen/native/vulkan/VulkanOps.cpp - Vulkan operation implementations that use the Vulkan.h API.
GLSL shaders
Located in aten/src/ATen/native/vulkan/glsl as *.glsl files.
All shaders use Vulkan specialization constants for workgroup sizes, with ids 1, 2, 3.
Supported operations
Code point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh
Testing
aten/src/ATen/test/vulkan_test.cpp - contains tests for:
copy from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU or an installed software implementation of Vulkan, such as https://github.com/google/swiftshader
Vulkan execution
The initial implementation is trivial: it waits for each operator's execution to complete before continuing.