ExecuTorch Vulkan Backend on Linux - Request for Documentation/Guidance #8288

csn1800 · 2025-02-05T16:11:03Z


🚀 The feature, motivation and pitch

Hi PyTorch team,

I'm interested in using the ExecuTorch Vulkan backend on Linux. While the documentation mentions the Vulkan delegate being cross-platform, the current guides primarily focus on Android and iOS.

ExecuTorch Vulkan Delegate: https://pytorch.org/executorch/stable/native-delegates-executorch-vulkan-delegate.html
Building and Running ExecuTorch with the Vulkan Backend: https://pytorch.org/executorch/stable/build-run-vulkan.html

I haven't been able to find any specific instructions or examples for building and running ExecuTorch with the Vulkan backend on a Linux platform. Could you please provide any additional information or guidance on this?

Specifically, I'd be grateful if you could address the following:

Are there any known limitations or issues with using the Vulkan delegate on Linux?
Are there any recommended steps or configurations for building ExecuTorch with Vulkan support on Linux?
Are there any example CMakeLists.txt or build scripts that demonstrate how to link against the Vulkan SDK and integrate the Vulkan delegate on Linux?
Any insights into integrating the Vulkan delegate with the ExecuTorch runtime on Linux would be greatly appreciated.

Any help or pointers you can provide would be extremely helpful. Thank you for your time and consideration.


cc @mergennachin @byjlw @SS-JIA @manuelcandales

Answered by SS-JIA


SS-JIA (Collaborator) · 2025-02-05T20:16:47Z

Hi @csn1800, thank you for your interest in the Vulkan delegate!

As you've pointed out, the Vulkan delegate currently has a focus on Edge/Mobile SoCs. However, it should also be possible to access and run the Vulkan delegate on a Linux machine. In fact, we use Linux as a development environment when developing in the Meta internal development repository so I can confirm that it works. I have also been able to build and run tests for the Vulkan delegate via the open source repository.

"Are there any known limitations or issues with using the Vulkan delegate on Linux?"

To my knowledge, there are no known issues at the moment. However, our compute shaders, especially shaders for bottleneck operators such as matrix multiplication and convolution, are not optimized for server environments, and performance will be poor. We are currently focused on optimizing for mobile GPUs, but I want to eventually add compute shaders optimized for NVIDIA GPUs that take advantage of Vulkan extensions such as VK_KHR_cooperative_matrix to access tensor cores. If you are interested in contributing compute kernels optimized for NVIDIA or AMD GPUs on server, please let me know!

"Are there any recommended steps or configurations for building ExecuTorch with Vulkan support on Linux?"

Here are the steps I use to build and test the Vulkan delegate on Linux:

cd ~/executorch

# Install ExecuTorch with the Vulkan delegate enabled
(rm -rf cmake-out && \
  cmake . \
  -DCMAKE_INSTALL_PREFIX=cmake-out \
  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
  -DEXECUTORCH_BUILD_TESTS=ON \
  -DEXECUTORCH_BUILD_VULKAN=ON \
  -Bcmake-out && \
  cmake --build cmake-out -j64 --target install)

# Build the Vulkan delegate test binary
(rm -rf cmake-out/backends/vulkan/test && \
  cmake backends/vulkan/test \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out/backends/vulkan/test && \
  cmake --build cmake-out/backends/vulkan/test -j16)

# Run the test binary
cmake-out/backends/vulkan/test/vulkan_compute_api_test --gtest_filter="*print*"

Please try these steps out and let me know if it works for you.
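
Before running the build, it may help to confirm that the required tools are on PATH. The sketch below is not part of the original steps: `require_tools` is a hypothetical helper, and the assumption that the Vulkan build needs `glslc` (the shader compiler shipped with the Vulkan SDK) in addition to `cmake` should be checked against the ExecuTorch Vulkan build guide.

```shell
# Report any command-line tools that are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
# (Hypothetical helper, not part of ExecuTorch.)
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, calling `require_tools cmake glslc python || exit 1` at the top of a build script would stop early and list what is missing, instead of failing partway through the CMake configure step.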

Regarding your last two questions:

"Are there any example CMakeLists.txt or build scripts that demonstrate how to link against the Vulkan SDK and integrate the Vulkan delegate on Linux?"

"Any insights into integrating the Vulkan delegate with the ExecuTorch runtime on Linux would be greatly appreciated."

Are you asking if there's an example of another application linking against the ExecuTorch runtime with the Vulkan SDK? If so, the llama runner example binary may be a good reference. Specifically, here is where I believe they are linking against the Vulkan backend.

Please let me know if you have any more questions!


SS-JIA (Collaborator) · 2025-02-05T20:26:02Z

Btw, a similar issue I assisted with recently was #7343. You can find some more information in the discussion in that issue as well, though the steps I provided are the same as the steps I gave in my earlier comment.


csn1800 (Author) · 2025-02-05T20:54:39Z

Thank you so much for the detailed response and the build steps. I am currently evaluating them in my environment and will update you with my observations as soon as possible.

"To my knowledge, there are no known issues at the moment. However, our compute shaders, especially shaders for bottleneck operators such as matrix multiplication, convolution, etc., are not optimized for server environments and performance will be poor. We are currently focused on optimizing for mobile GPUs."

My work involves ARM platforms with mobile GPUs such as the Broadcom VideoCore IV, and my question pertains to instance segmentation models, specifically Detectron models. Additionally, I am interested in understanding the support for speech recognition and text models requiring:

Integer computations, such as embedding layers for text/speech models
Models with dynamic graphs, loops, and conditions

Given the focus on mobile GPUs, it would be helpful to know if there are any ongoing or planned optimizations for these types of models on mobile platforms. Insights into any performance benchmarks or best practices for deploying such models would also be valuable.

"Are you asking if there's an example of another application linking against the ExecuTorch runtime with the Vulkan SDK? If so, the llama runner example binary may be a good reference. Specifically, here is where I believe they are linking against the Vulkan backend."

Thank you for sharing the example and the ticket. The scenario I am focused on requires a C++ application to be linked with the ExecuTorch libraries and to use the C++/C APIs directly, without any Python abstraction. Any guidance you could provide in this context would be greatly appreciated. Apologies for not being clear in my earlier post.

Additionally, if there are any specific examples or documentation that detail the process of linking C++ applications with ExecuTorch libraries, that would be extremely helpful. Understanding the nuances of integrating with the Vulkan backend in this setup is crucial for my project.


SS-JIA (Collaborator) · 2025-02-05T22:50:51Z

The Vulkan Delegate was started last year, and the focus in the first year was building the core components of the platform and adding initial implementations of several operators. The focus for this year will be optimization, both for latency and memory consumption. In particular, my specific focus this year will be to optimize 4-bit weight quantized matrix multiplication to improve performance on Transformer models.

"Integer computations, such as embedding layers for text/speech models"

We are currently working on optimizing weight-quantized operators, but that may be different from what you mean here. In these quantized shaders, each quantized weight value is converted back into a floating-point value, and the computation is performed in floating point.
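
To illustrate the dequantize-then-compute idea, here is a minimal sketch of a dot product with 4-bit weights. It assumes a simple affine scheme (one scale and one zero point for the whole row), which may differ from what the Vulkan shaders actually do; `dequant_dot` is a hypothetical name used only for illustration.

```shell
# Dot product of 4-bit quantized weights with float activations.
# Each weight is first dequantized to floating point, (q - zero_point) * scale,
# and the multiply-accumulate then runs entirely in floating point.
#   $1: scale   $2: zero point
#   $3: space-separated quantized weights (0..15)
#   $4: space-separated float activations
dequant_dot() {
  awk -v scale="$1" -v zp="$2" -v w="$3" -v x="$4" 'BEGIN {
    n = split(w, wq, " ")
    split(x, xs, " ")
    acc = 0.0
    for (i = 1; i <= n; i++) {
      wf = (wq[i] - zp) * scale   # dequantize: integer -> float
      acc += wf * xs[i]           # compute in floating point
    }
    printf "%.4f\n", acc
  }'
}
```

With a scale of 0.5 and a zero point of 8, `dequant_dot 0.5 8 "10 6 8" "1.0 2.0 3.0"` dequantizes the weights to 1.0, -1.0 and 0.0 before accumulating. No integer-only arithmetic is involved in the accumulation, which is the distinction drawn above.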

We currently do not have plans to work on any operators that perform only integer compute. However, if you are interested in adding compute shaders to the delegate to support your use-case, I can help guide you through that process.

"Models with dynamic graphs, loops, and conditions"

This will be a lot tougher. We currently do not support dynamic graphs with loops and conditions. The ideal scenario for Vulkan delegate compute is a static graph that can be compiled into a single command buffer that doesn't need to be rebuilt across inferences. However, I'm not opposed to supporting dynamism in the future. What are some examples of the type of dynamism that you would need?

"The scenario I am focused on requires a C++ application to be linked with the ExecuTorch libraries and to use the C++/C APIs directly, without any Python abstraction. Any guidance you could provide in this context would be greatly appreciated. Apologies for not being clear in my earlier post."

I see. The llama runner example binary that I called out in my previous comment should be a good reference. Although it is within our repository, to my knowledge it is a C++ binary that treats ExecuTorch as an external dependency that is installed on the system.
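
As a rough sketch of what "ExecuTorch as an external dependency" could look like for a standalone C++ application, the script below writes a minimal CMakeLists.txt. Several things here are assumptions rather than confirmed details: that the install prefix exports a CMake package named `executorch`, that the targets are called `executorch` and `vulkan_backend`, and that the backend needs whole-archive linking so its static registration code is not dropped by the linker. `write_cmakelists`, `my_vulkan_app`, and `main.cpp` are hypothetical names. The llama runner's own CMakeLists.txt remains the authoritative reference.

```shell
# Write a minimal, hypothetical CMakeLists.txt into directory $1 for an
# application that links an installed ExecuTorch plus the Vulkan backend.
write_cmakelists() {
  out_dir="$1"
  mkdir -p "$out_dir"
  # Quoted 'EOF' keeps the shell from expanding $<...> generator expressions.
  cat > "$out_dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.24)  # 3.24+ for the WHOLE_ARCHIVE link feature
project(my_vulkan_app CXX)            # hypothetical project name

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Assumption: the install prefix (e.g. cmake-out) provides an "executorch" package.
find_package(executorch CONFIG REQUIRED)

add_executable(my_vulkan_app main.cpp)
target_link_libraries(my_vulkan_app PRIVATE executorch)

# Assumption: the backend target is named vulkan_backend and registers itself
# through static initializers, so the whole archive has to be kept.
if(TARGET vulkan_backend)
  target_link_libraries(my_vulkan_app PRIVATE
    "$<LINK_LIBRARY:WHOLE_ARCHIVE,vulkan_backend>")
endif()
EOF
}
```

After `write_cmakelists app`, the project could be configured with `cmake -S app -B app/build -DCMAKE_PREFIX_PATH="$HOME/executorch/cmake-out"`, pointing CMake at the install prefix produced by the build steps earlier in this thread.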

"Additionally, if there are any specific examples or documentation that detail the process of linking C++ applications with ExecuTorch libraries, that would be extremely helpful. Understanding the nuances of integrating with the Vulkan backend in this setup is crucial for my project."

Unfortunately, I'm not super familiar with examples of an external C++ application linking to ExecuTorch libraries. Tagging some folks who might be able to provide a pointer: @mergennachin @kirklandsign @larryliu0820

