Vulkan implementation (via Kompute) #2039
👀 Other than reaching more platforms, what advantages would this Vulkan implementation have over OpenCL? Better performance? Better support for IGPs? Support for the matmul acceleration in Arc/RDNA3? |
OpenCL and OpenGL are basically deprecated and Vulkan is the replacement. Better driver support in newer GPUs etc. |
In case it helps anyone build this: Make sure you set up the Kompute submodule:
There currently isn't support for building with the […] I wasn't able to compile Kompute due to compile warnings, so it was necessary to edit […] Right now it seems to just skip any operations that aren't implemented for the GPU, so you can't run inference on a model yet. Fun fact: it turns out evaluating models is really fast if you skip all the matrix multiplication operations! |
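To illustrate the point about skipped operations, here is a minimal sketch of the alternative: dispatching to a CPU implementation when no GPU kernel exists, rather than skipping the op. Everything in it (`GPU_OPS`, `CPU_OPS`, `run_op`) is hypothetical and for illustration only; it is not how llama.cpp or this PR dispatches operations.

```python
# Hypothetical op tables: the names and functions are illustrative assumptions,
# not llama.cpp or Kompute APIs.
GPU_OPS = {
    # Pretend only ADD has a GPU kernel so far.
    "ADD": lambda a, b: [x + y for x, y in zip(a, b)],
}

CPU_OPS = {
    "ADD": lambda a, b: [x + y for x, y in zip(a, b)],
    "MUL": lambda a, b: [x * y for x, y in zip(a, b)],
}


def run_op(name, *args):
    """Run an op on the GPU if a kernel exists, otherwise fall back to the CPU.

    Returns a (backend, result) pair so the caller can see which path was taken.
    """
    impl = GPU_OPS.get(name)
    backend = "gpu"
    if impl is None:
        # Falling back keeps the result correct; skipping the op would not.
        impl = CPU_OPS[name]
        backend = "cpu"
    return backend, impl(*args)
```

With these tables, `run_op("MUL", [1, 2], [3, 4])` takes the CPU path, while `run_op("ADD", ...)` uses the (pretend) GPU kernel.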
In addition to what @SlyEcho answered, my view is that since we now have a relatively good way to implement decoupled GPU backends (CUDA, Metal, OpenCL, etc.), there is no good reason not to keep doing it. As long as the new implementations do not touch the core […] In the long run, we can eventually obsolete and deprecate certain implementations, but for now the main goal is to experiment and explore the most efficient ways for hardware acceleration. Supporting more backends also helps to find certain good patterns in the implementations, and even though there is currently a lot of "copy-paste", at some point we can think about the best way to consolidate the code and reuse the best techniques that we have found across all architectures. |
I can't get either this PR or #2059 to work. I wanted to run a perplexity test with […] In the case of the second implementation, I can't even get to filling my RAM, as it just crashes on me with […] I am using an RX 580 (RADV driver, Polaris 10 architecture). On a side note, OpenCL works, but prompt ingestion is painfully slow. |
@Firstbober AMD GPUs have fewer queues, it seems. The initialization code of my implementation didn't work with that yet. Can you try again? Don't expect any good performance yet, though. |
It did work this time after your patch! I needed to remove the validation layer because it produced too many errors. Two major ones were […] After making the output readable, I can clearly see my GPU working and ingesting the prompt. I wasn't able to test the perplexity of […] It seems to be very close to OpenCL. I ran the test with
OpenCL:
Overall, Vulkan used my VRAM much better, while OpenCL allocated all of it and didn't even use my GPU as much as your Vulkan implementation. |
I couldn't get this to work; it throws a weird error: "fatal error C1189: #error: You […]" I also came across another project that fully uses Vulkan, and I tested it on Windows with an Intel integrated GPU. Speed is quite plausible, but the good thing is that it fully used the integrated GPU, so the PC doesn't feel slow anymore; otherwise the CPU would be at 100%. https://mlc.ai/mlc-llm/ |
Yeah, MLC uses Apache TVM's Vulkan backend. Its features are pretty barebones, and it doesn't support any model splitting, CPU offloading, or anything like K Quants, but it is very fast and relatively portable once compiled. If you are using MSVC or something, my guess is you need to compile in WSL? |
You could try reporting this issue to w64devkit, but it might be a fundamental compiler limitation. |
I think C1189 is an MSVC error. Not sure whose fault it is at this moment, though. |
The error/warning seems to come from the fmt library build, using the system's fmt library with |
[Jul 12 2023 15:12:23] [warn] [/home/river/LLM/llama.cpp/kompute/src/Manager.cpp:231] Kompute Manager no valid layer names found from desired layer names
What is the problem here? |
Hey!
This is an attempt at a Vulkan implementation in GLSL via Kompute. Kompute is used instead of Vulkan directly to avoid thousands of lines of hard-to-maintain boilerplate code.
I mostly base the code on the current Metal implementation and approach.
Already implemented
- ADD (Untested)
- MUL (Untested)
- SCALE (Test passed) ✅
- SILU (Test passed) ✅
- RELU (Test passed) ✅
- GELU (Test passed) ✅
- SOFT_MAX (Test failed) ❌
- DIAG_MASK_INF (likely broken)
- MUL_MAT (still needs fallback for small sizes)
  - F16 (Unused)
  - Q4_0 (Test failed) ❌
  - Q4_1 (Test passed) ✅
- GET_ROW
  - F16 (Unused)
  - Q4_0 (Test passed) ✅
  - Q4_1 (Test failed) ❌
- NORM (Unused)
- RMS_NORM (Test passed) ✅
- ROPE (Test passed) ✅
- CPY (Test passed) ✅
TODO things before merge
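For readers who want to check GPU kernel output against known values, the following is a minimal sketch of CPU reference versions of three of the operators in the list above (SILU, SOFT_MAX, RMS_NORM), written from their standard mathematical definitions. It is not the test harness used in this PR; the function names, the `eps` default, and the `all_close` tolerance are assumptions chosen for illustration.

```python
import math


def silu(xs):
    """SiLU: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return [x / (1.0 + math.exp(-x)) for x in xs]


def soft_max(xs):
    """Softmax over one row, subtracting the max for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rms_norm(xs, eps=1e-6):
    """RMS normalization of one row: x / sqrt(mean(x^2) + eps).

    The eps default is an assumption for this sketch.
    """
    mean_sq = sum(x * x for x in xs) / len(xs)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale for x in xs]


def all_close(got, want, tol=1e-4):
    """Element-wise comparison with an absolute tolerance (assumed value)."""
    return len(got) == len(want) and all(
        abs(g - w) <= tol for g, w in zip(got, want)
    )
```

A GPU result read back into a Python list could then be compared with, for example, `all_close(gpu_row, soft_max(cpu_row))`, where `gpu_row` and `cpu_row` are placeholders for the caller's own data.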