[Feature Request] Add dynamic kernel selection to torchao/experimental #1376
@supriyar @msaroufim, an external contributor (https://github.com/Darshcg) reached out to me and expressed interest in this project. How can I tag him on the issue?
Thank you @metascroy for creating this issue and adding me. Hi @supriyar @msaroufim, I am Darshan, an AI inference and compiler engineer. Over the past few years, I have focused on optimizing quantized (low-bit) and sparse inference for x86 and ARM targets, primarily with QNNPACK and XNNPACK. Although I have not contributed to open-source projects yet, you can explore some of my related work on Google Scholar (https://scholar.google.com/citations?user=GUaOqIIAAAAJ&hl=en). Last Saturday, I attended Scott's session on low-bit inference (via CUDA mode) and reached out to him about potential contributions and collaborations. That led me here. I look forward to working on this with you all, solving problems, and learning new things. Thanks!
Wonderful! I'm glad you two got to meet. At a high level, your plan sounds reasonable; it will be much easier to give feedback once we have some sample PRs to look at!
Awesome, @Darshcg! Maybe the first step is to study the current ukernel selection logic, look at what other libraries like XNNPACK do, and put together some prototype PRs or an RFC/design proposal. We can then chime in and give feedback.
For sure, @metascroy! I will go through the existing ukernel selection implementation in torchao/experimental and compare it with XNNPACK's kernel selection mechanism. I aim to identify the differences, possible improvements, and any best practices we could adopt. Based on this, I will put together a draft design proposal or sample PR for review and feedback. Thanks!
@Darshcg, are you still working on this?
Started draft PR here: draft ukernel selection logic #1652
Thanks a lot, @metascroy, and thank you for taking the time to get on a call to discuss the dynamic kernel selection tasks and progress. I will make sure to finish it before next week.
Currently in torchao/experimental, we use a ukernel config to identify the function pointers to use in the linear operator.
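To make the current setup concrete, here is a minimal sketch of a config that bundles function pointers, plus a linear operator that dispatches only through that config. All names (`UKernelConfig`, `ref_kernel`, `linear`, and so on) are hypothetical and are not torchao's actual types. The reference kernels also use plain float math instead of quantized low-bit kernels, which keeps the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: names are illustrative, not torchao's actual API.
// A ukernel config bundles the function pointers the linear operator calls.
struct UKernelConfig {
  // Bytes needed to hold packed weights for an n x k weight matrix.
  std::size_t (*packed_weight_size)(int n, int k);
  // Packs weights into the layout the kernel expects.
  void (*pack_weights)(float* packed, const float* weights, int n, int k);
  // Computes output[m x n] = activations[m x k] * packed_weights^T.
  void (*kernel)(float* output, const float* activations, const float* packed,
                 int m, int n, int k);
};

// Unoptimized reference implementations standing in for real ukernels.
std::size_t ref_packed_weight_size(int n, int k) {
  return static_cast<std::size_t>(n) * k * sizeof(float);
}

void ref_pack_weights(float* packed, const float* weights, int n, int k) {
  for (int i = 0; i < n * k; ++i) packed[i] = weights[i];  // identity "packing"
}

void ref_kernel(float* output, const float* activations, const float* packed,
                int m, int n, int k) {
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) acc += activations[i * k + p] * packed[j * k + p];
      output[i * n + j] = acc;
    }
  }
}

// The linear operator never names a kernel directly; it only uses the config.
std::vector<float> linear(const UKernelConfig& config,
                          const std::vector<float>& activations,
                          const std::vector<float>& weights, int m, int n, int k) {
  assert(activations.size() == static_cast<std::size_t>(m) * k);
  assert(weights.size() == static_cast<std::size_t>(n) * k);
  std::vector<float> packed(config.packed_weight_size(n, k) / sizeof(float));
  config.pack_weights(packed.data(), weights.data(), n, k);
  std::vector<float> output(static_cast<std::size_t>(m) * n);
  config.kernel(output.data(), activations.data(), packed.data(), m, n, k);
  return output;
}
```

Because the operator only sees the config, supporting a new kernel amounts to returning a different `UKernelConfig`. The open question in this issue is how to decide which config to return.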
At runtime, we select which ukernel config to use, but the current logic is very simplistic, partly because we currently have only one kind of kernel.
But if we want to support more kernels in the future (e.g., GEMM kernels, kernels from KleidiAI, kernels based on i8mm), we need a better ukernel config selection mechanism.
We'd like to select an appropriate ukernel based on features such as CPU uarch, activation size, and packing format. We can use CPU info to determine the CPU uarch. The feature request here is to design an efficient dynamic kernel selection infrastructure; XNNPACK already implements a similar feature.
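Here is one possible shape for feature-based selection, sketched under stated assumptions. Each candidate declares its requirements: packing format, i8mm support, minimum activation rows, and optionally a specific uarch. Candidates are listed in priority order and the first match wins. The enums, fields, and kernel names are hypothetical placeholders. A real implementation would fill `SelectionContext` from CPU info and store full ukernel configs rather than names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: enums, fields, and kernel names are illustrative
// placeholders, not existing torchao APIs.
enum class Uarch { kUnknown, kCortexA55, kCortexA76, kNeoverseV1 };
enum class PackingFormat { kFormatA, kFormatB };

// Runtime features the selection is based on.
struct SelectionContext {
  Uarch uarch;            // would come from CPU info in a real implementation
  bool has_i8mm;          // whether the CPU supports i8mm instructions
  int m;                  // number of activation rows (activation size)
  PackingFormat packing;  // format the weights were packed in
};

// Requirements a candidate ukernel config declares.
struct Candidate {
  std::string name;               // stands in for a full ukernel config
  PackingFormat packing;          // weights must already be in this format
  bool requires_i8mm;             // needs i8mm support on the CPU
  int min_m;                      // smallest activation size it targets
  std::vector<Uarch> only_uarch;  // if non-empty, restrict to these uarches
};

// Candidates are ordered by priority; the first one whose requirements are
// all satisfied is selected. Returns nullptr if none match.
const Candidate* select_ukernel(const std::vector<Candidate>& candidates,
                                const SelectionContext& ctx) {
  assert(ctx.m > 0);
  for (const Candidate& c : candidates) {
    if (c.packing != ctx.packing) continue;
    if (c.requires_i8mm && !ctx.has_i8mm) continue;
    if (ctx.m < c.min_m) continue;
    if (!c.only_uarch.empty()) {
      bool uarch_ok = false;
      for (Uarch u : c.only_uarch) uarch_ok = uarch_ok || (u == ctx.uarch);
      if (!uarch_ok) continue;
    }
    return &c;
  }
  return nullptr;  // caller could fall back to a reference kernel
}
```

A first-match priority list keeps selection cheap and easy to reason about. One alternative is to score every candidate, for example with per-uarch benchmark data, at the cost of a more complex table to maintain.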
For now, we will select the ukernel config based on the CPU, but in the future we might want to extend the design to select a different ukernel config for each CPU core.
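One way to leave room for per-core selection is to cache one selected config per uarch and look it up by the core the calling thread runs on. This is a sketch under assumptions: `uarch_of_core` and `select` are hypothetical callbacks. A real version might query CPU info for the core's uarch and run a selection routine. With a single uarch, the cache reduces to today's per-CPU selection.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: names and types are illustrative assumptions.
enum class Uarch { kCortexA55, kCortexA76 };

struct UKernelConfig {
  std::string name;  // stand-in for the bundle of function pointers
};

// Caches one selected config per uarch. On a homogeneous CPU every core maps
// to the same entry; on heterogeneous (big.LITTLE) CPUs each core gets the
// config selected for its own uarch.
class UKernelConfigCache {
 public:
  UKernelConfigCache(std::function<Uarch(int)> uarch_of_core,
                     std::function<UKernelConfig(Uarch)> select)
      : uarch_of_core_(std::move(uarch_of_core)), select_(std::move(select)) {}

  const UKernelConfig& get(int core) {
    assert(core >= 0);
    Uarch uarch = uarch_of_core_(core);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(uarch);
    if (it == cache_.end()) {
      // Run selection once per uarch; std::map keeps references stable.
      it = cache_.emplace(uarch, select_(uarch)).first;
    }
    return it->second;
  }

 private:
  std::function<Uarch(int)> uarch_of_core_;
  std::function<UKernelConfig(Uarch)> select_;
  std::mutex mutex_;
  std::map<Uarch, UKernelConfig> cache_;
};
```

Keying the cache by uarch rather than by core means that a CPU with, say, four little and four big cores runs selection only twice.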
cc @digantdesai @kimishpatel @supriyar @msaroufim