Commit e65b89e

nikhil-arm authored and pytorchmergebot committed
[Feat]: Improve KleidiAI 4 bit kernel performance (pytorch#146476)
Description:
1. New thread blocking accelerates GEMVs.
2. Throughput of the LHS quant pack + matmul pipeline is increased by decoupling the two operations.
3. The new blocking strategy blocks `out_feature` to accelerate GEMVs.

Perf improvements: 12% speedup in the LLM prefill phase and up to 16% speedup in the autoregressive phase.

Perf benchmarking: pytorch#143289 (comment)

Change-Id: Ie574ff8459fdb75701ae366158b4e118c70694e4
Pull Request resolved: pytorch#146476
Approved by: https://github.com/malfet
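
To illustrate what blocking along `out_feature` means for a GEMV, here is a minimal sketch in plain Python. It is not the KleidiAI or PyTorch kernel code: the function names `split_out_features` and `gemv_blocked`, and the `n_step` granularity parameter, are hypothetical and chosen only for illustration. The idea shown is that a GEMV has a single input row, so the only dimension left to split across threads is the output dimension. Each block of output rows is independent and could be handed to its own thread.

```python
def split_out_features(n, num_threads, n_step=4):
    """Split [0, n) into at most num_threads contiguous blocks.

    Every block except possibly the last is a multiple of n_step,
    a hypothetical kernel granularity assumed for this sketch.
    """
    if n <= 0:
        return []
    steps = (n + n_step - 1) // n_step                    # n_step-sized chunks, rounded up
    per_thread = (steps + num_threads - 1) // num_threads  # chunks per thread, rounded up
    block = per_thread * n_step
    return [(start, min(start + block, n)) for start in range(0, n, block)]


def gemv_blocked(x, w, num_threads=4, n_step=4):
    """Compute y[j] = sum_k x[k] * w[j][k], one out_feature block at a time."""
    n = len(w)
    y = [0.0] * n
    for start, end in split_out_features(n, num_threads, n_step):
        # Blocks touch disjoint slices of y, so each could run on its own thread.
        for j in range(start, end):
            y[j] = sum(xk * wk for xk, wk in zip(x, w[j]))
    return y
```

For example, `split_out_features(10, 4)` yields three blocks, `(0, 4)`, `(4, 8)` and `(8, 10)`, so a 10-row output is covered by three independent work items rather than one.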
1 parent 4d626c2 commit e65b89e

1 file changed: +171 -312 lines changed

0 commit comments