sync : llama.cpp#1476
* MoE Mxfp4 CLC kernel added, router reorder on GPU
* Pass test-backend-ops for MoE mxfp4 Adreno CLC
* remove putenv in llama-model.cpp
* fix indent style and whitespace
* opencl: remove unnecessary headers
* opencl: do not save cl_program objects
* opencl: remove unnecessary assert
* fix precision issue
---------
Co-authored-by: Li He <lih@qti.qualcomm.com>
…irely) (llama/22533)
* fix: CUDA device PCI bus ID detection for multi-GPU de-dupe
* HIP, MUSA macros
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* shader(norm): add layer norm ops
* shader(norm): stabilize floating point computation with Kahan summation and handle mixed types
* shader(norm): remove the non-contiguous strides
* shader(norm): use the original implementation rather than the Kahan summation
* llama : add option to save memory in device buffers
* tests : extend llama-save-load-state
79dc8b5 to 8914191
@taronaeo The Python deps are failing to install on the self-hosted runner: https://github.com/ggml-org/ggml/actions/runs/25366621707/job/74379012178#step:3:155
Not sure how to fix it. Any ideas?
This is caused by Line 11 in ac6f7b4 and Line 4 in ac6f7b4: torchvision deprecated support for torch 2.5.x, and since we force torch to stay within 2.5.x, pip can't find a suitable version to install.
I would suggest that we update the version requirements to align with llama.cpp (i.e., bumping the torch version to …). This problem is isolated to the GGML repository only.
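For anyone following along, here is a rough sketch of why a pin like this leaves pip with nothing to install. It is only an illustration under assumptions: the `~=` pin style, the `AVAILABLE_TORCHVISION` table, and the helper names are hypothetical and not taken from the requirements files in this repository. It mimics the "compatible release" check for simple numeric versions and filters a made-up list of torchvision releases by the torch version each one needs.

```python
# Illustrative sketch only: the table below is a made-up stand-in for what a
# package index might offer; it is NOT read from the ggml requirements files.

def parse(version: str) -> tuple:
    """Turn a simple numeric version such as '2.5.1' into (2, 5, 1)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(version: str, pin: str) -> bool:
    """Approximate a `~=pin` check for plain numeric versions.

    `~=2.5.1` means: same leading components (2.5) and at least 2.5.1.
    """
    v, p = parse(version), parse(pin)
    return v[: len(p) - 1] == p[:-1] and v >= p


# Hypothetical index contents: torchvision release -> torch version it requires.
AVAILABLE_TORCHVISION = {
    "0.21.0": "2.6.0",
    "0.22.0": "2.7.0",
}


def installable_torchvision(torch_pin: str) -> list:
    """Return the torchvision releases whose torch requirement fits the pin."""
    return sorted(
        tv
        for tv, needed_torch in AVAILABLE_TORCHVISION.items()
        if compatible_release(needed_torch, torch_pin)
    )


if __name__ == "__main__":
    # With torch held inside 2.5.x, no candidate survives the filter.
    print(installable_torchvision("2.5.1"))
    # Relaxing the pin to the 2.6.x series leaves a candidate.
    print(installable_torchvision("2.6.0"))
```

In this toy model an empty candidate list corresponds to pip reporting that it cannot find a matching distribution, and moving the torch pin forward is what makes a torchvision candidate appear again.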
No description provided.