ggml : fix arm build #10890

slaren · 2024-12-18T18:42:11Z

Uses -mcpu=native with GGML_NATIVE for ARM, and additionally checks for dotprod and i8mm because not every compiler enables them
Adds option GGML_CPU_ARM_ARCH that can be used to specify the architecture when GGML_NATIVE is disabled
Removes GGML_SVE; use -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.6-a+sve for the same effect
Adds flags for i8mm and dotprod to ggml_backend_cpu_get_features
Removes MSVC support for ARM, use clang instead
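
For illustration, the replacement for the removed GGML_SVE option could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper name ggml_arm_cmake_args and the build directory name "build" are assumptions, and only the two -D options come from the description above. The last line prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the CMake cache arguments for a non-native ARM build.
# $1: value for GGML_CPU_ARM_ARCH (for example armv8.6-a+sve, as in the description)
ggml_arm_cmake_args() {
    printf '%s\n' "-DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=$1"
}

# Show the configure command that stands in for the removed GGML_SVE option.
# The build directory name "build" is an assumption.
echo cmake -B build $(ggml_arm_cmake_args armv8.6-a+sve)
```

Keeping the architecture string in one argument makes it easy to swap in another value without touching the rest of the command.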

Supersedes #10752

Signed-off-by: Adrien Gallouët <[email protected]>

ggerganov

Looks good:

M1 Pro

-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD - Success
-- Performing Test GGML_COMPILER_SUPPORT_I8MM
-- Performing Test GGML_COMPILER_SUPPORT_I8MM - Failed
-- ARM feature DOTPROD enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -march=native+dotprod __ARM_FEATURE_DOTPROD

M2 Ultra

-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD - Success
-- Performing Test GGML_COMPILER_SUPPORT_I8MM
-- Performing Test GGML_COMPILER_SUPPORT_I8MM - Success
-- ARM feature DOTPROD enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -march=native+dotprod+i8mm __ARM_FEATURE_DOTPROD;__ARM_FEATURE_MATMUL_INT8

ggml/src/ggml-cpu/CMakeLists.txt

slaren · 2024-12-18T19:27:26Z

Note: llamafile sgemm uses vld1q_f16 and vld1_f16 without checking for __ARM_FEATURE_FP16_VECTOR_ARITHMETIC, which causes the build to fail when this feature is not enabled. Previously, there was a hack to enable this feature always when building for Android armv7, which has now been removed. I have disabled llamafile in the android example until this is fixed.
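
One way to see whether a given toolchain and flag combination defines __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is to inspect the compiler's predefined macros. The sketch below is an illustration only: the helper name has_macro is hypothetical, and the demo assumes clang is on the PATH and accepts -mcpu=native for the host.

```shell
#!/bin/sh
# Hypothetical helper: succeed if macro $1 is defined in the
# preprocessor dump (output of "-dM -E") read from stdin.
has_macro() {
    grep -q -E "^#define $1( |\$)"
}

# Demo, only attempted when clang is available.
if command -v clang >/dev/null 2>&1; then
    if clang -mcpu=native -dM -E - </dev/null 2>/dev/null \
        | has_macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC; then
        echo "fp16 vector arithmetic: enabled"
    else
        echo "fp16 vector arithmetic: not enabled"
    fi
fi
```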

ggml-ci

angt · 2024-12-18T19:56:59Z

Just for information: with -march we would need to check every required feature ourselves, as -march is really generic.
As an example, on my M3, -mcpu=native enables:

$ clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '\-target-feature [^ ]+'
-target-feature +v8.5a
-target-feature +aes
-target-feature +crc
-target-feature +dotprod
-target-feature +fp-armv8
-target-feature +fp16fml
-target-feature +lse
-target-feature +ras
-target-feature +rcpc
-target-feature +rdm
-target-feature +sha2
-target-feature +sha3
-target-feature +neon
-target-feature +zcm
-target-feature +zcz
-target-feature +fullfp16

while -march=native only enables:

$ clang -march=native -dM -E -v - </dev/null 2>&1 | grep -oE '\-target-feature [^ ]+'
-target-feature +neon
-target-feature +v8.5a
-target-feature +zcm
-target-feature +zcz

slaren · 2024-12-18T20:52:16Z

Ok, it is using -mcpu now. I think it should be good as is for modern platforms at least, but we might need to add another option to set -mfpu as well.

angt · 2024-12-18T21:06:47Z

> Ok, it is using -mcpu now. I think it should be good as is for modern platforms at least, but we might need to add another option to set -mfpu as well.

And many other flags too, especially for cross-compilation. That's why I think we should have a generic flag setting when GGML_NATIVE=OFF and let llama.cpp and whisper.cpp set it for ggml.

slaren · 2024-12-18T21:14:27Z

Which flags are you thinking about?

angt · 2024-12-18T23:02:22Z

> Which flags are you thinking about?

Sometimes we need to force -mfloat-abi if it's not forced by the toolchain (like gnuabihf). The -mno-unaligned-access was also used in the removed code.

Cross-compiling is always complicated anyway, some iterations will be needed so having a generic flag could help.

slaren · 2024-12-18T23:22:10Z

It might make more sense to pass these kinds of flags with CMAKE_C_FLAGS and CMAKE_CXX_FLAGS, since you would probably want to apply them to the entire program, not just to the CPU backend. The main reason for having GGML_CPU_ARM_ARCH is to allow building multiple versions of the CPU backend with different arch flags in the future, so that we can distribute a single binary that automatically chooses which version of the CPU backend to load at runtime, similar to what is already done for x86. When these flags were added to the CMakeLists.txt, the CPU backend was not built as a separate target; it was a monolithic build with a single target for the entire ggml library, so the flags would apply to everything.

ajiekc905 · 2024-12-19T12:27:14Z

On Android termux Snapdragon 8 gen 1 gives
clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '\-target-feature [^ ]+'
-target-feature +v9a
-target-feature +am
-target-feature +bf16
-target-feature +ccidx
-target-feature +complxnum
-target-feature +crc
-target-feature +dotprod
-target-feature +ete
-target-feature +flagm
-target-feature +fp-armv8
-target-feature +fp16fml
-target-feature +fullfp16
-target-feature +i8mm
-target-feature +jsconv
-target-feature +lse
-target-feature +mte
-target-feature +neon
-target-feature +pauth
-target-feature +perfmon
-target-feature +ras
-target-feature +rcpc
-target-feature +rdm
-target-feature +sb
-target-feature +ssbs
-target-feature +sve
-target-feature +sve2
-target-feature +sve2-bitperm
-target-feature +trbe
-target-feature +fix-cortex-a53-835769
-target-feature +outline-atomics

With this commit it produces unusable binaries.
Illegal instruction

cat /proc/cpuinfo
processor : 0
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 1
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 2
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 3
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 4
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 5
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 6
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 7
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd48
CPU revision : 0
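
The two dumps above disagree: clang's -mcpu=native output includes +sve, while none of the Features lines in /proc/cpuinfo list sve. A quick check of what the kernel actually reports could look like the sketch below. The helper name cpuinfo_has_feature is hypothetical, and the feature words are the kernel's names as they appear in the dump (asimddp for dotprod, i8mm, sve), not clang's.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the word $2 appears in a "Features" line
# of the cpuinfo text passed as $1.
cpuinfo_has_feature() {
    printf '%s\n' "$1" | grep -E '^Features[[:space:]]*:' | grep -qw -e "$2"
}

# Demo, only attempted where /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
    info=$(cat /proc/cpuinfo)
    for f in asimddp i8mm sve; do
        if cpuinfo_has_feature "$info" "$f"; then
            echo "$f: reported by the kernel"
        else
            echo "$f: not reported by the kernel"
        fi
    done
fi
```

A feature that the compiler enables but the kernel does not report is a plausible candidate for an "Illegal instruction" crash, though this check alone does not prove it.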

Jimex · 2024-12-19T13:14:59Z

I encountered the error "Failed to get ARM features" when I tried to compile the Android example with this build.

gustrd · 2024-12-19T15:07:46Z

#On Android termux Snapdragon 8 gen 1 gives
#clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '-target-feature [^ ]+'

With Snapdragon 8 gen 1 I can only build with -march=native+nosve. It seems its SVE implementation is bad.

https://x.com/never_released/status/1628885404785991683?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1629432494532509697%7Ctwgr%5E73eda2d668d483f5440a37d5d4b204b45b2a1fcb%7Ctwcon%5Es3_&ref_url=https%3A%2F%2Fforums.anandtech.com%2Fthreads%2Fqualcomm-snapdragon-thread.2616013%2Fpage-4

slaren · 2024-12-19T15:27:32Z

Maybe we could add more tests for features (SVE), and disable them explicitly with the no flag if the test fails.
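
As a rough sketch of this idea (not what the PR implements), the -mcpu value could be assembled from per-feature probe results, appending +feature when a probe passes and +nofeature when it fails. The function name build_mcpu and the name=0/1 argument convention are assumptions made for this example; how each probe is run is left out.

```shell
#!/bin/sh
# Hypothetical helper: build an -mcpu value from probe results.
# $1: base CPU name (for example native)
# remaining arguments: name=1 if the probe for that feature passed, name=0 if it failed
build_mcpu() {
    out="-mcpu=$1"
    shift
    for probe in "$@"; do
        name=${probe%%=*}
        ok=${probe#*=}
        if [ "$ok" = 1 ]; then
            out="$out+$name"      # feature usable: enable it explicitly
        else
            out="$out+no$name"    # probe failed: disable it explicitly
        fi
    done
    printf '%s\n' "$out"
}

# Example with SVE failing its probe:
build_mcpu native dotprod=1 i8mm=1 sve=0   # prints -mcpu=native+dotprod+i8mm+nosve
```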

* ggml: GGML_NATIVE uses -mcpu=native on ARM
  Signed-off-by: Adrien Gallouët <[email protected]>
* ggml: Show detected features with GGML_NATIVE
  Signed-off-by: Adrien Gallouët <[email protected]>
* remove msvc support, add GGML_CPU_ARM_ARCH option
* disable llamafile in android example
* march -> mcpu, skip adding feature macros
  ggml-ci

---------

Signed-off-by: Adrien Gallouët <[email protected]>
Co-authored-by: Adrien Gallouët <[email protected]>

sandrohanea · 2025-01-11T14:35:14Z

Hey @slaren , @ggerganov ,

I see that MSVC support for ARM was removed in this PR. Is there any plan to bring it back in the future or what was the reason for this removal?

Thank you!

Couldn't make it work yet with the LLVM / Clang (tried with both Ninja Generator and VS generator).

slaren · 2025-01-11T14:45:16Z

MSVC does not support inline assembly that is used for some ARM kernels. You will get worse performance, and clang is widely available alongside VS, so I don't think it is worth the effort that it takes to continue supporting MSVC.

sandrohanea · 2025-01-11T14:53:11Z

> MSVC does not support inline assembly that is used for some ARM kernels. You will get worse performance, and clang is widely available alongside VS, so I don't think it is worth the effort that it takes to continue supporting MSVC.

Thanks for the quick response!

Will continue the effort to switch to clang for my ARM builds.

Is there any plan to do the same for other arch?

slaren · 2025-01-11T14:59:11Z

I don't intend to remove MSVC support for x86, but using clang is still a good idea, performance tends to be slightly better, but there is no inline assembly in the x86 code, so the difference won't be as significant.

sandrohanea · 2025-01-12T14:30:36Z

> I don't intend to remove MSVC support for x86, but using clang is still a good idea, performance tends to be slightly better, but there is no inline assembly in the x86 code, so the difference won't be as significant.

Short update:

Managed to finish the build with clang; the problem was the parent CMakeLists.txt, which was adding some MSVC-specific flags.
Both Arm64 and x86/x64 builds are working with Clang.
However, x86/x64 whisper.cpp seems to deadlock at runtime based on the tests: https://github.com/sandrohanea/whisper.net/actions/runs/12733852333
https://github.com/sandrohanea/whisper.net/actions/runs/12726248502

Reverting to MSVC fixes the problem: https://github.com/sandrohanea/whisper.net/actions/runs/12733915488 with only commit change being: sandrohanea/whisper.net@288ce85

Unfortunately, I don't have an arm64 machine / agent to test if the deadlock is reproducing for arm64 builds of clang.

Do you have any idea what can cause the deadlock for clang builds?


angt and others added 3 commits December 18, 2024 08:13

ggml: GGML_NATIVE uses -mcpu=native on ARM

7eb81e1

Signed-off-by: Adrien Gallouët <[email protected]>

ggml: Show detected features with GGML_NATIVE

1dae1d8

Signed-off-by: Adrien Gallouët <[email protected]>

remove msvc support, add GGML_CPU_ARM_ARCH option

6f6794f

github-actions bot added the ggml label Dec 18, 2024

slaren mentioned this pull request Dec 18, 2024

ggml: GGML_NATIVE uses -mcpu=native on ARM #10752

Closed

ggerganov approved these changes Dec 18, 2024


angt reviewed Dec 18, 2024


disable llamafile in android example

0a4b79c

github-actions bot added the android and examples labels Dec 18, 2024

march -> mcpu, skip adding feature macros

d4b1259

ggml-ci

slaren linked an issue Dec 18, 2024 that may be closed by this pull request

Misc. bug: Q4_0 with runtime repacking not working as expected (TYPE_Q4_0_4_4 REMOVED) #10757

Closed

slaren merged commit 9177484 into master Dec 18, 2024
55 of 56 checks passed

slaren deleted the sl/fix-arm-build branch December 18, 2024 22:21

ekcrisp mentioned this pull request Dec 18, 2024

Add an option to enable --runtime-repack in llama.cpp abetlen/llama-cpp-python#1860

Closed

ag2s20150909 mentioned this pull request Dec 23, 2024

Compile bug: Android example compilation failure after commit b4357. #10952

Closed

sandrohanea mentioned this pull request Jan 11, 2025

WIP v1.7.5 sandrohanea/whisper.net#319

Merged

ggerganov mentioned this pull request Apr 5, 2025

ggml : simplify Arm fp16 CPU logic ggml-org/ggml#1177

Merged
