-
Notifications
You must be signed in to change notification settings - Fork 11.7k
ggml: GGML_NATIVE uses -mcpu=native on ARM #10752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
angt
wants to merge
2
commits into
ggml-org:master
from
angt:ggml-allow-march-native-on-generic-arm-platforms
Closed
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting all the old code in a
else
might be too drastic but I guess the other cases are only relevant when cross-compiling.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the other code in fact seems to be doing the same thing that
-march=native
would do.GGML_NATIVE
disabled should generate a consistent build depending on the flags specified during compilation, which is not the case at the moment.This needs to be completely revamped, and as it is, this PR is just adding to the mess that will need to be cleaned up later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can help with revamping but I need some clarification first :)
Today, building on generic ARM gives very poor performance because the build system completely ignores the
GGML_NATIVE
directive, so I just aligned the current code with the current description ofGGML_NATIVE
found inggml/CMakeLists.txt
:I completely agree that it's not the best way to get performance but it's better than nothing and it already fixes many modern setups.
So, do we want to relax the definition of
GGML_NATIVE
and allow to use, for example,-mcpu=native
on ARM which would be much better for performance?The old code was clearly aimed at small devices, like android and raspberry and also used
CMAKE_SYSTEM_PROCESSOR
so for me it wasn't used as a way to fix-march=native
at all but rather as a way to find acceptable flags when cross-compiling and in this case you really don't wantGGML_NATIVE
(hence the move toelse
).Maybe @ggerganov has some memories to share about that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe these flags were mostly set by trial and error, back when we were running
whisper.cpp
on some raspberries. But this is very likely wrong as I didn't really understand the specifics and should be revamped. I'm not really an expert and I still get quite confused with all the different Arm architectures, so whatever you think makes sense to improve this is welcome. I can test changes on the entire spectrum of Apple Silicon if necessary.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, with gcc/clang it is enough to set the correct architecture flags with
-march
, and-match=native
should work in the same way as x86. The exception is likely to be MSVC once again, because it does not set the preprocessor definitions of the enabled ARM features. In which case, we may consider just dropping support for MSVC with ARM entirely, because it is a constant source of problems, doesn't work with the inline asm kernels, and doesn't really add anything beyond clang or possibly MINGW.I believe this should work:
-march=native
ifGGML_NATIVE
is enabledGGML_CPU_ARCH
to the build to set the architecture, so that ifGGML_NATIVE
is disabled and this parameter is provided, then-march=${GGML_CPU_ARCH}
is used.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On ARM, to build for local use (i.e. using
GGML_NATIVE
),-mcpu=native
alone should be the best option as far as I know. The-march=native
will often miss some opportunities. The-mtune=native
will optimize for the current microarchitecture (so still not fully optimized for the cpu).So I think redefining
GGML_NATIVE
to something like "Try to optimize builds for the current cpu" and using-march=native
onx86_64
and-mcpu=native
onARM
would already be much simpler and an improvement.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this sounds good to you, I can adapt this PR in this direction so we can see how it works in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sounds good.