Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve apple M1 support #384

Closed
wants to merge 3 commits into from
Closed

Improve apple M1 support #384

wants to merge 3 commits into from

Conversation

felixmusil
Copy link
Contributor

@felixmusil felixmusil commented Oct 1, 2021

As referenced in #383, setting explicitly -march in the compilation arguments has yet to be supported by clang.
Thanks to CMake, for the specific case of arm64 we can build universal binaries which work on my machine. I made the change so that it does not affect the build settings on other architectures.

We don't have a test in the CI for the arm64 architecture so could you check that it works on your machines ? @rosecers
Also, I am using clang from homebrew (built for arm64) so my current python does not use roseta.

Note that llvm 12 (current stable version) does not have arm64 as a target architecture even though it has apple-a14 which is basically the same). It seems that it should exist in llvm 13.

@felixmusil felixmusil added the compilation Issues related to compilation procedure or settings label Oct 1, 2021
@felixmusil felixmusil requested review from Luthaf and rosecers October 1, 2021 10:34
@felixmusil felixmusil self-assigned this Oct 1, 2021
Copy link
Contributor

@Luthaf Luthaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it to @rosecers to test if this fixes her problem, I have a small question in the meantime

"${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
# detect if the local architechture is arm64 or x86_64
if(CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL "arm64")
set(CMAKE_OSX_ARCHITECTURES "x86_64;arm64" CACHE STRING "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why build a universal binary instead of detecting which architecture we actually need?

Copy link
Contributor Author

@felixmusil felixmusil Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be needed to handle the case where you have a python distribution built for x86_64 while working on an arm64. I need @rosecers input to check that it's actually working though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that if Python is an x86_64 binary, it will only need x86_64 libraries. Rosetta 2 will then deal with the emulation of x86 on ARM. The main issue with building universal binary is the increased compile time (and file size when we upload this to PyPI).

If there is an easy way to detect the required architecture insead of doing both that would be nice, otherwise we'll survive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find one.

@felixmusil
Copy link
Contributor Author

From the CI, it seems that compilation/runtime increased twofold causing the valgrind workflow to timeout.
The weird aspect is that it happens also for the gcc-10 build (I compared the timings with the build timings from another PR) which should not be affected by my changes. I also compared the compilation arguments in these 2 cases and, as expected, they are identical. Except for the underlying hardware, what could cause such increase in the overall execution time of the workflow ?

@rosecers
Copy link
Contributor

Just trying this now -- unfortunately it doesn't fix the issue and I get:

[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/utils/json_io.cc.o
[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/utils/units.cc.o
[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/utils/utils.cc.o
[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/math/bessel.cc.o
errorerrorerror: : : errorunknown target CPU 'vortex': unknown target CPU 'vortex'
unknown target CPU 'vortex'
unknown target CPU 'vortex'

note: note: notevalid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4: note
valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4

: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/utils/units.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/utils/json_io.cc.o] Error 1
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/utils/utils.cc.o] Error 1
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/math/bessel.cc.o] Error 1
make[1]: *** [src/CMakeFiles/rascal.dir/all] Error 2
make: *** [all] Error 2

@felixmusil
Copy link
Contributor Author

Ok then there is no point compiling for x86_64 then. Could you try to setup a native python environment and check that it works as it does on my machine ?

@Luthaf
Copy link
Contributor

Luthaf commented Oct 15, 2021

Ok then there is no point compiling for x86_64 then.

Why? If the Python install is x86_64, we want to compile for that. For me the issue is also about -march=native when doing cross-compilation to x86_64

@felixmusil
Copy link
Contributor Author

Why? If the Python install is x86_64, we want to compile for that.

From Rose's test it seems that the attempt to compile for both x86_64 and arm64 on arm64 failed.
But it's true that it might also be the detection mechanism of arm64 that failed. This could be checked by looking into the cmake setup variables.
Now that you mention it, it might also be that the cmake checks for arm64 fail catch it when using a Rosetta 2 install and then it falls back onto the native argument. The question is why it detects 'vortex' rather than something more generic?

For me the issue is also about -march=native when doing cross-compilation to x86_64

The arch argument is set to native as default but can be very easily overiden at the cmake level so I am not sure to understand your concern.

@Luthaf
Copy link
Contributor

Luthaf commented Oct 15, 2021

The question is why it detects 'vortex' rather than something more generic?

Because -march=native asks for the most specific arch possible? From https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores, looks like Vortex is the name of one of Apple architecture. I more puzzled about why this is not selecting Firestorm/Icestorm instead, since these are the architectures used in the M1 CPU and would be the most specific ones. Maybe Vortex is the common base between firestorm & icestorm?

The arch argument is set to native as default but can be very easily overiden at the cmake level so I am not sure to understand your concern.

My concern is 100% about the default value. By default, everything should work for everyone, and -march=native breaks here and in other ways (#335). I think we should have a more conservative default, while still allowing advanced users to build with something else.

@Luthaf
Copy link
Contributor

Luthaf commented May 19, 2022

Fixed in #403 for now

@Luthaf Luthaf closed this May 19, 2022
@Luthaf Luthaf deleted the feat/apple-m1-support branch May 19, 2022 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compilation Issues related to compilation procedure or settings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants