Improve apple M1 support #384

felixmusil · 2021-10-01T10:34:06Z

As referenced in #383, setting explicitly -march in the compilation arguments has yet to be supported by clang.
Thanks to CMake, for the specific case of arm64 we can build universal binaries which work on my machine. I made the change so that it does not affect the build settings on other architectures.

We don't have a test in the CI for the arm64 architecture so could you check that it works on your machines ? @rosecers
Also, I am using clang from homebrew (built for arm64) so my current python does not use roseta.

Note that llvm 12 (current stable version) does not have arm64 as a target architecture even though it has apple-a14 which is basically the same). It seems that it should exist in llvm 13.

Luthaf

I'll leave it to @rosecers to test if this fixes her problem, I have a small question in the meantime

Luthaf · 2021-10-04T08:29:39Z

CMakeLists.txt

+  "${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
+  # detect if the local architechture is arm64 or x86_64
+  if(CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL "arm64")
+    set(CMAKE_OSX_ARCHITECTURES "x86_64;arm64" CACHE STRING "")


Why build a universal binary instead of detecting which architecture we actually need?

I thought it would be needed to handle the case where you have a python distribution built for x86_64 while working on an arm64. I need @rosecers input to check that it's actually working though.

My understanding is that if Python is an x86_64 binary, it will only need x86_64 libraries. Rosetta 2 will then deal with the emulation of x86 on ARM. The main issue with building universal binary is the increased compile time (and file size when we upload this to PyPI).

If there is an easy way to detect the required architecture insead of doing both that would be nice, otherwise we'll survive.

I could not find one.

felixmusil · 2021-10-04T11:32:20Z

From the CI, it seems that compilation/runtime increased twofold causing the valgrind workflow to timeout.
The weird aspect is that it happens also for the gcc-10 build (I compared the timings with the build timings from another PR) which should not be affected by my changes. I also compared the compilation arguments in these 2 cases and, as expected, they are identical. Except for the underlying hardware, what could cause such increase in the overall execution time of the workflow ?

rosecers · 2021-10-14T10:11:19Z

Just trying this now -- unfortunately it doesn't fix the issue and I get:

[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/utils/json_io.cc.o
[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/utils/units.cc.o
[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/utils/utils.cc.o
[ 13%] Building CXX object src/CMakeFiles/rascal.dir/rascal/math/bessel.cc.o
errorerrorerror: : : errorunknown target CPU 'vortex': unknown target CPU 'vortex'
unknown target CPU 'vortex'
unknown target CPU 'vortex'

note: note: notevalid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4: note
valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4

: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/utils/units.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/utils/json_io.cc.o] Error 1
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/utils/utils.cc.o] Error 1
make[2]: *** [src/CMakeFiles/rascal.dir/rascal/math/bessel.cc.o] Error 1
make[1]: *** [src/CMakeFiles/rascal.dir/all] Error 2
make: *** [all] Error 2

felixmusil · 2021-10-14T21:13:58Z

Ok then there is no point compiling for x86_64 then. Could you try to setup a native python environment and check that it works as it does on my machine ?

Luthaf · 2021-10-15T09:00:41Z

Ok then there is no point compiling for x86_64 then.

Why? If the Python install is x86_64, we want to compile for that. For me the issue is also about -march=native when doing cross-compilation to x86_64

felixmusil · 2021-10-15T10:40:11Z

Why? If the Python install is x86_64, we want to compile for that.

From Rose's test it seems that the attempt to compile for both x86_64 and arm64 on arm64 failed.
But it's true that it might also be the detection mechanism of arm64 that failed. This could be checked by looking into the cmake setup variables.
Now that you mention it, it might also be that the cmake checks for arm64 fail catch it when using a Rosetta 2 install and then it falls back onto the native argument. The question is why it detects 'vortex' rather than something more generic?

For me the issue is also about -march=native when doing cross-compilation to x86_64

The arch argument is set to native as default but can be very easily overiden at the cmake level so I am not sure to understand your concern.

Luthaf · 2021-10-15T12:23:50Z

The question is why it detects 'vortex' rather than something more generic?

Because -march=native asks for the most specific arch possible? From https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores, looks like Vortex is the name of one of Apple architecture. I more puzzled about why this is not selecting Firestorm/Icestorm instead, since these are the architectures used in the M1 CPU and would be the most specific ones. Maybe Vortex is the common base between firestorm & icestorm?

The arch argument is set to native as default but can be very easily overiden at the cmake level so I am not sure to understand your concern.

My concern is 100% about the default value. By default, everything should work for everyone, and -march=native breaks here and in other ways (#335). I think we should have a more conservative default, while still allowing advanced users to build with something else.

Luthaf · 2022-05-19T13:28:10Z

Fixed in #403 for now

felixmusil added 2 commits October 1, 2021 12:18

modify cmake architecture settings for apple m1

c8efab8

improve architecture detection

c1d163e

felixmusil added the compilation Issues related to compilation procedure or settings label Oct 1, 2021

felixmusil requested review from Luthaf and rosecers October 1, 2021 10:34

felixmusil self-assigned this Oct 1, 2021

Luthaf reviewed Oct 4, 2021

View reviewed changes

increase CI timeout

3820c76

Luthaf closed this May 19, 2022

Luthaf deleted the feat/apple-m1-support branch May 19, 2022 20:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve apple M1 support #384

Improve apple M1 support #384

felixmusil commented Oct 1, 2021 •

edited

Loading

Luthaf left a comment

Luthaf Oct 4, 2021

felixmusil Oct 4, 2021 •

edited

Loading

Luthaf Oct 4, 2021

felixmusil Oct 4, 2021

felixmusil commented Oct 4, 2021

rosecers commented Oct 14, 2021

felixmusil commented Oct 14, 2021

Luthaf commented Oct 15, 2021

felixmusil commented Oct 15, 2021

Luthaf commented Oct 15, 2021

Luthaf commented May 19, 2022

Improve apple M1 support #384

Improve apple M1 support #384

Conversation

felixmusil commented Oct 1, 2021 • edited Loading

Luthaf left a comment

Choose a reason for hiding this comment

Luthaf Oct 4, 2021

Choose a reason for hiding this comment

felixmusil Oct 4, 2021 • edited Loading

Choose a reason for hiding this comment

Luthaf Oct 4, 2021

Choose a reason for hiding this comment

felixmusil Oct 4, 2021

Choose a reason for hiding this comment

felixmusil commented Oct 4, 2021

rosecers commented Oct 14, 2021

felixmusil commented Oct 14, 2021

Luthaf commented Oct 15, 2021

felixmusil commented Oct 15, 2021

Luthaf commented Oct 15, 2021

Luthaf commented May 19, 2022

felixmusil commented Oct 1, 2021 •

edited

Loading

felixmusil Oct 4, 2021 •

edited

Loading