Port shapeit5 to MacOS on Apple Silicon using SIMDE #68

pettyalex · 2023-10-28T17:15:33Z

There's demand for users to be able to run shapeit on Mac OS, with users who want that creating Github issues or asking on other discussion forums:

odelaneau/shapeit4#15
odelaneau/shapeit4#36
#22

Shapeit5 cannot run on modern (2020+) Macs even in a VM, because it uses AVX2. Apple's x86 support supports SSE1/2/3/4 but not AVX or AVX2: https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment.

This PR contains a minimal set of changes that will allow a native ARM build on modern Macs. It does this using https://github.com/simd-everywhere/simde, a header-only library that provides high-performance vector intrinsic translation between x86 and ARM. This would also allow Linux on ARM support to be added very easily. Because of AWS's very low pricing for ARM compute instances, ARM is already the cheapest way to run large computational loads on the cloud so Linux on ARM support would be helpful for anyone operating in AWS.

This PR also has some very small changes to support building on Apple clang. std::random_shuffle was removed in C++17, and will be removed in a future version of GCC. It's already gone in clang 15, which I used to develop and test this PR. As recommended, I've replaced it with std::shuffle: https://en.cppreference.com/w/cpp/algorithm/random_shuffle

…ting work finished.

…ith std::shuffle.

srubinacci · 2024-03-26T20:33:47Z

Hi Alex,
Thanks for your this, it's actually really great and we are definitively interested in merging it. I've tried a previous version myself and it works very well.
Just waiting @odelaneau for merging with new updates. Also we need to be sure it works with the new make system @rwk-unil introduced recently.

If we approve this, we should add mac compilation in the github workflow.

pettyalex · 2024-03-26T23:51:08Z

I'd be glad to add a mac build workflow to github actions, and clean up the make target a bit. I was mostly just validating that this will work until I heard you were interested in it.

Right now, as is, it suffixes the binaries with _mac which is pretty silly.

…attempt to link libdeflate, but everythingn else works.

pettyalex · 2024-03-27T04:59:25Z

OK, I went ahead and simplified the makefile for Mac build, and also tested it on aarch64-linux-gnu. I noticed a few things:

It doesn't look like libboost_serialization is actually used anywhere? I don't see it linked in any dynamic libs, and there's no includes into it.
Any objections to targeting x86-64-v3 instead of -mavx2 -mfma? The baseline requirement is the same, but the compiler is able to generate many more instructions where they may be useful: https://en.wikipedia.org/wiki/X86-64
Is there a reason why when linking it dynamically we need to explicitly link to all the dependencies of our dependencies, like -lbz2 -llzma -lcurl -lcrypto -ldeflate? Can't we only link -lboost_iostreams -lboost_program_options -lhts? If so, then the conditional for mac build support can be entirely eliminated.

Oh and for x86 this updates the -mtune to Skylake server, which should be a good baseline these days for older machines.

… server.

rwk-unil · 2024-03-27T08:27:02Z

@pettyalex thank you for your pull request and your work,
@srubinacci @odelaneau I have reviewed the changes, here are my comments :

I like integrating SIMDE which will help port to other architectures
Thank you for replacing random_shuffle as it is removed in C++17 and this will break builds depending on compiler. I fully agree with this change, we have been discussing this with @srubinacci already. (also the big problem with random_shuffle is that the source of randomness is implementation defined, often with std::rand used (see ref), making reproducibility an issue, even with same seed)
As for the Makefile :

OK, I went ahead and simplified the makefile for Mac build, and also tested it on aarch64-linux-gnu. I noticed a few things:

It doesn't look like libboost_serialization is actually used anywhere? I don't see it linked in any dynamic libs, and there's no includes into it.

It may be an artefact left over from experimentation, these were introduced when the Makefiles were updated (see git blame e.g., makefile) and may not be necessary anymore.

Any objections to targeting x86-64-v3 instead of -mavx2 -mfma? The baseline requirement is the same, but the compiler is able to generate many more instructions where they may be useful: https://en.wikipedia.org/wiki/X86-64

I agree, AVX2/FMA requires Haswell (2013) or newer CPU anyways, so targeting x86-64-v3 would enable extra CPU features that are available anyways while simplifying the build command at the same time. I totally support this suggestion.

Is there a reason why when linking it dynamically we need to explicitly link to all the dependencies of our dependencies, like -lbz2 -llzma -lcurl -lcrypto -ldeflate? Can't we only link -lboost_iostreams -lboost_program_options -lhts? If so, then the conditional for mac build support can be entirely eliminated.

When I simplified the Makefile, I did the minimal amount of changes possible that would get SHAPEIT5 to build in an Ubuntu 20.04 GitHub action, IIRQ without the flags it would fail the linking the executable. But I agree the whole Makefile could benefit from further cleanup. If your fork allows to build without the flags on the following action https://github.com/odelaneau/shapeit5/blob/main/.github/workflows/build.yml (and you make the necessary changes so that the static build also still works) we can think of removing the flags (which would be nice because as you said we can drop the conditional entirely).

Oh and for x86 this updates the -mtune to Skylake server, which should be a good baseline these days for older machines.

Not sure we would want to include this, it is very specific, people wanting to really optimise their build can change this on their own, but I wouldn't add it as default.

Cheers.
Rick

…used requirement for boost serialization. Remove unecessary extra dynamic linking flags for deps of our deps when linking dynamically.

… This will fix building in environments which override CXXFLAG

pettyalex · 2024-03-28T20:15:11Z

I have:

Reverted the -mtune change entirely.
Tested on both an M1 Mac and ubuntu 20.04 environment
Added a Github Actions workflow to test on Mac OS, but I haven't tested it. Could we enable workflow runs for this PR? The github action to configure boost doesn't support Mac OS, and I feel like compiling boost on every action run feels a little wasteful... so I used boost & htslib from brew. If you don't like this, I can just compile both of those.

I've also cleaned up the makefile a smidge to make it easier to build inside of places like conda that expect to be able to set CXXFLAGS to change the include path.

…sier in more environments.

pettyalex · 2024-03-28T21:10:39Z

I believe this xcftools PR that fixes up its build on Apple Silicon would have to go through before that action will pass, though: odelaneau/xcftools#7

Also, while I was testing packaging this in conda, I noticed that your build rules do not include LDFLAGS or CXXFLAGS. I changed CXXFLAG to CXXFLAGS in the makefile, so now it will pick up the standard environment variables and build more easily in more places. I can revert this if you'd like.

Finally, I enabled lto, link-time optimization. If you don't want this, I can get that out of this PR.

rwk-unil · 2024-04-02T07:54:33Z

I have:

Reverted the -mtune change entirely.

Tested on both an M1 Mac and ubuntu 20.04 environment

Added a Github Actions workflow to test on Mac OS, but I haven't tested it. Could we enable workflow runs for this PR? The github action to configure boost doesn't support Mac OS, and I feel like compiling boost on every action run feels a little wasteful... so I used boost & htslib from brew. If you don't like this, I can just compile both of those.

I've also cleaned up the makefile a smidge to make it easier to build inside of places like conda that expect to be able to set CXXFLAGS to change the include path.

This seems great, I think the most important is that the Linux (Ubuntu) build succeeds with the new Makefiles, for the Mac OS build, I don't think it is necessary to have an action that builds every time, but anyways it is good to have, but it should not run on "push" and "PR" on the main branch because it will consume ressources. I think this should only be ran on major updates (new versions) manually.

I believe this xcftools PR that fixes up its build on Apple Silicon would have to go through before that action will pass, though: odelaneau/xcftools#7

Also, while I was testing packaging this in conda, I noticed that your build rules do not include LDFLAGS or CXXFLAGS. I changed CXXFLAG to CXXFLAGS in the makefile, so now it will pick up the standard environment variables and build more easily in more places. I can revert this if you'd like.

Finally, I enabled lto, link-time optimization. If you don't want this, I can get that out of this PR.

This has been bothering me too for quite some time, I totally support this change 👍 .

One other thing, we should ditch -lpthread in the libraries and use -pthread (see stackoverflow)

Also, for linking we should use the standard variables LDLIBS or LOADLIBES.

If these standard variables are used, we could remove the rules for building the .o object files and the binaries by relying on the Make built-in rules, this would further simplify the Makefile (but in no way a necessity, and might make the Makefile less clear).

The default rule for .o from .cc/.cpp/.C is : $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c
The default rule for linking is : $(CC) $(LDFLAGS) $^ $(LOADLIBES) $(LDLIBS)

Reference : Make catalog of rules

Cheers.

pettyalex · 2024-04-02T20:16:21Z

The build is working on Linux in its current state, I've been testing it. The real blocker at this point to getting this merge ready is the xcftools PR: odelaneau/xcftools#7

Once that's merged I can update the xcftools submodule to point at it.

I updated to -pthread instead.

for the Mac OS build, I don't think it is necessary to have an action that builds every time, but anyways it is good to have, but it should not run on "push" and "PR" on the main branch because it will consume ressources. I think this should only be ran on major updates (new versions) manually.

Given that they're free, github provided runners, is spending the extra resources to run CI on Mac a problem? If it takes longer to find/run on Mac it could be disabled in the future, but my observation has been that Github has plenty of mac runners.

rwk-unil · 2024-04-03T06:54:00Z

for the Mac OS build, I don't think it is necessary to have an action that builds every time, but anyways it is good to have, but it should not run on "push" and "PR" on the main branch because it will consume ressources. I think this should only be ran on major updates (new versions) manually.

Given that they're free, github provided runners, is spending the extra resources to run CI on Mac a problem? If it takes longer to find/run on Mac it could be disabled in the future, but my observation has been that Github has plenty of mac runners.

They are free up to a point, free tier GitHub provides up to 2,000 minutes a month (all hosted projects combined) and Mac OS X runners have a 10x multiplier on running minutes. That's why I suggest running it only on releases. (Also it is more environmentally friendly). At the end the decision for this should go to @odelaneau as the project is hosted on his account.

Reference : https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions

pettyalex · 2024-07-18T19:28:15Z

@rwk-unil I've validated that for open-source projects like SHAPEIT5, the 2000 minute limit does not apply. that's only for closed-source projects.

I'd love to get this in, I was working w/ SHAPEIT5 locally on my laptop today and noticed that this was still not in main.

rwk-unil · 2024-07-19T10:46:15Z

Great ! @pettyalex Where did you find that the limit does not apply to open-source projects ? I could not find this in the GitHub doc. Thanks.

I'd recommend merging @srubinacci @odelaneau.
Just double check the commit reference on the XCFTools submodule.

Cheers.
Rick

nicolergay · 2024-11-21T21:30:44Z

common/makefile_common.mk

+# Disable this if building on x86 CPUs without AVX2 support.
+UNAME_M := $(shell uname -m)
+ifeq ($(UNAME_M),x86_64)
+    CXXFLAGS+= -march=x86-64-v3


FYI I got this error when trying to compile this in an linux/amd64 instance in Drone:

#12 0.943 g++ -std=c++17 -O3 -march=x86-64-v3 -D__COMMIT_ID__=\"faa3224\" -D__COMMIT_DATE__=\"2024-07-18\" -c src/containers/bitmatrix.cpp -o obj/bitmatrix.o -Isrc -I../simde -I/opt/htslib/include -I/opt/boost/include #12 0.950 cc1plus: error: bad value ('x86-64-v3') for '-march=' switch #12 0.950 cc1plus: note: valid arguments to '-march=' switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 native; did you mean 'x86-64'?

Switching the flags back to -mavx2 -mfma worked

nicolergay · 2024-11-21T21:36:14Z

common/makefile_common.mk

 #COMPILATION RULES
 all: desktop

 $(BFILE): $(OFILE)
-	$(CXX) $(LDFLAG) $^ -o $@ $(DYN_LIBS)
+	$(CXX) $(LDFLAGS) $^ -o $@ $(DYN_LIBS)


For using non-standard htslib and boost paths, I needed to modify this to successfully compile and run the binaries, where HTSSRC_LIB=/opt/htslib/lib and BOOST_LIB=/opt/boost/lib:

$(BFILE): $(OFILE) $(CXX) $(LDFLAGS) $^ -o $@ $(DYN_LIBS) -Wl,-rpath=$(BOOST_LIB) -Wl,-rpath=$(HTSSRC_LIB)

I also needed to add LDFLAGS+= -L$(HTSSRC_LIB) -L$(BOOST_LIB)

pettyalex mentioned this pull request Oct 29, 2023

Fix includes: std::map was used without #include <map> odelaneau/xcftools#2

Merged

pettyalex mentioned this pull request Mar 26, 2024

feature request: Mac OS and Linux ARM64 compatibility? #95

Open

pettyalex added 5 commits March 26, 2024 13:42

WIP: Experimenting with apple silicon build.

e65f92a

Use simde with native aliases and add mac build targets.

13d0d39

WIP: Cleaning up some exploratory changes. phase_rare still needs por…

fb8c90e

…ting work finished.

Update phase_rare to be C++ compatible: replace std::random_shuffle w…

1bcd697

…ith std::shuffle.

Update compiled ligate_mac binary to match latest source.

fad421b

pettyalex force-pushed the mac-build-with-simde branch from e2034c2 to fad421b Compare March 26, 2024 18:43

Fix makefiles for mac build and add a top level target.

855ba97

srubinacci assigned odelaneau and unassigned odelaneau Mar 26, 2024

srubinacci requested a review from odelaneau March 26, 2024 20:35

pettyalex added 2 commits March 26, 2024 23:22

Clean up makefile. xcftools doesn't build on mac because of a failed …

f285f8f

…attempt to link libdeflate, but everythingn else works.

Remove checked in mac binaries

c0d5940

pettyalex force-pushed the mac-build-with-simde branch from bf39376 to 59b0ddd Compare March 27, 2024 05:10

Simplify makefile: Only use AVX2 / FMA if on x86_64, tune for skylake…

2f78fb5

… server.

pettyalex force-pushed the mac-build-with-simde branch from 59b0ddd to 2f78fb5 Compare March 27, 2024 05:10

pettyalex added 2 commits March 28, 2024 13:49

Add a github action to build on Mac OS 14 on Apple Silicon. Remove un…

0d058cc

…used requirement for boost serialization. Remove unecessary extra dynamic linking flags for deps of our deps when linking dynamically.

Re-add pthread and move to conventional CXXFLAGS rather than CXXFLAG.…

50c277d

… This will fix building in environments which override CXXFLAG

pettyalex force-pushed the mac-build-with-simde branch from 5de5c3c to 50c277d Compare March 28, 2024 20:11

Update LDFLAG username to standarized LDFLAGS so shapeit compilers ea…

c506d69

…sier in more environments.

pettyalex added 3 commits March 28, 2024 22:08

Remove -flto. out of scope for this changeset

83e1e62

Revert to master branch binaries

b8d437a

Update to simde v0.8.0

6071b42

Update pthread link command for mac

faa3224

nicolergay reviewed Nov 21, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Port shapeit5 to MacOS on Apple Silicon using SIMDE #68

Port shapeit5 to MacOS on Apple Silicon using SIMDE #68

pettyalex commented Oct 28, 2023 •

edited

Loading

srubinacci commented Mar 26, 2024 •

edited

Loading

pettyalex commented Mar 26, 2024

pettyalex commented Mar 27, 2024 •

edited

Loading

rwk-unil commented Mar 27, 2024

pettyalex commented Mar 28, 2024

pettyalex commented Mar 28, 2024

rwk-unil commented Apr 2, 2024

pettyalex commented Apr 2, 2024

rwk-unil commented Apr 3, 2024

pettyalex commented Jul 18, 2024

rwk-unil commented Jul 19, 2024

nicolergay Nov 21, 2024

nicolergay Nov 21, 2024

Port shapeit5 to MacOS on Apple Silicon using SIMDE #68

Are you sure you want to change the base?

Port shapeit5 to MacOS on Apple Silicon using SIMDE #68

Conversation

pettyalex commented Oct 28, 2023 • edited Loading

srubinacci commented Mar 26, 2024 • edited Loading

pettyalex commented Mar 26, 2024

pettyalex commented Mar 27, 2024 • edited Loading

rwk-unil commented Mar 27, 2024

pettyalex commented Mar 28, 2024

pettyalex commented Mar 28, 2024

rwk-unil commented Apr 2, 2024

pettyalex commented Apr 2, 2024

rwk-unil commented Apr 3, 2024

pettyalex commented Jul 18, 2024

rwk-unil commented Jul 19, 2024

nicolergay Nov 21, 2024

Choose a reason for hiding this comment

nicolergay Nov 21, 2024

Choose a reason for hiding this comment

pettyalex commented Oct 28, 2023 •

edited

Loading

srubinacci commented Mar 26, 2024 •

edited

Loading

pettyalex commented Mar 27, 2024 •

edited

Loading