Port shapeit5 to MacOS on Apple Silicon using SIMDE #68
…ting work finished.
…ith std::shuffle.
e2034c2
to
fad421b
Compare
Hi Alex, if we approve this, we should add Mac compilation to the GitHub workflow.
I'd be glad to add a mac build workflow to github actions, and clean up the make target a bit. I was mostly just validating that this will work until I heard you were interested in it. Right now, as is, it suffixes the binaries with |
…attempt to link libdeflate, but everything else works.
OK, I went ahead and simplified the makefile for Mac build, and also tested it on
Oh and for x86 this updates the |
bf39376
to
59b0ddd
Compare
59b0ddd
to
2f78fb5
Compare
@pettyalex thank you for your pull request and your work,
It may be an artefact left over from experimentation, these were introduced when the Makefiles were updated (see git blame e.g., makefile) and may not be necessary anymore.
I agree. AVX2/FMA requires a Haswell (2013) or newer CPU anyway, so targeting x86-64-v3 would only enable extra CPU features that are already available on those CPUs, while simplifying the build command at the same time. I totally support this suggestion.
When I simplified the Makefile, I made the minimal changes needed to get SHAPEIT5 to build in an Ubuntu 20.04 GitHub action. IIRC, without the flags it would fail when linking the executable. But I agree the whole Makefile could benefit from further cleanup. If your fork builds without the flags on the following action https://github.com/odelaneau/shapeit5/blob/main/.github/workflows/build.yml (and you make the necessary changes so that the static build also still works), we can think about removing the flags (which would be nice because, as you said, we could drop the conditional entirely).
Not sure we would want to include this, it is very specific, people wanting to really optimise their build can change this on their own, but I wouldn't add it as default. Cheers. |
…used requirement for boost serialization. Remove unnecessary extra dynamic linking flags for deps of our deps when linking dynamically.
… This will fix building in environments which override CXXFLAG
5de5c3c
to
50c277d
Compare
I have:
I've also cleaned up the makefile a smidge to make it easier to build inside of places like conda that expect to be able to set CXXFLAGS to change the include path. |
…sier in more environments.
I believe this xcftools PR that fixes up its build on Apple Silicon would have to go through before that action will pass, though: odelaneau/xcftools#7 Also, while I was testing packaging this in conda, I noticed that your build rules do not include LDFLAGS or CXXFLAGS. I changed CXXFLAG to CXXFLAGS in the makefile, so now it will pick up the standard environment variables and build more easily in more places. I can revert this if you'd like. Finally, I enabled lto, link-time optimization. If you don't want this, I can get that out of this PR. |
This seems great. I think the most important thing is that the Linux (Ubuntu) build succeeds with the new Makefiles. For the Mac OS build, I don't think it is necessary to have an action that builds every time. It is good to have, but it should not run on "push" and "PR" on the main branch because it will consume resources. I think it should only be run manually on major updates (new versions).
This has been bothering me too for quite some time, I totally support this change 👍 . One other thing, we should ditch Also, for linking we should use the standard variables If these standard variables are used, we could remove the rules for building the
Reference : Make catalog of rules Cheers. |
The build is working on Linux in its current state, I've been testing it. The real blocker at this point to getting this merge ready is the xcftools PR: odelaneau/xcftools#7 Once that's merged I can update the xcftools submodule to point at it. I updated to
Given that they're free, GitHub-provided runners, is spending the extra resources to run CI on Mac a problem? If it takes longer to find/run on Mac it could be disabled in the future, but my observation has been that GitHub has plenty of Mac runners.
They are free up to a point: the free tier of GitHub provides up to 2,000 minutes a month (all hosted projects combined), and Mac OS X runners have a 10x multiplier on running minutes. That's why I suggest running it only on releases. (It is also more environmentally friendly.) In the end, the decision should go to @odelaneau, as the project is hosted on his account. Reference: https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions
@rwk-unil I've validated that for open-source projects like SHAPEIT5, the 2000 minute limit does not apply. That's only for closed-source projects. I'd love to get this in; I was working with SHAPEIT5 locally on my laptop today and noticed that this was still not in main.
Great! @pettyalex Where did you find that the limit does not apply to open-source projects? I could not find this in the GitHub docs. Thanks. I'd recommend merging @srubinacci @odelaneau. Cheers.
# Disable this if building on x86 CPUs without AVX2 support.
UNAME_M := $(shell uname -m)
ifeq ($(UNAME_M),x86_64)
CXXFLAGS+= -march=x86-64-v3
FYI I got this error when trying to compile this in a linux/amd64 instance in Drone:
#12 0.943 g++ -std=c++17 -O3 -march=x86-64-v3 -D__COMMIT_ID__=\"faa3224\" -D__COMMIT_DATE__=\"2024-07-18\" -c src/containers/bitmatrix.cpp -o obj/bitmatrix.o -Isrc -I../simde -I/opt/htslib/include -I/opt/boost/include
#12 0.950 cc1plus: error: bad value ('x86-64-v3') for '-march=' switch
#12 0.950 cc1plus: note: valid arguments to '-march=' switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 native; did you mean 'x86-64'?
Switching the flags back to -mavx2 -mfma worked.
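One way to confirm that either flag set enables the same instruction sets is to inspect the feature macros the compiler defines. The following is only a sketch: `simd_features` is a hypothetical helper, not part of SHAPEIT5, and it assumes a GCC- or Clang-compatible compiler, which defines `__AVX2__`, `__FMA__` and `__ARM_NEON` when the corresponding features are enabled. As far as I know, GCC only accepts `-march=x86-64-v3` from version 11, which would explain the error above on an older toolchain.

```cpp
#include <string>

// Hypothetical helper (not from the SHAPEIT5 sources): lists the SIMD
// feature macros that the compiler defined for this translation unit.
std::string simd_features() {
    std::string out;
#if defined(__AVX2__)
    out += "AVX2 ";   // set by -mavx2 or an -march level that includes AVX2
#endif
#if defined(__FMA__)
    out += "FMA ";    // set by -mfma or an -march level that includes FMA
#endif
#if defined(__ARM_NEON)
    out += "NEON ";   // set on ARM targets with NEON, e.g. Apple Silicon
#endif
    if (out.empty()) out = "none";
    return out;
}
```

Printing the result of `simd_features()` from a small test program built once with `-mavx2 -mfma` and once with `-march=x86-64-v3` should show AVX2 and FMA in both cases on a compiler that supports both spellings.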
#COMPILATION RULES
all: desktop

$(BFILE): $(OFILE)
$(CXX) $(LDFLAG) $^ -o $@ $(DYN_LIBS)
$(CXX) $(LDFLAGS) $^ -o $@ $(DYN_LIBS)
To use non-standard htslib and boost paths, I needed to modify this rule to successfully compile and run the binaries, where HTSSRC_LIB=/opt/htslib/lib and BOOST_LIB=/opt/boost/lib:
$(BFILE): $(OFILE)
$(CXX) $(LDFLAGS) $^ -o $@ $(DYN_LIBS) -Wl,-rpath=$(BOOST_LIB) -Wl,-rpath=$(HTSSRC_LIB)
I also needed to add LDFLAGS+= -L$(HTSSRC_LIB) -L$(BOOST_LIB)
There's demand from users to be able to run shapeit on Mac OS. Users who want this have been creating GitHub issues or asking on other discussion forums:
odelaneau/shapeit4#15
odelaneau/shapeit4#36
#22
Shapeit5 cannot run on modern (2020+) Macs even in a VM, because it uses AVX2. Apple's x86 translation layer (Rosetta) supports SSE1/2/3/4 but not AVX or AVX2: https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment.
This PR contains a minimal set of changes that will allow a native ARM build on modern Macs. It does this using https://github.com/simd-everywhere/simde, a header-only library that provides high-performance vector intrinsic translation between x86 and ARM. This would also allow Linux on ARM support to be added very easily. Because of AWS's very low pricing for ARM compute instances, ARM is already the cheapest way to run large computational loads on the cloud so Linux on ARM support would be helpful for anyone operating in AWS.
This PR also has some very small changes to support building on Apple clang. std::random_shuffle was removed in C++17, and will be removed in a future version of GCC. It's already gone in clang 15, which I used to develop and test this PR. As recommended, I've replaced it with std::shuffle: https://en.cppreference.com/w/cpp/algorithm/random_shuffle
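For reference, the replacement pattern looks roughly like the sketch below. The helper name `shuffled_indices`, the `std::mt19937` engine and the explicit seed are illustrative assumptions, not the code or the random source actually used in SHAPEIT5. The only point is that `std::shuffle` takes an explicit random engine as its third argument, where `std::random_shuffle` used an implicit one.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not from the SHAPEIT5 sources): returns the
// indices 0..n-1 in a pseudo-random order determined by `seed`.
std::vector<int> shuffled_indices(std::size_t n, std::uint32_t seed) {
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);  // fill with 0, 1, ..., n-1

    // Pre-C++17 form, removed from the standard:
    //   std::random_shuffle(idx.begin(), idx.end());

    // C++11 and later: pass a uniform random bit generator explicitly.
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    return idx;
}
```

Calling `shuffled_indices(10, 42)` twice in the same build gives the same permutation, because the engine is seeded explicitly. The exact order can differ between standard library implementations.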