Add MIX_OS_DEPS_COMPILE_PARTITION_COUNT for concurrent deps compilation #14340
Conversation
I will test this out when I get a chance later. Is it possible that this behavior could be made an argument/flag to deps.compile instead of requiring an environment variable? This would make it a little easier to discover / integrate into documentation, I suspect (similar to the way …).
I decided to go with an environment variable because:
For example, I will likely set … I think …
Regarding discovery, we will document it on both …
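In practice the variable is set when invoking Mix, e.g. MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 mix deps.compile, and reading it back inside Mix would look roughly like this (a minimal sketch, not necessarily the exact code in this PR):

    # Minimal sketch: read the partition count from the environment, defaulting to 1
    partition_count =
      case System.get_env("MIX_OS_DEPS_COMPILE_PARTITION_COUNT") do
        nil -> 1
        value -> String.to_integer(value)
      end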
On 6c3f308, I get this:
(might not be related to this change)
Can you run a …?
Yeah, it persists after mix deps.get && mix deps.compile yaml_elixir (without the env set). Dependencies here, lock here. Also persists after cleaning and refetching.
If I compile yamerl first, it fixes itself. So it probably is related to this change.
@liamwhite apologies, a commit I pushed yesterday, just before you tried it, introduced a race condition. We should be all good now.
@josevalim I tried the branch on the biggest repos I have. In the first one I see a nice improvement up to ~4 partitions (tested on an M2 Mac):
The second one, however, fails to compile:
The error is reproducible. Running …
@lukaszsamson which commit did you try? I pushed a fix early this morning.
@lukaszsamson I see. Can you please try to provide a way to reproduce it? It may be that a mix.exs with the mix.lock is enough. Thank you.
@josevalim It's not reproducible 100% of the time; I got the crash in 4 out of 5 tries. My real app has 2x more umbrella apps underneath, so maybe that contributes.
Note that both …
With the above fixes, it does work for me:
Despite the reduced overhead, I have tried various partition counts and have not been able to get a compilation time faster than my own solution in #14200 (comment). That one was made using Elixir 1.17 and compiled in about 16.04 seconds using all 32 processors. However, it isn't apples-to-apples, because the compile lock added in 1.18 now prevents that solution from working, so I haven't rebenchmarked it.
The performance seems to degrade extremely badly and start using ridiculous amounts of memory at high concurrency levels (I tested at 32 and completely ran out of system memory despite having 64GB). That also was not an issue in my own solution.
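For context, a rough sketch (purely illustrative, not the actual script from #14200) of the "start several mix OS processes yourself" approach being compared here:

    # Illustrative only: compile a hypothetical list of independent deps by
    # spawning separate `mix` OS processes, a few at a time. A real script also
    # has to respect the dependency graph between them.
    deps = ["telemetry", "jason", "plug"]

    deps
    |> Task.async_stream(
      fn dep ->
        System.cmd("mix", ["deps.compile", dep], into: IO.stream(:stdio, :line))
      end,
      max_concurrency: 4,
      timeout: :infinity
    )
    |> Stream.run()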
@liamwhite you can set … For this solution though, I don't recommend starting the same number of instances, because you will indeed eventually run out of memory.
Btw, our solutions are somewhat similar: we both distribute the graph, but I distribute it over TCP, so we don't need to start multiple instances, while you start multiple instances. I tried your approach locally on Livebook and it took 14.8s, and with …
Tested here on some real projects with quite a large number of dependencies. The results below are only for dependency compilation, as compiling the application itself would skew the results.
PC Specs
Project 1: 180 deps
No flag
4 OS processes
8 OS processes
Project 2: 220 deps
No flag
4 OS processes
8 OS processes
Project 3: 238 deps (umbrella, some of the dependencies are big)
No flag
4 OS processes
8 OS processes
Note: for all the runs, I had MIX_DEBUG enabled. Results vary a bit on retries, but not too much. This is huge for us. Thank you for making it!
Potential improvement:
Parallel booting of partitions has been added, thanks for the suggestion!
@lukaszsamson thank you. Here is a one-liner that reproduces it:
Blake2 has a wrong application definition: riverrun/blake2_elixir#2. However, keep in mind that Erlang's crypto ships with blake2, so you don't need a dep:
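For illustration, a minimal example of hashing with blake2 through Erlang's :crypto (not necessarily the exact snippet that was shared here):

    # :crypto supports blake2b and blake2s out of the box, so no extra dep is needed
    digest = :crypto.hash(:blake2b, "hello world")
    Base.encode16(digest, case: :lower)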
Thanks @josevalim, I'll try your branch.
I do. The …
Working fine with your patch @josevalim. Nice win, with build time cut to a third here. Above 4 partitions I see diminishing returns.
Maybe a good idea to add a warning or fail compilation …
Co-authored-by: felipe stival <[email protected]>
Rebenchmarked. With 8 partitions it's so close that this is fine (I'm not going to notice ~1 second on a timescale of 17 seconds).
@liamwhite btw, your script runs fine if you start IEx with …
💚 💙 💜 💛 ❤️
defp server(socket, deps, count, force?) do
  elixir =
    System.get_env("MIX_OS_DEPS_COMPILE_PARTITION_ELIXIR_EXECUTABLE") ||
Should we document this one?
I was trying it out and it failed because my locally built Mix was trying to use the asdf-installed Elixir.
Or, I wondered, could we solve it by introducing something like Mix.elixir_interpreter/0?
I decided not to expose it for now and wait to see if someone has a use case for it. If they do, then we can just document it.
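For reference, the idea floated above could look roughly like this (a hypothetical sketch, not an actual Mix API):

    # Hypothetical sketch of the Mix.elixir_interpreter/0 idea: prefer the env var
    # override, then fall back to whatever `elixir` is on PATH.
    elixir =
      System.get_env("MIX_OS_DEPS_COMPILE_PARTITION_ELIXIR_EXECUTABLE") ||
        System.find_executable("elixir") ||
        raise "could not find the elixir executable in PATH"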
Any chance this can get backported to 1.18.x?
@v0idpwn no plans at the moment, but please do use main and give us feedback. I already found a bug yesterday in that it did not work on …
Looks like I hit a race condition in our app:
I can get around it by reducing concurrency to 2, but it happens consistently with 4.
@halfdan which commit are you using?
Never mind, fixed here: mroth/exmoji#23
The implementation uses a TCP port to distribute the work. We chose to not use the Erlang distribution as that may affect the user app running afterwards.
When compiling Livebook on a 10-core machine:
In this case, increasing beyond 4 partitions did not decrease compilation times (perhaps expected, given the cost of starting new instances and the fact that compilation within a single dependency is already parallel). Closes #14200.
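As a rough sketch of the idea (assumptions only, not the actual Mix code): the parent mix process listens on a local TCP port, spawns the partitions as extra OS processes, and each partition connects back to ask for the next dependency to compile.

    # Coordinator: listen on an ephemeral local port and hand out one dep name.
    {:ok, listen_socket} =
      :gen_tcp.listen(0, [:binary, packet: :line, active: false, ip: {127, 0, 0, 1}])

    {:ok, port} = :inet.port(listen_socket)

    coordinator =
      Task.async(fn ->
        {:ok, conn} = :gen_tcp.accept(listen_socket)
        {:ok, "ready\n"} = :gen_tcp.recv(conn, 0)
        :ok = :gen_tcp.send(conn, "some_dep\n")
        :gen_tcp.close(conn)
      end)

    # Partition: in the real feature this would run in a separate `mix` OS process
    # that received `port` from its parent; it is inlined here for illustration.
    {:ok, socket} =
      :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, packet: :line, active: false])

    :ok = :gen_tcp.send(socket, "ready\n")
    {:ok, dep_name} = :gen_tcp.recv(socket, 0)
    IO.puts("partition would compile: #{String.trim(dep_name)}")

    Task.await(coordinator)

The real implementation also has to walk the dependency graph and keep feeding each partition new work as it finishes, but the transport idea is the same.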