[MLX] Isolate submodule build with ExternalProject #20585
MLX was pulled into ExecuTorch's build via add_subdirectory, which drops MLX's
whole CMake project into ET's target/option namespace. That shared scope
collides with dependencies MLX fetches (today nlohmann_json, which ET already
provides). The collision was worked around by patching MLX's own
CMakeLists.txt at configure time. Patching a submodule is fragile: the patch is
pinned to specific lines and silently stops applying when an MLX bump touches
that region, at which point the json collision returns with no clear signal. It
also dirties the submodule and leaks MLX's MLX_BUILD_* options into ET's cache.
Build MLX in its own isolated CMake scope via ExternalProject and consume it as
a prebuilt static lib + metallib through an imported `mlx` target. MLX then runs
its FetchContent in its own namespace, so the json collision cannot happen and
the patch is deleted. The imported target re-adds the Metal/Foundation/
QuartzCore frameworks a static libmlx.a does not carry (matching the imported
mlx target already in tools/cmake/executorch-config.cmake), and mlxdelegate
depends on mlx_external directly since add_dependencies on an imported target
does not order the build.
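The approach above can be sketched in CMake roughly as follows. This is a minimal sketch under stated assumptions, not the PR's actual CMakeLists.txt. Only the names mlx, mlx_external, mlxdelegate, _mlx_static_lib and _mlx_metallib come from the text. Everything else is assumed: the directory layout (CMakeLists in backends/mlx, MLX at third-party/mlx), the metallib's location inside MLX's build tree, and the exact MLX option set.

```cmake
include(ExternalProject)

# Assumed layout: this file lives in backends/mlx, MLX is at third-party/mlx.
set(_mlx_source_dir ${CMAKE_CURRENT_SOURCE_DIR}/third-party/mlx)
# Same BINARY_DIR add_subdirectory(third-party/mlx) would have used.
set(_mlx_binary_dir ${CMAKE_CURRENT_BINARY_DIR}/third-party/mlx)
set(_mlx_static_lib ${_mlx_binary_dir}/libmlx.a)
# Assumption: where MLX's build tree places the compiled metallib.
set(_mlx_metallib ${_mlx_binary_dir}/mlx/backend/metal/kernels/mlx.metallib)

ExternalProject_Add(mlx_external
  SOURCE_DIR ${_mlx_source_dir}
  BINARY_DIR ${_mlx_binary_dir}
  # MLX is configured as a separate CMake invocation with its own cache, so
  # these options (assumed names) never land in ExecuTorch's cache.
  CMAKE_ARGS
    -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
    -DBUILD_SHARED_LIBS=OFF
    -DMLX_BUILD_TESTS=OFF
    -DMLX_BUILD_EXAMPLES=OFF
    -DMLX_BUILD_BENCHMARKS=OFF
    -DMLX_BUILD_PYTHON_BINDINGS=OFF
    -DMLX_METAL_JIT=ON
  # Declare the outputs so generators such as Ninja know who produces them.
  BUILD_BYPRODUCTS ${_mlx_static_lib} ${_mlx_metallib}
  # Artifacts are consumed from the build tree; skip MLX's own install step.
  INSTALL_COMMAND ""
)

# Imported target wrapping the prebuilt static library.
add_library(mlx STATIC IMPORTED GLOBAL)
set_target_properties(mlx PROPERTIES
  IMPORTED_LOCATION ${_mlx_static_lib}
  INTERFACE_INCLUDE_DIRECTORIES ${_mlx_source_dir}
  # A static libmlx.a does not carry its framework dependencies.
  INTERFACE_LINK_LIBRARIES
    "-framework Metal;-framework Foundation;-framework QuartzCore"
)

# mlxdelegate is assumed to be defined earlier in this file. Following the
# PR, it depends on mlx_external directly so MLX is built before it links.
target_link_libraries(mlxdelegate PUBLIC mlx)
add_dependencies(mlxdelegate mlx_external)
```

Passing MLX's options through CMAKE_ARGS, rather than setting them as cache variables before add_subdirectory, is what keeps MLX_BUILD_* out of ExecuTorch's cache. MLX's own FetchContent calls likewise resolve inside MLX's separate configure step.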
The ExternalProject BINARY_DIR is kept at the same location add_subdirectory
used, so libmlx.a and mlx.metallib land at their existing paths; downstream
consumers (package config, metallib copy helper, the pybindings wheel in
setup.py) need no changes.
- Replace MLX_BUILD_* options + patch loop + add_subdirectory with
ExternalProject_Add(mlx_external) and an imported `mlx` target.
- install(TARGETS mlx) -> install(FILES ${_mlx_static_lib}), since an imported
target cannot be installed via install(TARGETS).
- Point the metallib install and MLX_METALLIB_PATH at ${_mlx_metallib}.
- Delete backends/mlx/patches/mlx_json.patch and the patch-apply loop.
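The install-side changes in the list above might look like the following minimal sketch. It assumes _mlx_static_lib and _mlx_metallib point into the mlx_external build tree. Whether ExecuTorch stores MLX_METALLIB_PATH as a cache variable or a normal variable is also an assumption. CACHE INTERNAL is used here only so that helper functions defined in other directories can see the path.

```cmake
include(GNUInstallDirs) # provides CMAKE_INSTALL_LIBDIR

# An imported target cannot be passed to install(TARGETS), so the prebuilt
# artifacts from the mlx_external build tree are installed as plain files.
install(FILES ${_mlx_static_lib} DESTINATION ${CMAKE_INSTALL_LIBDIR})
install(FILES ${_mlx_metallib} DESTINATION ${CMAKE_INSTALL_LIBDIR})

# Assumption: expose the build-tree metallib path globally for copy helpers.
set(MLX_METALLIB_PATH ${_mlx_metallib} CACHE INTERNAL "Path to mlx.metallib")
```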
Verified: cmake --preset mlx-release configures, builds, and installs;
libmlx.a, libmlxdelegate.a, and mlx.metallib land in cmake-out/lib/; the MLX
submodule stays pristine through configure/build/install; the delegate tests
(op_test_runner, multi_thread_test_runner, mlx_mutable_state_test) build and
link mlx directly, and mlx_mutable_state_test passes.
Fixes pytorch#20556.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20585
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 8 Pending, 1 Unrelated Failure
As of commit 84d2a1d with merge base 55a71e6:
NEW FAILURE - The following job has failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @SuryanshSS1011! Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with
If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
@pytorchbot label "release notes: build"
@claude review this code
Claude finished @metascroy's task in 4m 55s.
Code Review
Reviewed. A few things worth addressing:
1. QuartzCore divergence between the two imported
```cmake
# For the installed library, we put metallib in lib/ alongside libmlx.a. The
# metallib is produced in the mlx_external build tree (MLX_METAL_JIT=ON does not
# install it); _mlx_metallib points there.
install(FILES ${_mlx_metallib} DESTINATION ${CMAKE_INSTALL_LIBDIR})
```
With this change, do we still need the `function(executorch_target_copy_mlx_metallib target)` utility in `executorch/tools/cmake/Utils.cmake`?
Yes, from what I have seen, it's still needed and still works. It copies the metallib next to a target binary using MLX_METALLIB_PATH, which this PR keeps. I just repointed it at the ExternalProject output (same location as before, since I left BINARY_DIR where add_subdirectory had it). The runners and pybindings still rely on it to colocate mlx.metallib with the binary that statically links MLX, so removing it might break that.
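The copy step described in this reply can be illustrated with a POST_BUILD custom command. This is a hypothetical helper: copy_mlx_metallib_next_to is an invented name, and the code is not the body of executorch_target_copy_mlx_metallib in Utils.cmake, which may differ. It assumes only that MLX_METALLIB_PATH is set as described above.

```cmake
# Hypothetical helper; the real Utils.cmake implementation may differ.
function(copy_mlx_metallib_next_to target)
  if(NOT MLX_METALLIB_PATH)
    message(FATAL_ERROR "MLX_METALLIB_PATH is not set")
  endif()
  # After the target links, copy mlx.metallib into the directory that holds
  # the binary, so the statically linked MLX can find it at runtime.
  add_custom_command(TARGET ${target} POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E copy_if_different
            ${MLX_METALLIB_PATH} $<TARGET_FILE_DIR:${target}>/mlx.metallib
    COMMENT "Copying mlx.metallib next to ${target}"
    VERBATIM)
  # Build MLX (and therefore the metallib) before this target.
  if(TARGET mlx_external)
    add_dependencies(${target} mlx_external)
  endif()
endfunction()
```

For example, `copy_mlx_metallib_next_to(op_test_runner)` would place mlx.metallib beside the op_test_runner executable after each link.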
Thanks @SuryanshSS1011! How did the mlxdelegate.a size change with this?
Thanks for the review, @metascroy!
Looks great! Thanks for the contribution. I pushed an update to address some of the nits from Claude. If CI passes, I'll merge.
Summary
Fixes #20556.
Right now MLX gets pulled in with `add_subdirectory`, so MLX's entire CMake project lands in ExecuTorch's target/option namespace. This is a problem because MLX fetches `nlohmann_json`, which collides with the copy ExecuTorch already provides. The current workaround patches MLX's `CMakeLists.txt` at configure time to guard the fetch. That patch is brittle. It is pinned to specific lines, so an MLX bump that touches them makes it silently stop applying, and the collision comes back with no obvious signal. It also leaves the submodule dirty and lets MLX's `MLX_BUILD_*` options bleed into ExecuTorch's cache.
This builds MLX as an `ExternalProject` in its own CMake scope and consumes it through an imported `mlx` target. MLX then runs its `FetchContent` in its own namespace, and the patch is no longer needed.
- The imported `mlx` target re-adds the Metal/Foundation/QuartzCore frameworks, since a static `libmlx.a` doesn't carry them. This mirrors the imported `mlx` target already in `tools/cmake/executorch-config.cmake`.
- `mlxdelegate` depends on `mlx_external` explicitly, because `add_dependencies` on an imported target doesn't order the build on its own.
- `install(TARGETS mlx)` is changed to `install(FILES ${_mlx_static_lib})`, because an imported target cannot be installed with `install(TARGETS)`.
- The metallib install and `MLX_METALLIB_PATH` now point at the ExternalProject output.
- `backends/mlx/patches/mlx_json.patch` and the patch loop are removed.
The `BINARY_DIR` is kept where `add_subdirectory` put it, so `libmlx.a` and `mlx.metallib` land at the same paths as before. The package config, the metallib copy helper in `Utils.cmake`, and the wheel path in `setup.py` are unchanged.
Test plan
Verified on Apple Silicon (macOS, Metal toolchain installed):
- `cmake --preset mlx-release` configures, builds, and installs. `libmlx.a`, `libmlxdelegate.a`, and `mlx.metallib` are installed to `cmake-out/lib/`, and the MLX submodule stays clean (`git -C backends/mlx/third-party/mlx status`).
- With `-DEXECUTORCH_BUILD_TESTS=ON`, `op_test_runner`, `multi_thread_test_runner`, and `mlx_mutable_state_test` build and link `mlx`, and `mlx_mutable_state_test` passes.
I don't have the model checkpoints to run the qwen3_5_moe / gemma4_31b MLX runners or the wheel end-to-end, so I'm relying on CI for those. Since the artifact paths are unchanged, I'd expect them to behave the same as before.
AI assistance disclosure
I used an AI coding assistant for parts of this change, including drafting the CMake and investigating the build. I reviewed and tested everything in the test plan myself and take complete responsibility for the submission.