[MLX] Isolate submodule build with ExternalProject by SuryanshSS1011 · Pull Request #20585 · pytorch/executorch

SuryanshSS1011 · 2026-06-28T18:29:56Z

Summary

Fixes #20556.

Right now MLX gets pulled in with add_subdirectory, so MLX's entire CMake project lands in ExecuTorch's target/option namespace. This is a problem because MLX fetches nlohmann_json, which collides with the copy ExecuTorch already provides. The current workaround patches MLX's CMakeLists.txt at configure time to guard the fetch. That patch is brittle as it's pinned to specific lines, meaning an MLX bump that touches them makes it silently stop applying, and the collision comes back with no obvious signal. It also leaves the submodule dirty and lets MLX's MLX_BUILD_* options bleed into ExecuTorch's cache.

This builds MLX as an ExternalProject in its own CMake scope and consumes it through an imported mlx target, so MLX runs its FetchContent in its own namespace and the patch is no longer needed.

The imported mlx target re-adds the Metal/Foundation/QuartzCore frameworks, since a static libmlx.a doesn't carry them. This mirrors the imported mlx target already in tools/cmake/executorch-config.cmake. mlxdelegate depends on mlx_external explicitly, because add_dependencies on an imported target doesn't order the build on its own.
install(TARGETS mlx) is changed to install(FILES ${_mlx_static_lib}); an imported target cannot be installed with install(TARGETS).
The metallib install and MLX_METALLIB_PATH now point at the ExternalProject output.
Removed backends/mlx/patches/mlx_json.patch and the patch loop.

The BINARY_DIR is kept where add_subdirectory put it, so libmlx.a and mlx.metallib land at the same paths as before. The package config, the metallib copy helper in Utils.cmake, and the wheel path in setup.py are unchanged.

Test plan

Verified on Apple Silicon (macOS, Metal toolchain installed):

cmake --preset mlx-release configures, builds, and installs; libmlx.a, libmlxdelegate.a, and mlx.metallib are installed to cmake-out/lib/, and the MLX submodule stays clean (git -C backends/mlx/third-party/mlx status).

With -DEXECUTORCH_BUILD_TESTS=ON, op_test_runner, multi_thread_test_runner, and mlx_mutable_state_test build and link mlx; mlx_mutable_state_test passes.

I don't have the model checkpoints to run the qwen3_5_moe / gemma4_31b MLX runners or the wheel end-to-end, so I'm relying on CI for those. Since the artifact paths are unchanged, I'd expect them to behave the same as before.

AI assistance disclosure

I used an AI coding assistant for parts of this change, including drafting the CMake and investigating the build. I reviewed and tested everything in the test plan myself and take complete responsibility for the submission.

MLX was pulled into ExecuTorch's build via add_subdirectory, dropping MLX's whole CMake project into ET's target/option namespace. That shared scope collides with deps MLX fetches (today nlohmann_json), worked around by patching MLX's own CMakeLists.txt at configure time. Patching a submodule is fragile: the patch is pinned to specific lines and silently stops applying when an MLX bump touches that region, at which point the json collision returns with no clear signal. It also dirties the submodule and leaks MLX's MLX_BUILD_* options into ET's cache. Build MLX in its own isolated CMake scope via ExternalProject and consume it as a prebuilt static lib + metallib through an imported `mlx` target. MLX then runs its FetchContent in its own namespace, so the json collision cannot happen and the patch is deleted. The imported target re-adds the Metal/Foundation/ QuartzCore frameworks a static libmlx.a does not carry (matching the imported mlx target already in tools/cmake/executorch-config.cmake), and mlxdelegate depends on mlx_external directly since add_dependencies on an imported target does not order the build. The ExternalProject BINARY_DIR is kept at the same location add_subdirectory used, so libmlx.a and mlx.metallib land at their existing paths; downstream consumers (package config, metallib copy helper, the pybindings wheel in setup.py) need no changes. - Replace MLX_BUILD_* options + patch loop + add_subdirectory with ExternalProject_Add(mlx_external) and an imported `mlx` target. - install(TARGETS mlx) -> install(FILES ${_mlx_static_lib}), since an imported target cannot be installed via install(TARGETS). - Point the metallib install and MLX_METALLIB_PATH at ${_mlx_metallib}. - Delete backends/mlx/patches/mlx_json.patch and the patch-apply loop. Verified: cmake --preset mlx-release configures, builds, and installs; libmlx.a, libmlxdelegate.a, and mlx.metallib land in cmake-out/lib/; the MLX submodule stays pristine through configure/build/install; the delegate tests (op_test_runner, multi_thread_test_runner, mlx_mutable_state_test) build and link mlx directly, and mlx_mutable_state_test passes. Fixes pytorch#20556. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pytorch-bot · 2026-06-28T18:30:00Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20585

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 8 Pending, 1 Unrelated Failure

As of commit 84d2a1d with merge base 55a71e6 ():

NEW FAILURE - The following job has failed:

pull / test-qnn-delegate-linux / linux-job (gh)
RuntimeError: Command docker exec -t aa9afc28a418df0a7b3651eeece2f8293af5eb4eadc78c209f34f36ae109c80a /exec failed with exit code 1

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla · 2026-06-28T18:30:01Z

Hi @SuryanshSS1011!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

linux-foundation-easycla · 2026-06-28T18:30:05Z

The committers listed above are authorized under a signed CLA.

✅ Unknown (351ee54)
✅ login: SuryanshSS1011 / name: Suryansh Sijwali (351ee54)

One or more co-authors of this pull request were not found. You must specify co-authors in commit message trailer via:

Co-authored-by: name <email>

Supported Co-authored-by: formats include:

Anything <id+login@users.noreply.github.com> - it will locate your GitHub user by id part.
Anything <login@users.noreply.github.com> - it will locate your GitHub user by login part.
Anything <public-email> - it will locate your GitHub user by public-email part. Note that this email must be made public on Github.
Anything <other-email> - it will locate your GitHub user by other-email part but only if that email was used before for any other CLA as a main commit author.
login <any-valid-email> - it will locate your GitHub user by login part, note that login part must be at least 3 characters long.

Alternatively, if the co-author should not be included, remove the Co-authored-by: line from the commit message.

Please update your commit message(s) by doing git commit --amend and then git push [--force] and then request re-running CLA check via commenting on this pull request:

/easycla

meta-cla · 2026-06-28T18:58:10Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

SuryanshSS1011 · 2026-06-28T19:18:04Z

@pytorchbot label "release notes: build"

metascroy · 2026-06-29T16:29:26Z

@claude review this code

claude · 2026-06-29T16:30:15Z

Claude finished @metascroy's task in 4m 55s —— View job

Code Review

Read backends/mlx/CMakeLists.txt
Verify ExternalProject wiring (imported target, dependencies, install)
Check related files (test/CMakeLists.txt, executorch-config.cmake, setup.py)
Post review feedback

Reviewed backends/mlx/CMakeLists.txt against origin/main, plus the downstream consumers (backends/mlx/test/CMakeLists.txt, tools/cmake/executorch-config.cmake, setup.py). This is a clean, well-motivated change — moving MLX into an ExternalProject to escape the shared target/option namespace is the right call, and dropping the line-pinned submodule patch removes a real silent-failure footgun. The comments explaining the add_dependencies-on-imported-target subtlety are excellent. The binary-dir layout is preserved, so setup.py:1081 (backends/mlx/mlx/mlx/backend/metal/kernels/) and the metallib copy helper continue to resolve.

A few things worth addressing:

1. QuartzCore divergence between the two imported `mlx` targets (medium)

The new in-tree imported target adds three frameworks (CMakeLists.txt:222-229):

find_library(QUARTZ_FRAMEWORK QuartzCore)
... INTERFACE_LINK_LIBRARIES "${METAL_FRAMEWORK};${FOUNDATION_FRAMEWORK};${QUARTZ_FRAMEWORK}"

But the imported mlx target in the installed package config (tools/cmake/executorch-config.cmake:138-146) only re-adds Metal;Foundation — no QuartzCore. The PR description says this "mirrors the imported mlx target already in executorch-config.cmake", but it actually diverges from it.

This matters because downstream consumers that link MLX through the installed package config (rather than the in-tree target) get a different link line. If QuartzCore is genuinely required to resolve libmlx.a symbols, those consumers will hit link errors; if it isn't required, it's dead weight here. Either way the two should agree. Please confirm whether QuartzCore is needed and bring them into parity. Fix this →

2. `ExternalProject` won't rebuild MLX on a submodule bump (medium)

ExternalProject_Add configures/builds once and stamps; with no BUILD_ALWAYS and no source-dir dependency tracking, updating the MLX submodule (git submodule update) won't invalidate the configure/build stamp. A developer who bumps MLX gets a stale libmlx.a with no signal — a quieter version of the exact failure mode this PR is fixing.

This is fine for clean CI builds, and the trade-off may well be acceptable, but it's worth either adding BUILD_ALWAYS ON (cheap with Ninja when nothing changed) or noting the "reconfigure after an MLX bump" expectation in the comment so it isn't a surprise.

3. Imported-target include dir is source-only (low — verify)

target_include_directories(mlx INTERFACE ${MLX_SOURCE_DIR}) (:221) exposes only the source tree. With add_subdirectory, MLX's own mlx target also carried its build-tree include dir, which is where MLX generates headers (e.g. build/version config). Your tested targets compile, so the headers mlxdelegate and strict_compile_test pull in are evidently all source-resident — but if a future MLX rev moves a public header into the build tree, consumers will fail to find it. Not a blocker; just flagging the narrowed include surface.

4. Minor: redundant `add_dependencies(mlx mlx_external)` (nit)

Line :230 adds a dependency on the imported mlx target, which (as your own comment at :258-261 explains) doesn't order anything — the real ordering is the add_dependencies(mlxdelegate mlx_external) at :261. The :230 call is a harmless no-op and could be dropped to avoid implying it does something. Optional.

Things I verified look correct

install(FILES ${_mlx_static_lib}) instead of install(TARGETS mlx) — correct, imported targets can't be installed via install(TARGETS).
BUILD_BYPRODUCTS correctly declares both libmlx.a and the metallib so Ninja tracks them.
$<BUILD_INTERFACE:mlx> link + tests/package-config linking mlx separately — consistent with the "consumers link mlx separately" model, unchanged from before.
Binary paths (_mlx_binary_dir = ${CMAKE_CURRENT_BINARY_DIR}/mlx) preserve the add_subdirectory layout, so setup.py and Utils.cmake need no changes, as claimed.
Toolchain/platform args (CMAKE_TOOLCHAIN_FILE, PLATFORM, DEPLOYMENT_TARGET, CMAKE_OSX_DEPLOYMENT_TARGET) are forwarded so the iOS/macOS deployment-target validation upstream stays meaningful.

Nothing here is blocking; item #1 is the one I'd most want resolved (or explained) before merge since it affects downstream linking through the installed config.
• branch fix/20556-mlx-build-isolation

metascroy · 2026-06-29T16:33:58Z

+# For the installed library, we put metallib in lib/ alongside libmlx.a. The
+# metallib is produced in the mlx_external build tree (MLX_METAL_JIT=ON does not
+# install it); _mlx_metallib points there.
+install(FILES ${_mlx_metallib} DESTINATION ${CMAKE_INSTALL_LIBDIR})


WIth this change, do we still need the function(executorch_target_copy_mlx_metallib target) utility in executorch/tools/cmake/Utils.cmake?

Yes, from what I have seen, it's still needed and still works. It copies the metallib next to a target binary using MLX_METALLIB_PATH, which this PR keeps. I just repointed it at the ExternalProject output (same location as before, since I left BINARY_DIR where add_subdirectory had it). The runners and pybindings still rely on it to colocate mlx.metallib with the binary that statically links MLX, so removing it might break that.

metascroy · 2026-06-29T16:34:28Z

Thanks @SuryanshSS1011! How did the mlxdelegate.a size change with this?

SuryanshSS1011 · 2026-06-29T18:07:34Z

Thanks for the review, @metascroy!
To answer your question: libmlxdelegate.a size is unchanged (3,893,456 bytes both before and after) as libmlxdelegate.a is built from the same three source files (MLXLoader.cpp, MLXBackend.cpp, mlx_mutable_state.cpp) with the same flags. The PR only affects how MLX itself is pulled in, and not the delegate's own compilation.

metascroy · 2026-06-29T19:15:25Z

Looks great! Thanks for the contribution.

I pushed an update to address some of the nits from Claude. If CI passes, I'll merge.

SuryanshSS1011 requested review from kirklandsign and larryliu0820 as code owners June 28, 2026 18:29

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 28, 2026

pytorch-bot Bot added the release notes: build Changes related to build, including dependency upgrades, build flags, optimizations, etc. label Jun 28, 2026

rascani requested a review from metascroy June 29, 2026 14:53

metascroy reviewed Jun 29, 2026

View reviewed changes

nits

84d2a1d

metascroy added the ciflow/mlx label Jun 29, 2026

metascroy approved these changes Jun 29, 2026

View reviewed changes

metascroy merged commit 0cef6de into pytorch:main Jun 29, 2026
225 of 227 checks passed

Uh oh!

Conversation

SuryanshSS1011 commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

AI assistance disclosure

Uh oh!

pytorch-bot Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20585

❌ 1 New Failure, 8 Pending, 1 Unrelated Failure

Uh oh!

meta-cla Bot commented Jun 28, 2026

Action Required

Process

Uh oh!

linux-foundation-easycla Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

meta-cla Bot commented Jun 28, 2026

Uh oh!

SuryanshSS1011 commented Jun 28, 2026

Uh oh!

metascroy commented Jun 29, 2026

Uh oh!

claude Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

1. QuartzCore divergence between the two imported mlx targets (medium)

2. ExternalProject won't rebuild MLX on a submodule bump (medium)

3. Imported-target include dir is source-only (low — verify)

4. Minor: redundant add_dependencies(mlx mlx_external) (nit)

Things I verified look correct

Uh oh!

metascroy Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

SuryanshSS1011 Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

metascroy commented Jun 29, 2026

Uh oh!

SuryanshSS1011 commented Jun 29, 2026

Uh oh!

metascroy commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

SuryanshSS1011 commented Jun 28, 2026 •

edited

Loading

pytorch-bot Bot commented Jun 28, 2026 •

edited

Loading

linux-foundation-easycla Bot commented Jun 28, 2026 •

edited

Loading

claude Bot commented Jun 29, 2026 •

edited

Loading

1. QuartzCore divergence between the two imported `mlx` targets (medium)

2. `ExternalProject` won't rebuild MLX on a submodule bump (medium)

4. Minor: redundant `add_dependencies(mlx mlx_external)` (nit)