
metal : add memory pool for temp allocs #12850

Open: wants to merge 14 commits into master

ggerganov (Member) commented Apr 9, 2025

ref ggml-org/ggml#1152 (comment)

The goal is to introduce a mechanism for allocating temporary buffers in the Metal backend that can hold intermediate results. This is needed for some composite operations (such as convolution expressed as im2col + mul_mat) and for rearranging or padding data on the fly. It is similar to the ggml_cuda_pool_alloc functionality in the CUDA backend.
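As a rough illustration of the idea, here is a minimal CPU-side sketch of such a pool, assuming a simple stack (bump) allocator whose allocations are released in reverse order. `TempPool` and `TempAlloc` are hypothetical names, not the API added by this PR, and a `std::vector` stands in for GPU memory (for example an MTLHeap). The scope guard follows the same RAII spirit as ggml_cuda_pool_alloc, but the details here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a Metal temp-buffer pool. In the real backend the
// storage would be GPU memory (e.g. carved out of an MTLHeap); a std::vector is
// used here so the bookkeeping can be exercised anywhere.
class TempPool {
public:
    explicit TempPool(std::size_t capacity) : storage_(capacity) {}

    // Bump-allocate nbytes at the given alignment (a power of two), measured from
    // the start of the pool. Returns nullptr when the request does not fit.
    std::byte *alloc(std::size_t nbytes, std::size_t align, std::size_t &mark) {
        std::size_t off = (used_ + align - 1) & ~(align - 1);
        if (off > storage_.size() || nbytes > storage_.size() - off) {
            return nullptr;
        }
        mark  = used_;               // where to roll back to on release
        used_ = off + nbytes;
        if (used_ > peak_) peak_ = used_;
        return storage_.data() + off;
    }

    // Allocations are released in LIFO order by restoring the saved mark.
    void release(std::size_t mark) {
        assert(mark <= used_);
        used_ = mark;
    }

    std::size_t used() const { return used_; }
    std::size_t peak() const { return peak_; }

private:
    std::vector<std::byte> storage_;
    std::size_t used_ = 0;
    std::size_t peak_ = 0;           // high-water mark, useful for sizing the pool
};

// Scope guard: the temp buffer goes back to the pool when the guard is destroyed.
class TempAlloc {
public:
    TempAlloc(TempPool &pool, std::size_t nbytes, std::size_t align = 16)
        : pool_(pool), ptr_(pool.alloc(nbytes, align, mark_)) {}
    ~TempAlloc() { if (ptr_) pool_.release(mark_); }
    TempAlloc(const TempAlloc &) = delete;
    TempAlloc &operator=(const TempAlloc &) = delete;

    std::byte *get() const { return ptr_; }

private:
    TempPool   &pool_;
    std::size_t mark_ = 0;           // declared before ptr_, so it is set up first
    std::byte  *ptr_;
};
```

Nested scopes release their buffers in LIFO order automatically, and `peak()` records the high-water mark that could be used to decide how large the backing heap has to be.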

For testing, the SOFT_MAX operation currently uses this mechanism: the input data is first copied to an intermediate buffer, and the softmax kernel then runs on that intermediate buffer instead of on the input.

make -j && MTL_DEBUG_LAYER=1 ./bin/test-backend-ops -b Metal -o SOFT_MAX
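The reason softmax works as a test case can be shown with a CPU-only sketch (function names are hypothetical, and a `std::vector` stands in for the pool-backed temporary buffer): running the kernel on a copy of the input should give exactly the same output as running it on the input directly, so any mismatch would point at the temporary-buffer handling rather than at the softmax math.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one row: subtract the row max before exponentiating.
std::vector<float> softmax_row(const std::vector<float> &x) {
    assert(!x.empty() && "softmax of an empty row is undefined");
    const float mx = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - mx);
        sum += y[i];
    }
    for (float &v : y) {
        v /= sum;
    }
    return y;
}

// Test-path analogue: copy the input into a temporary buffer first, then run the
// kernel on the copy instead of on the input. The std::vector stands in for a
// pool-backed Metal buffer (assumption); it is released when the function returns.
std::vector<float> softmax_via_temp(const std::vector<float> &src) {
    std::vector<float> tmp(src.begin(), src.end()); // step 1: copy input into temp
    return softmax_row(tmp);                         // step 2: kernel reads temp only
}
```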

TODO:

  • Figure out how to create MTLHeap and allocate buffers from it
  • How to release the buffers
  • Create per-command-buffer heaps
  • How to dynamically resize the heap based on the memory needs of the graph
  • Start using MTLHeapTypePlacement to be able to reuse heap memory from previous nodes
  • Un-encode the failed encoder - how? Maybe recreate the command buffer?
  • Check for memory leaks
  • Try to allocate the MTLHeaps dynamically in order to avoid the extra loop over the nodes.
  • Add comments
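The sizing items in the TODO list can be sketched under two assumptions that this PR has not settled: a node's temporary buffers are no longer needed once that node finishes, and (with a placement heap) the next node may reuse the same memory. Under those assumptions the extra pass over the nodes only needs the largest per-node total rather than the sum over all nodes; if nodes can run concurrently, the bound would instead have to cover the overlapping nodes. `NodeTempNeed`, `required_heap_size` and `next_heap_size` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-node description: the temporary buffers each op allocates
// (e.g. the copied SOFT_MAX input, or an im2col matrix).
struct NodeTempNeed {
    std::vector<std::size_t> temp_sizes;
};

std::size_t align_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Pass over the graph before encoding. Assuming each node's temps can be reused
// by the next node, the heap only has to hold the largest single node's temps.
std::size_t required_heap_size(const std::vector<NodeTempNeed> &nodes, std::size_t align) {
    std::size_t peak = 0;
    for (const auto &node : nodes) {
        std::size_t node_total = 0;
        for (std::size_t sz : node.temp_sizes) {
            node_total += align_up(sz, align);
        }
        peak = std::max(peak, node_total);
    }
    return peak;
}

// Grow-only resize policy: reallocate only when the graph needs more than the
// current heap, rounding up to a granularity to limit how often that happens.
std::size_t next_heap_size(std::size_t current, std::size_t required, std::size_t granularity) {
    assert(granularity > 0);
    if (required <= current) {
        return current;
    }
    return align_up(required, granularity);
}
```

For instance, two nodes needing {100, 50} and {300} bytes at 64-byte alignment give a peak of 320 bytes, whereas keeping every temp alive for the whole graph would need 512.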

Next PRs:

  • Use this new functionality to add F16 x F16 MUL_MAT support by casting src1 from F32 to F16
  • Implement im2col + mul_mat for GGML_OP_CONV_XXX
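For the im2col + mul_mat item, the following CPU sketch of a 1D convolution (stride 1, no padding) shows where the temporary buffer comes from: the im2col matrix is an intermediate of `(cin * k) * (len - k + 1)` floats that only lives between the two steps. The memory layout is chosen for readability and is an assumption; it is not necessarily the layout ggml's im2col uses, and `conv1d_im2col` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1D convolution as im2col + matrix multiply (stride 1, no padding), CPU sketch.
// x:   [cin][len]         input signal
// w:   [cout][cin * k]    kernel, flattened per output channel
// out: [cout][len - k + 1]
std::vector<float> conv1d_im2col(const std::vector<float> &x, std::size_t cin, std::size_t len,
                                 const std::vector<float> &w, std::size_t cout, std::size_t k) {
    assert(len >= k && x.size() == cin * len && w.size() == cout * cin * k);
    const std::size_t lout = len - k + 1;
    const std::size_t rows = cin * k;

    // im2col: column t holds the cin * k input values seen by output position t.
    // This is the intermediate that would need a temporary buffer on the GPU.
    std::vector<float> cols(rows * lout);
    for (std::size_t c = 0; c < cin; ++c)
        for (std::size_t j = 0; j < k; ++j)
            for (std::size_t t = 0; t < lout; ++t)
                cols[(c * k + j) * lout + t] = x[c * len + t + j];

    // mul_mat: out[o][t] = sum over r of w[o][r] * cols[r][t]
    std::vector<float> out(cout * lout, 0.0f);
    for (std::size_t o = 0; o < cout; ++o)
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t t = 0; t < lout; ++t)
                out[o * lout + t] += w[o * rows + r] * cols[r * lout + t];
    return out;
}
```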

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Apr 9, 2025
ggerganov marked this pull request as ready for review April 15, 2025 12:03