CUDA/HIP: Share the same unified memory allocation logic. #12934

Merged: 5 commits into ggml-org:master, Apr 15, 2025

Conversation

hjc4869 (Contributor) commented Apr 14, 2025

Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
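
For illustration, a minimal sketch of the allocation behavior this PR converges on, assuming a plain `getenv` check; the actual `ggml_cuda_device_malloc` code differs in its error handling and HIP-specific details:

```cpp
#include <cstdlib>
#include <cuda_runtime.h>

// Minimal sketch, not the actual ggml code: a runtime opt-in replaces the
// old compile-time GGML_HIP_UMA flag, so one binary can serve both
// integrated and dedicated GPUs (hip* names alias to cuda* under HIP).
static cudaError_t device_malloc_sketch(void ** ptr, size_t size) {
    if (std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr) {
        return cudaMallocManaged(ptr, size); // unified/managed allocation
    }
    return cudaMalloc(ptr, size); // default device allocation
}
```

Usage would then be, e.g., `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli ...` (binary name illustrative) on either vendor's hardware, with no rebuild needed.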
github-actions bot added the documentation (Improvements or additions to documentation), Nvidia GPU (Issues specific to Nvidia GPUs), and ggml (changes relating to the ggml tensor library for machine learning) labels Apr 14, 2025
hjc4869 changed the title from "CUDA/HIP: Reduce the divergence of unified memory allocation between CUDA and HIP." to "CUDA/HIP: Share the same unified memory allocation logic." Apr 14, 2025
hjc4869 (Contributor, Author) commented Apr 14, 2025

Requesting review from @JohannesGaessler and @IMbackK

IMbackK (Collaborator) commented Apr 14, 2025

One wrinkle is that I'm not sure how cudaMallocManaged vs. cudaMalloc behaves on NVIDIA APUs (Grace Hopper / Tegra); potentially the environment variable, and what it implies, doesn't hold for those, in which case we should rename it back to HIP and only apply it there.

JohannesGaessler (Collaborator) commented

I'm not sure what you mean. Unless I'm misinterpreting the code changes, there should be no change to the logic for NVIDIA hardware, since for those we only support CUDA and not HIP.

IMbackK (Collaborator) commented Apr 14, 2025

It does change CUDA: previously cudaMallocManaged was never used on CUDA, but now it can be, with GGML_CUDA_ENABLE_UNIFIED_MEMORY in the environment (previously GGML_HIP_UMA at compile time). The name of the variable also suggests that it is effective for CUDA too, for the same purpose as on HIP, but I'm not sure using cudaMallocManaged will have the same effect as on HIP (allocation happens in GTT) when combined with an NVIDIA "APU"-type device.

IMbackK (Collaborator) commented Apr 14, 2025

Also, we should handle hipErrorNotSupported (or check managed-memory support beforehand) and fall back to plain hipMalloc, since managed memory isn't supported on Windows and I don't know whether CUDA supports it in all configurations either.

hjc4869 (Contributor, Author) commented Apr 14, 2025

> It does change CUDA: previously cudaMallocManaged was never used on CUDA, but now it can be, with GGML_CUDA_ENABLE_UNIFIED_MEMORY in the environment (previously GGML_HIP_UMA at compile time). The name of the variable also suggests that it is effective for CUDA too, for the same purpose as on HIP, but I'm not sure using cudaMallocManaged will have the same effect as on HIP (allocation happens in GTT) when combined with an NVIDIA "APU"-type device.

The CUDA managed-memory change was introduced in a previous PR: #8035

This PR simply merges the two paths together.

IMbackK (Collaborator) commented Apr 14, 2025

Right, I misread the diff there. The second point still stands: hipMallocManaged is not supported everywhere on HIP, so we should probably fall back to hipMalloc with a warning where it is not.

I can also do this outside of this PR, as it doesn't represent a regression, since the old compile-time flag behaved the same way.

hjc4869 (Contributor, Author) commented Apr 14, 2025

Do you think this is okay?

```diff
diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index ff257378..05ef182c 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -104,6 +104,12 @@ static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device)
         if (err == hipSuccess) {
             CUDA_CHECK(cudaMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device));
         }
+
+        // fall back to cudaMalloc if not supported (e.g. on Windows)
+        if (err == hipErrorNotSupported) {
+            GGML_LOG_WARN("hipMallocManaged unsupported, falling back to hipMalloc.\n");
+            err = cudaMalloc(ptr, size);
+        }
 #endif // defined(GGML_USE_HIP)
     }
     else
```

IMbackK (Collaborator) commented Apr 14, 2025

I think warning on every allocation would be too much. A better design would be to check the environment variable and device capability in ggml_cuda_init, warn once if they don't match, and store which allocator to use in ggml_cuda_device_info.

Alternatively, the easy way would be to just limit the warning to printing once locally.
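
A sketch of the init-time variant described above; the struct and field names here are hypothetical, not the actual `ggml_cuda_device_info` layout:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the relevant part of ggml_cuda_device_info.
struct device_info_sketch {
    bool use_managed_alloc = false; // allocator choice, decided once at init
};

static void init_alloc_policy(device_info_sketch & info, int device) {
    if (std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") == nullptr) {
        return; // user did not opt in, keep plain cudaMalloc
    }
    int managed = 0;
    // cudaDevAttrManagedMemory reports whether the device can allocate
    // managed memory (HIP: hipDeviceAttributeManagedMemory).
    cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, device);
    if (managed) {
        info.use_managed_alloc = true;
    } else {
        // warn exactly once, at init, rather than on every allocation
        std::fprintf(stderr,
            "managed memory requested but unsupported on device %d, "
            "falling back to plain allocation\n", device);
    }
}
```

The "printing once locally" alternative would instead guard the log call in the allocator itself with a function-local static flag.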

IMbackK (Collaborator) commented Apr 14, 2025

This looks fine to me now; I'll test later and approve.

JohannesGaessler (Collaborator) left a comment

Based on static code analysis this looks good to me. I currently can't test the code on AMD hardware because I can't turn on the corresponding machine remotely.

JohannesGaessler (Collaborator) left a comment

Seems to work as intended on my RX 6800.

IMbackK (Collaborator) left a comment

Works on an MI100 as intended, but I cannot test an APU, nor can I test the fallback path.

IMbackK (Collaborator) commented Apr 15, 2025

Will merge once CI completes.

IMbackK merged commit 84778e9 into ggml-org:master Apr 15, 2025
93 of 96 checks passed
hjc4869 deleted the hip-uma branch April 15, 2025 09:22