CUDA/HIP: Share the same unified memory allocation logic. #12934
Conversation
Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
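For reference, here is a minimal sketch (not the exact patch) of the runtime-selected allocation path this PR converges on; `cudaSetDevice` is used here in place of ggml's internal device helper, and the HIP build is assumed to alias the `cuda*` symbols to `hip*` as ggml already does:

```cpp
#include <cstdlib>        // std::getenv
#include <cuda_runtime.h> // cudaMalloc, cudaMallocManaged

// Sketch: one allocation routine shared by the CUDA and HIP backends.
// Under HIP, ggml #defines cudaMalloc/cudaMallocManaged/etc. to their
// hip* counterparts, so this same code compiles for AMD GPUs.
static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device) {
    cudaSetDevice(device);
    if (std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr) {
        // Managed (unified) memory: pages can migrate between host and
        // device, so one binary can serve integrated and dedicated GPUs.
        return cudaMallocManaged(ptr, size);
    }
    // Default path: plain device allocation.
    return cudaMalloc(ptr, size);
}
```

The key difference from the old behavior is that the choice moves from compile time (`GGML_HIP_UMA`) to run time (`GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` in the environment).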
Requesting review from @JohannesGaessler and @IMbackK
One wrinkle is that I'm not sure how cudaMallocManaged vs. cudaMalloc behaves on NVIDIA APUs (Grace Hopper / Tegra). Potentially the environment variable and what it implies doesn't hold for those, in which case we should rename it back to HIP again and only apply it there.
I'm not sure what you mean. Unless I'm misinterpreting the code changes, there should be no change to the logic for NVIDIA hardware, since for those we only support CUDA and not HIP.
It does change CUDA: previously cudaMallocManaged was never used on CUDA, but now it can be with GGML_CUDA_ENABLE_UNIFIED_MEMORY in the environment (previously GGML_HIP_UMA at compile time). The name of the variable also suggests that it is effective for CUDA for the same purpose as on HIP, but I'm not sure using cudaMallocManaged will have the same effect as on HIP (where the allocation happens in GTT) when combined with an NVIDIA "APU"-type device.
Also, we should handle hipErrorNotSupported (or check managed support beforehand) and fall back to plain hipMalloc, since managed memory isn't supported on Windows and I don't know if CUDA supports it in all configurations either.
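For the "check managed support beforehand" option, a hedged sketch (not part of this PR): both runtimes expose a device attribute for this, `cudaDevAttrManagedMemory` on CUDA and `hipDeviceAttributeManagedMemory` on HIP, so the query could look like:

```cpp
#include <cuda_runtime.h>

// Sketch: query once whether a device supports managed allocations, so
// callers can skip cudaMallocManaged entirely where it is unsupported
// (e.g. HIP on Windows).
static bool device_supports_managed_memory(int device) {
    int supported = 0;
    if (cudaDeviceGetAttribute(&supported, cudaDevAttrManagedMemory, device) != cudaSuccess) {
        return false; // treat a failed query as "not supported"
    }
    return supported != 0;
}
```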
The CUDA managed memory change was introduced in a previous PR: #8035. This PR simply merges the two paths together.
Right, I misread the diff there. The second point still stands: MallocManaged is not supported everywhere on HIP, so we should probably fall back to hipMalloc with a warning if it is not. I can also do this outside of this PR, as it doesn't represent a regression; the old compile-time flag behaved the same way.
Do you think this is okay?

```diff
diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index ff257378..05ef182c 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -104,6 +104,12 @@ static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device)
         if (err == hipSuccess) {
             CUDA_CHECK(cudaMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device));
         }
+
+        // fall back to cudaMalloc if not supported (e.g. on Windows)
+        if (err == hipErrorNotSupported) {
+            GGML_LOG_WARN("hipMallocManaged unsupported, falling back to hipMalloc.\n");
+            err = cudaMalloc(ptr, size);
+        }
 #endif // defined(GGML_USE_HIP)
     }
     else
```
I think warning on every allocation is too much. Alternatively, the easy way would be to just limit it to printing once locally.
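A sketch of that "print once" variant of the fallback from the diff above, guarding the warning with a function-local static flag:

```cpp
// Inside ggml_cuda_device_malloc, after the hipMallocManaged attempt:
if (err == hipErrorNotSupported) {
    static bool warned = false; // set on the first failed allocation only
    if (!warned) {
        GGML_LOG_WARN("hipMallocManaged unsupported, falling back to hipMalloc.\n");
        warned = true;
    }
    err = cudaMalloc(ptr, size);
}
```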
This looks fine to me now; I'll test later and approve.
Based on static code analysis this looks good to me; I currently can't test the code on AMD hardware because I can't turn on the corresponding machine remotely.
Co-authored-by: Johannes Gäßler <[email protected]>
Seems to work as intended on my RX 6800.
Works on MI100 as intended, but I cannot test an APU, nor can I test the fallback path.
Will merge once CI completes.