CUDA/HIP: Share the same unified memory allocation logic. #12934
Conversation
Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
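For reference, here is a minimal sketch (not the exact patch) of the runtime-selected allocation path this PR converges on; `cudaSetDevice` is used here in place of ggml's internal device helper, and the HIP build is assumed to alias the `cuda*` symbols to `hip*` as ggml already does:

```cpp
#include <cstdlib>        // std::getenv
#include <cuda_runtime.h> // cudaMalloc, cudaMallocManaged

// Sketch: one allocation routine shared by the CUDA and HIP backends.
// Under HIP, ggml #defines cudaMalloc/cudaMallocManaged/etc. to their
// hip* counterparts, so this same code compiles for AMD GPUs.
static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device) {
    cudaSetDevice(device);
    if (std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr) {
        // Managed (unified) memory: pages can migrate between host and
        // device, so one binary can serve integrated and dedicated GPUs.
        return cudaMallocManaged(ptr, size);
    }
    // Default path: plain device allocation.
    return cudaMalloc(ptr, size);
}
```

The key difference from the old behavior is that the choice moves from compile time (`GGML_HIP_UMA`) to run time (`GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` in the environment).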
Requesting review from @JohannesGaessler and @IMbackK
One wrinkle is that I'm not sure how cudaMallocManaged vs. cudaMalloc behaves on NVIDIA APUs (Grace Hopper / Tegra). Potentially the environment variable and what it implies doesn't hold for those, in which case we should rename it back to HIP again and only apply it there.
I'm not sure what you mean. Unless I'm misinterpreting the code changes, there should be no change to the logic for NVIDIA hardware, since for those we only support CUDA and not HIP.
It does change CUDA: previously cudaMallocManaged was never used on CUDA, but now it can be with GGML_CUDA_ENABLE_UNIFIED_MEMORY in the environment (previously GGML_HIP_UMA at compile time). The name of the variable also suggests that it is effective for CUDA for the same purpose as on HIP, but I'm not sure using cudaMallocManaged will have the same effect as on HIP (where the allocation happens in GTT) when combined with an NVIDIA "APU"-type device.
Also, we should handle hipErrorNotSupported (or check managed support beforehand) and fall back to plain hipMalloc, since managed memory isn't supported on Windows and I don't know if CUDA supports it in all configurations either.
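For the "check managed support beforehand" option, a hedged sketch (not part of this PR): both runtimes expose a device attribute for this, `cudaDevAttrManagedMemory` on CUDA and `hipDeviceAttributeManagedMemory` on HIP, so the query could look like:

```cpp
#include <cuda_runtime.h>

// Sketch: query once whether a device supports managed allocations, so
// callers can skip cudaMallocManaged entirely where it is unsupported
// (e.g. HIP on Windows).
static bool device_supports_managed_memory(int device) {
    int supported = 0;
    if (cudaDeviceGetAttribute(&supported, cudaDevAttrManagedMemory, device) != cudaSuccess) {
        return false; // treat a failed query as "not supported"
    }
    return supported != 0;
}
```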
The CUDA managed memory change was introduced in a previous PR: #8035. This PR simply merges the two paths together.
Right, I misread the diff there. The second point still stands: MallocManaged is not supported everywhere on HIP, so we should probably fall back to hipMalloc with a warning if it is not. I can also do this outside of this PR, as it doesn't represent a regression; the old compile-time flag behaved the same way.
Do you think this is okay?

```diff
diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index ff257378..05ef182c 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -104,6 +104,12 @@ static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device)
         if (err == hipSuccess) {
             CUDA_CHECK(cudaMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device));
         }
+
+        // fall back to cudaMalloc if not supported (e.g. on Windows)
+        if (err == hipErrorNotSupported) {
+            GGML_LOG_WARN("hipMallocManaged unsupported, falling back to hipMalloc.\n");
+            err = cudaMalloc(ptr, size);
+        }
 #endif // defined(GGML_USE_HIP)
     }
     else
```
I think warning on every allocation is too much. Alternatively, the easy way would be to just limit it to printing once locally.
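A sketch of that "print once" variant of the fallback from the diff above, guarding the warning with a function-local static flag:

```cpp
// Inside ggml_cuda_device_malloc, after the hipMallocManaged attempt:
if (err == hipErrorNotSupported) {
    static bool warned = false; // set on the first failed allocation only
    if (!warned) {
        GGML_LOG_WARN("hipMallocManaged unsupported, falling back to hipMalloc.\n");
        warned = true;
    }
    err = cudaMalloc(ptr, size);
}
```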
This looks fine to me now; I'll test later and approve.
Based on static code analysis this looks good to me; I currently can't test the code on AMD hardware because I can't turn on the corresponding machine remotely.
Co-authored-by: Johannes Gäßler <[email protected]>
Seems to work as intended on my RX 6800.
Works on MI100 as intended, but I cannot test an APU, nor can I test the fallback path.
Will merge once CI completes.