
[FEAT] Model loading refactor #10604


Merged
43 commits merged into main from model-loading-refactor on Feb 19, 2025

Conversation

@SunMarc SunMarc (Member) commented Jan 17, 2025

What does this PR do?

Fixes #10013. This PR refactors model loading in diffusers. Here's a list of the major changes in this PR.

  • only two loading paths (low_cpu_mem_usage=True and low_cpu_mem_usage=False). We no longer rely on load_checkpoint_and_dispatch, and we no longer merge sharded checkpoints either.
  • support for sharded checkpoints for both loading paths
  • keep_module_in_fp32 support for sharded checkpoints
  • better support for displaying warnings for error/unexpected/missing/mismatched keys

For low_cpu_mem_usage = False:

  • Faster initialization (thanks to skipping the init + assign_to_params_buffers). I haven't benchmarked it, but it should be as fast as low_cpu_mem_usage=True, or maybe even faster. We did a similar PR in transformers thanks to @muellerzr.
  • Better torch_dtype support: we no longer initialize the model in fp32 and then cast it to the requested dtype after the weights are loaded (see the sketch below).
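
To illustrate, a minimal usage sketch of this path (the repo id and arguments here are illustrative, not taken from this PR):

import torch
from diffusers import UNet2DConditionModel

# Hypothetical example: with low_cpu_mem_usage=False the model is no longer
# built in fp32 and cast afterwards; weights end up directly in the requested dtype.
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="unet",
    low_cpu_mem_usage=False,
    torch_dtype=torch.float16,
)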

For low_cpu_mem_usage = True or device_map!=None:

  • one path; we no longer rely on load_checkpoint_and_dispatch
  • device_map support for quantization
  • non-persistent buffer support through dispatch_model (the test you added is passing, cc @hlky); a sketch of this path follows below
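
A rough sketch of what this path looks like from the user side (repo id and kwargs are illustrative; 4-bit loading assumes bitsandbytes is installed):

import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
# Passing device_map uses the low_cpu_mem_usage path; placement is handled by
# dispatch_model at the end, which also covers non-persistent buffers.
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    subfolder="transformer",
    quantization_config=nf4_config,
    device_map="cuda",
)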

Single format file:

  • Simplified single-file loading by routing it through from_pretrained, so it gets the same features as that function (device_map, quantization, ...). Feel free to share your opinion @DN6; I didn't expect to touch this, but I felt we could simplify it a bit. A sketch follows below.
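
For reference, a sketch of the single-file entry point this change affects (the checkpoint URL and dtype are placeholders):

import torch
from diffusers import StableDiffusionPipeline

# from_single_file now shares the from_pretrained machinery, so options such as
# torch_dtype (and, going forward, device_map / quantization) behave the same way.
pipe = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors",
    torch_dtype=torch.float16,
)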

TODO (some items can be done in follow-up PRs):

  • Check if we have any regression / tests issues
  • Add more tests
  • Deal with missing keys in the model for both paths (before, this only worked when low_cpu_mem_usage=False, since the whole model is initialized in that case)
  • Fix typing
  • Better support for offload with safetensors (like in transformers)

Please let me know your thoughts on the PR!

cc @sayakpaul, @DN6, @yiyixuxu, @hlky, @a-r-r-o-w

@SunMarc SunMarc changed the title [FEAT ] Model loading refactor [FEAT] Model loading refactor Jan 17, 2025
@SunMarc SunMarc (Member Author) commented Jan 18, 2025

The failing Flax CPU test is unrelated; it's failing in other PRs too.

@sayakpaul sayakpaul (Member) left a comment

Thanks for starting this! Left some comments from a first pass.

I think we will also need to add tests to check whether device_map works as expected with quantization. It's okay to test that a bit later, once there is consensus about the design changes; maybe we could add it as a TODO.

Other tests could include checking if we can do low_cpu_mem_usage=True along with some changed config values. This will ensure we're well tested for cases like #9343.
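
Not part of this PR, but roughly what such a device_map-with-quantization check could look like (repo id, device_map value, and the assertion are illustrative):

import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

def test_device_map_with_4bit_quantization():
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = SD3Transformer2DModel.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        subfolder="transformer",
        quantization_config=nf4_config,
        device_map="auto",
    )
    # every parameter should have been dispatched onto a real device, not left on meta
    assert all(p.device.type != "meta" for p in model.parameters())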

@sayakpaul sayakpaul (Member) commented

@SunMarc,

Additionally, I ran some tests on audace (two RTX 4090s). Some tests are failing (they fail on main too):

Failures
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_0_hf_internal_testing_unet2d_sharded_dummy - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_1_hf_internal_testing_tiny_sd_unet_sharded_latest_format - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_local - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_local_subfolder - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_subfolder_0_hf_internal_testing_unet2d_sharded_dummy_subfolder - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_subfolder_1_hf_internal_testing_tiny_sd_unet_sharded_latest_format_subfolder - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints_device_map - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints_with_variant - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument...

^^ These pass when run with CUDA_VISIBLE_DEVICES=0 (same on main). Expected?

Same for the following:

FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints_device_map - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

And then I also ran:

RUN_SLOW=1 pytest tests/pipelines/stable_diffusion/test_stable_diffusion.py::StableDiffusionPipelineDeviceMapTests

Everything passes.


for param_name, param in named_buffers:
Contributor

We need to keep this or equivalent elsewhere, context: #10523

Member Author

The changes I made should also cover this use case. The test you added should pass with my PR. This is mainly due to adding the dispatch_model call at the end.

Member

@hlky are we cool here?

Contributor

The tests are passing, so all good.

@SunMarc SunMarc (Member Author) commented Feb 14, 2025

I am currently running the 4-bit quantization tests, and so far things are looking nice! Some tests that might be worth including/considering:

  • Device map with quantization
  • Effectiveness of keep_modules_in_fp32 when not using quantization (a sketch of this check follows below).

Done! Please check.
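
For reference, a minimal sketch of the keep-in-fp32 check being discussed (the module name, repo id, and matching logic here are hypothetical; the real test lives in the diffusers test suite):

import torch
from diffusers import SD3Transformer2DModel

# Hypothetically keep the final projection in fp32 while the rest of the model
# is loaded in fp16.
SD3Transformer2DModel._keep_in_fp32_modules = ["proj_out"]
model = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    subfolder="transformer",
    torch_dtype=torch.float16,
)
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        expected = torch.float32 if "proj_out" in name else torch.float16
        assert module.weight.dtype == expected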

@DN6 DN6 (Collaborator) left a comment

LGTM 👍🏽 Thanks @SunMarc. Just need to replace the model repo id in the keep_in_fp32 tests so they pass.

@sayakpaul sayakpaul (Member) left a comment

Thanks a bunch, @SunMarc! I left some mixed comments across the board. But I don't think any of them are major.

Really great work!

@@ -1593,18 +1593,9 @@ def create_diffusers_clip_model_from_ldm(
raise ValueError("The provided checkpoint does not seem to contain a valid CLIP model.")
Member

@DN6 would you like to run the slow single-file tests to ensure we're not breaking anything here (if not already)?

Comment on lines 351 to 357
for name, module in model.named_modules():
if isinstance(module, torch.nn.Linear):
if name in model._keep_in_fp32_modules:
self.assertTrue(module.weight.dtype == torch.float32)
else:
self.assertTrue(module.weight.dtype == torch_dtype)
SD3Transformer2DModel._keep_in_fp32_modules = fp32_modules
Member

Two things:

  • Would it make sense to make this test a part of ModelTesterMixin? Not strongly opinionated about it.
  • Let's make sure we also perform inference -- this helps us validate the effectiveness of _keep_modules_in_fp32 even better. WDYT?

Member Author

Would it make sense to make this test a part of ModelTesterMixin? Not strongly opinionated about it.

It can make sense to add a test in ModelTesterMixin for the models that do have _keep_in_fp32_modules specified. However, for now, none of the models use this arg. Maybe in a follow-up PR when a model actually needs this?

Member Author

Let's make sure we also perform inference -- this helps us validate the effectiveness of _keep_modules_in_fp32 even better. WDYT?

fixed

@@ -136,7 +136,7 @@ def setUp(self):
bnb_4bit_compute_dtype=torch.float16,
)
self.model_4bit = SD3Transformer2DModel.from_pretrained(
-            self.model_name, subfolder="transformer", quantization_config=nf4_config
+            self.model_name, subfolder="transformer", quantization_config=nf4_config, device_map=torch_device
Member

Would it make sense to test with "auto" device_map instead?

Member Author

It would also work, but I kept torch_device for simplicity, as we do in transformers. Do you have a multi-GPU runner for quantization tests? We can create multi-GPU tests if needed.

Member

Okay, let's do that in a separate PR. I can do that. Do you have a reference?

@SunMarc SunMarc (Member Author) commented Feb 18, 2025

@SunMarc SunMarc requested a review from sayakpaul February 17, 2025 15:18
@sayakpaul sayakpaul (Member) left a comment

Just a final set of questions and we should be good to go.



Comment on lines +302 to +317
expected_slice_auto = np.array(
[
0.34179688,
-0.03613281,
0.01428223,
-0.22949219,
-0.49609375,
0.4375,
-0.1640625,
-0.66015625,
0.43164062,
]
)
expected_slice_offload = np.array(
[0.34375, -0.03515625, 0.0123291, -0.22753906, -0.49414062, 0.4375, -0.16308594, -0.66015625, 0.43554688]
)
Member

Are these changing because of device changes? Cc: @a-r-r-o-w for a double-check.

@yiyixuxu yiyixuxu (Collaborator) left a comment

thanks!

@sayakpaul sayakpaul (Member) commented Feb 19, 2025

The failing test is unrelated and is being fixed by @hlky!

Thanks a lot @SunMarc!

@sayakpaul sayakpaul merged commit f5929e0 into main Feb 19, 2025
14 of 15 checks passed
@sayakpaul sayakpaul deleted the model-loading-refactor branch February 19, 2025 12:05

Successfully merging this pull request may close these issues.

[Core] refactor model loading
7 participants