
[FEAT] Model loading refactor #10604

Open · wants to merge 21 commits into main

Conversation

SunMarc
Member

@SunMarc SunMarc commented Jan 17, 2025

What does this PR do?

Fixes #10013. This PR refactors model loading in diffusers. Here's a list of the major changes in this PR.

  • only two loading paths (low_cpu_mem_usage=True and low_cpu_mem_usage=False); we no longer rely on load_checkpoint_and_dispatch, and we no longer merge sharded checkpoints either
  • support for sharded checkpoints in both loading paths
  • keep_module_in_fp32 support for sharded checkpoints
  • better support for displaying warnings about error/unexpected/missing/mismatched keys

For low_cpu_mem_usage = False:

  • Faster initialization (thanks to skipping the init + assign_to_params_buffers). I haven't benchmarked it, but it should be as fast as low_cpu_mem_usage=True or maybe even faster. We did a similar PR in transformers thanks to @muellerzr. (See the sketch below for the underlying idea.)
  • Better torch_dtype support: we no longer initialize the model in fp32 and then cast it to the requested dtype after the weights are loaded.
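For context, a minimal sketch of the assign-based loading idea mentioned above (my illustration in plain PyTorch, not the PR's actual code): instead of copying the checkpoint into freshly initialized fp32 parameters, the checkpoint tensors are assigned directly to the module, so the model ends up in the checkpoint's dtype without an extra cast.

import torch
import torch.nn as nn

# Hypothetical sketch: a stand-in module (the real path also skips the random init).
model = nn.Linear(4, 4)

state_dict = {
    "weight": torch.randn(4, 4, dtype=torch.bfloat16),
    "bias": torch.zeros(4, dtype=torch.bfloat16),
}

# assign=True swaps the parameters for the checkpoint tensors instead of
# copying values into the pre-allocated fp32 parameters.
model.load_state_dict(state_dict, assign=True)
print(model.weight.dtype)  # torch.bfloat16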

For low_cpu_mem_usage = True or device_map!=None:

  • a single path; we no longer rely on load_checkpoint_and_dispatch
  • device_map support for quantization (see the usage sketch below)
  • non-persistent buffer support through dispatch_model (the test you added is passing, cc @hlky)
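For illustration, the kind of call this path is meant to support (a hedged usage sketch; the repo id is a placeholder and the exact supported combinations depend on the final state of the PR):

import torch
from diffusers import UNet2DConditionModel, BitsAndBytesConfig

# Hypothetical usage sketch (placeholder repo id): load a model through the single
# low_cpu_mem_usage=True path with a device map and 4-bit bitsandbytes quantization.
model = UNet2DConditionModel.from_pretrained(
    "some-org/some-model",  # placeholder
    subfolder="unet",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)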

Single file format:

  • Simplified single-file loading by routing it through from_pretrained, so it gets the same features as that function (device_map, quantization, ...). Feel free to share your opinion @DN6; I didn't expect to touch this, but I felt we could simplify it a bit. (A usage sketch follows below.)
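And a hedged sketch of what that unification is meant to enable (placeholder file path; not necessarily the final API surface):

import torch
from diffusers import FluxTransformer2DModel

# Hypothetical sketch (placeholder path): single-file checkpoints go through the
# same machinery as from_pretrained, so options like torch_dtype (and, per the PR,
# device_map / quantization) should behave the same way.
model = FluxTransformer2DModel.from_single_file(
    "/path/to/checkpoint.safetensors",  # placeholder
    torch_dtype=torch.bfloat16,
)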

TODO (some items can be done in follow-up PRs):

  • Check if we have any regression / tests issues
  • Add more tests
  • Deal with missing keys in the model for both paths (before, this only worked when low_cpu_mem_usage=False, since in that case we initialize the whole model)
  • Fix typing
  • Better support for offload with safetensors (like in transformers)

Please let me know your thoughts on the PR!

cc @sayakpaul, @DN6 , @yiyixuxu , @hlky , @a-r-r-o-w

@SunMarc SunMarc changed the title [FEAT ] Model loading refactor [FEAT] Model loading refactor Jan 17, 2025
@SunMarc
Member Author

SunMarc commented Jan 18, 2025

The FLAX CPU failing test is unrelated; it's failing in other PRs too.

from huggingface_hub.utils import validate_hf_hub_args

from ..quantizers import DiffusersAutoQuantizer
from ..utils import deprecate, is_accelerate_available, logging
from ..utils import deprecate, logging
Member

Will let @DN6 comment on the single-file related changes.

Member

@sayakpaul sayakpaul left a comment

Thanks for starting this! Left some comments from a first pass.

I think we will also need to add tests to check that device_map works as expected with quantization. It's okay to add that a bit later once there is consensus about the design changes. Maybe we could add that as a TODO.

Other tests could include checking if we can do low_cpu_mem_usage=True along with some changed config values. This will ensure we're well tested for cases like #9343.

@@ -134,15 +135,14 @@ def _fetch_remapped_cls_from_config(config, old_class):

def load_state_dict(
    checkpoint_file: Union[str, os.PathLike],
    variant: Optional[str] = None,
Member

variant isn't used in this method anyway, so this is good for me.

But let's make sure the method is invoked properly with proper arguments.

Comment on lines +1382 to +1384
logger.info(f"Instantiating {cls.__name__} model under default dtype {dtype}.")
dtype_orig = torch.get_default_dtype()
torch.set_default_dtype(dtype)
Member

Have we fully considered the consequences of this, especially under things like "layerwise upcasting"? (see #10347)

Collaborator

import torch
torch.set_default_dtype(torch.float8_e4m3fn)
# TypeError: couldn't find storage object Float8_e4m3fnStorage

Member Author

Thanks for the feedback! For the torch.float8_e4m3fn dtype, we can just make an exception and skip setting the default dtype in that case.
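A minimal sketch of what such an exception could look like (my illustration, not the PR's actual code):

import torch

def _maybe_set_default_dtype(dtype: torch.dtype):
    """Hypothetical helper: set the global default dtype only when torch supports it."""
    # torch.set_default_dtype rejects non-floating-point dtypes and the fp8 storage
    # types (e.g. torch.float8_e4m3fn), so those are skipped here.
    if not dtype.is_floating_point or "float8" in str(dtype):
        return None
    dtype_orig = torch.get_default_dtype()
    torch.set_default_dtype(dtype)
    return dtype_orig  # the caller restores this afterwards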

Member Author

It doesn't look like the layerwise upcasting PR passes torch_dtype=torch.float8_e4m3fn in from_pretrained. cc @a-r-r-o-w. LMK if I should still take care of this case or if we can deal with it in a follow-up PR when we need it.

Member

If you perhaps run the test_layerwise_casting_inference tests to confirm, that would be great.

Member Author

All 37 tests are passing!

for pat in cls._keys_to_ignore_on_load_unexpected:
    unexpected_keys = [k for k in unexpected_keys if re.search(pat, k) is None]
if dtype_orig is not None:
    torch.set_default_dtype(dtype_orig)
Member

_set_default_torch_dtype() already calls set_default_dtype(), is that still needed here?

Member Author

It is to set the default dtype back to the original dtype_orig. This way, if the user continues to create tensors afterwards, they get the default dtype they expect, e.g. FP32.
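In other words, the pattern is roughly the following (a sketch, not the PR's exact code):

import torch
import torch.nn as nn

dtype_orig = torch.get_default_dtype()   # usually torch.float32
torch.set_default_dtype(torch.bfloat16)  # instantiate the model under this dtype
try:
    model = nn.Linear(4, 4)              # parameters are created directly in bf16
finally:
    torch.set_default_dtype(dtype_orig)  # restore, so later tensors default to fp32 again

print(model.weight.dtype)       # torch.bfloat16
print(torch.empty(1).dtype)     # back to torch.float32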

@sayakpaul
Member

@SunMarc,

Additionally, I ran some tests on audace (two RTX 4090s). Some tests that are failing (they fail on main too):

Failures
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_0_hf_internal_testing_unet2d_sharded_dummy - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_1_hf_internal_testing_tiny_sd_unet_sharded_latest_format - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_local - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_local_subfolder - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_subfolder_0_hf_internal_testing_unet2d_sharded_dummy_subfolder - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_load_sharded_checkpoint_from_hub_subfolder_1_hf_internal_testing_tiny_sd_unet_sharded_latest_format_subfolder - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints_device_map - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argumen...
FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints_with_variant - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument...

^^ These pass when run with CUDA_VISIBLE_DEVICES=0 (same on main). Expected?

Same for the following:

FAILED tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_sharded_checkpoints_device_map - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

And then I also ran:

RUN_SLOW=1 pytest tests/pipelines/stable_diffusion/test_stable_diffusion.py::StableDiffusionPipelineDeviceMapTests

Everything passes.


for param_name, param in named_buffers:
Collaborator

We need to keep this or equivalent elsewhere, context: #10523

Member Author

The changes I made should also cover this use case; the test you added should pass with my PR. This is mainly due to calling dispatch_model at the end (a rough sketch of that step is below).
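For reference, a rough sketch of what that final dispatch step does conceptually (my illustration using accelerate's public dispatch_model, not the PR's exact code; the device map is a placeholder):

import torch.nn as nn
from accelerate import dispatch_model

# Hypothetical sketch: after the weights are loaded, dispatch_model places each
# submodule (including its buffers, persistent or not) on the device assigned in
# the device map; with a multi-device map it also installs the hooks needed for
# cross-device execution.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))  # stand-in for the loaded model
device_map = {"": "cpu"}                                  # placeholder device map
dispatch_model(model, device_map=device_map)              # modifies the model in place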

logger = logging.get_logger(__name__)

_REGEX_SHARD = re.compile(r"(.*?)-\d{5}-of-\d{5}")

TORCH_INIT_FUNCTIONS = {
Collaborator

Not a merge blocker, but is it possible to dynamically create this mapping? Then we could avoid having to make manual updates in case new inits are added to torch.

Although I suppose that doesn't happen too often.

Member

Something like this could work:

import torch.nn.init as init

init_functions = {
    name: getattr(init, name) for name in dir(init) if callable(getattr(init, name)) 
    and name.endswith("_")
    and not name.startswith("_")
}

print("Available initialization functions:")
for name in init_functions:
    print(name)

Prints:

Available initialization functions:
constant_
dirac_
eye_
kaiming_normal_
kaiming_uniform_
normal_
ones_
orthogonal_
sparse_
trunc_normal_
uniform_
xavier_normal_
xavier_uniform_
zeros_

Member Author

@SunMarc SunMarc Jan 27, 2025

WDYT @DN6? I'm fine with either. Also, there are some missing functions, since you only pick the ones ending with "_", though I don't think these are deprecated right now:

    "uniform": nn.init.uniform,
    "normal": nn.init.normal,
    "xavier_uniform": nn.init.xavier_uniform,
    "xavier_normal": nn.init.xavier_normal,
    "kaiming_uniform": nn.init.kaiming_uniform,
    "kaiming_normal": nn.init.kaiming_normal,

# in the case it is sharded, we have already the index
if is_sharded:
sharded_ckpt_cached_folder, sharded_metadata = _get_checkpoint_shard_files(
resolved_archive_file = None
Collaborator

Perhaps resolved_model_file can work here since most of the time this variable is used with _get_model_file?

dduf_entries=dduf_entries,
# set dtype to instantiate the model under:
# 1. If torch_dtype is not None, we use that dtype
dtype_orig = None
Collaborator

There is a use case where we might want to support loading checkpoints that are in mixed precision, e.g. the Mochi video model needs to preserve its norms in FP32 (we can't load in FP16/BF16 and then cast back to FP32 with _keep_in_fp32_modules):
https://huggingface.co/Kijai/Mochi_preview_comfy/blob/main/mochi_preview_dit_fp8_e4m3fn.safetensors

We were thinking of introducing an auto dtype for such cases.

Additionally, torch FP8 is a valid and popular storage type in the Diffusion community that is dynamically upcast during inference time (a feature we will add soon).
https://github.com/huggingface/diffusers/pull/10347/files

I think this might break if a user tries something like `.from_pretrained(.., torch_dtype=torch.float8_e4m3fn)`, which would be a breaking change for us.

Think we need to update the casting here to account for these cases.

Member Author

> Additionally, torch FP8 is a valid and popular storage type in the Diffusion community that is dynamically upcast during inference time (a feature we will add soon).
> https://github.com/huggingface/diffusers/pull/10347/files
>
> I think this might break if a user tries something like `.from_pretrained(.., torch_dtype=torch.float8_e4m3fn)`, which would be a breaking change for us.
>
> Think we need to update the casting here to account for these cases.

Yes, I will update the code to reflect this.

Member Author

@SunMarc SunMarc Jan 21, 2025

> There is a use case where we might want to support loading checkpoints that are in mixed precision, e.g. the Mochi video model needs to preserve its norms in FP32 (we can't load in FP16/BF16 and then cast back to FP32 with _keep_in_fp32_modules):
> https://huggingface.co/Kijai/Mochi_preview_comfy/blob/main/mochi_preview_dit_fp8_e4m3fn.safetensors

Under low_cpu_mem_usage=True, it won't load the model in FP16/BF16 and then cast it back to FP32. With _keep_in_fp32_modules, we should be able to make sure that those params stay in FP32.

Of course, if we have a more complicated use case where the params are a mix of many dtypes, then it would make sense to introduce dtype="auto", so that we use the dtypes from the state dict. (A rough sketch of the _keep_in_fp32_modules idea follows below.)
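For context, a rough sketch of the _keep_in_fp32_modules idea when loading params one by one (my illustration of the general pattern, not the PR's actual code; the module patterns are placeholders):

import re
import torch

# Placeholder patterns; real models list the norm/module names to keep in fp32.
keep_in_fp32_patterns = ["norm"]

def target_dtype(param_name: str, torch_dtype: torch.dtype) -> torch.dtype:
    """Decide the dtype a checkpoint param should be cast to while loading."""
    # Params matching a keep-in-fp32 pattern stay in float32 even when the
    # rest of the model is loaded in fp16/bf16.
    if any(re.search(pat, param_name) for pat in keep_in_fp32_patterns):
        return torch.float32
    # Everything else follows the requested torch_dtype.
    return torch_dtype

print(target_dtype("blocks.0.norm1.weight", torch.bfloat16))      # torch.float32
print(target_dtype("blocks.0.attn.to_q.weight", torch.bfloat16))  # torch.bfloat16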

Collaborator

We can introduce dtype="auto" after this PR is merged. Just wanted to flag it.

Member Author

As said here, I don't think we need to change anything yet. cc @a-r-r-o-w

Member

@sayakpaul sayakpaul left a comment

Some more comments.

I am currently running the 4-bit quantization tests, and so far things are looking nice! Some tests that might be worth including/considering:

  • Device map with quantization
  • Effectiveness of keep_modules_in_fp32 when not using quantization.

WDYT?

Edit: 4bit and 8bit tests (bitsandbytes) are passing.

@@ -362,17 +362,18 @@ def from_single_file(cls, pretrained_model_link_or_path_or_dict: Optional[str] =

if is_accelerate_available():
param_device = torch.device(device) if device else torch.device("cpu")
named_buffers = model.named_buffers()
unexpected_keys = load_model_dict_into_meta(
unexpected_keys = [
Member

Are the single-file related changes to uniformize the use of load_model_dict_into_meta() (with the new signature)?

Member Author

Yeah, that's right!

Comment on lines -258 to -259
if named_buffers is None:
    return unexpected_keys
Member

Nevermind, found it:

dispatch_model(model, **device_map_kwargs)

It's a tad bit easier for reviewers if we could just provide these links going forward.


@contextmanager
def no_init_weights():
Member

Could you then briefly elaborate on what happens in this codepath?
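For context, a minimal sketch of how such a no_init_weights context manager typically works, assuming it patches the TORCH_INIT_FUNCTIONS mapping shown above (this is my illustration, not necessarily the PR's exact implementation):

from contextlib import contextmanager
import torch.nn as nn

# Assumed mapping of init name -> function, analogous to TORCH_INIT_FUNCTIONS above.
TORCH_INIT_FUNCTIONS = {"normal_": nn.init.normal_, "kaiming_uniform_": nn.init.kaiming_uniform_}

@contextmanager
def no_init_weights():
    """Temporarily replace torch's init functions with no-ops so that instantiating
    a model does not spend time on random initialization (the weights are
    overwritten by the checkpoint right after)."""
    def _skip(tensor, *args, **kwargs):
        return tensor  # leave the (uninitialized) tensor as-is

    originals = {}
    try:
        for name, fn in TORCH_INIT_FUNCTIONS.items():
            originals[name] = fn
            setattr(nn.init, name, _skip)
        yield
    finally:
        # restore the real init functions afterwards
        for name, fn in originals.items():
            setattr(nn.init, name, fn)

with no_init_weights():
    layer = nn.Linear(8, 8)  # created without running the patched init functions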

Comment on lines -202 to +207
-    def test_accelerate_loading_error_message(self):
-        with self.assertRaises(ValueError) as error_context:
+    def test_missing_key_loading_warning_message(self):
+        with self.assertLogs("diffusers.models.modeling_utils", level="WARNING") as logs:
            UNet2DConditionModel.from_pretrained("hf-internal-testing/stable-diffusion-broken", subfolder="unet")

        # make sure that error message states what keys are missing
-        assert "conv_out.bias" in str(error_context.exception)
+        assert "conv_out.bias" in " ".join(logs.output)
Member

Explain the changes?

Member Author

@SunMarc SunMarc Jan 27, 2025

I switched from raising an error to just a warning for missing keys.
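For illustration, roughly what such a warning looks like when emitted via the module logger (a sketch of the general pattern, not the PR's exact message; the helper is hypothetical):

import logging

logger = logging.getLogger("diffusers.models.modeling_utils")

def warn_on_missing_keys(model_name: str, missing_keys: list) -> None:
    """Hypothetical helper: log missing checkpoint keys as a warning instead of raising."""
    if missing_keys:
        logger.warning(
            f"Some weights of {model_name} were not found in the checkpoint "
            f"and are newly initialized: {missing_keys}"
        )

# The test above can then capture this via assertLogs(..., level="WARNING").
warn_on_missing_keys("UNet2DConditionModel", ["conv_out.bias"])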

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Successfully merging this pull request may close these issues.

[Core] refactor model loading
6 participants