
[Quantization] enable multi-backend bitsandbytes #10574

Open
wants to merge 3 commits into main
Conversation

@hlky (Collaborator) commented Jan 14, 2025

What does this PR do?

Mainly copied from the corresponding transformers PR.

May need to look at:

```python
_, is_loaded_in_4bit_bnb, is_loaded_in_8bit_bnb = _check_bnb_status(module)
if (is_loaded_in_4bit_bnb or is_loaded_in_8bit_bnb) and dtype is not None:
    logger.warning(
        f"The module '{module.__class__.__name__}' has been loaded in `bitsandbytes` {'4bit' if is_loaded_in_4bit_bnb else '8bit'} and conversion to {dtype} is not supported. Module is still in {'4bit' if is_loaded_in_4bit_bnb else '8bit'} precision."
    )
if is_loaded_in_8bit_bnb and device is not None:
    logger.warning(
        f"The module '{module.__class__.__name__}' has been loaded in `bitsandbytes` 8bit and moving it to {device} via `.to()` is not supported. Module is still on {module.device}."
    )
# This can happen for `transformer` models. CPU placement was added in
# https://github.com/huggingface/transformers/pull/33122. So, we guard this accordingly.
if is_loaded_in_4bit_bnb and device is not None and is_transformers_version(">", "4.44.0"):
    module.to(device=device)
elif not is_loaded_in_4bit_bnb and not is_loaded_in_8bit_bnb:
    module.to(device, dtype)
```
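
For illustration (not from the PR itself): this guard lives in `DiffusionPipeline.to()`, so a bnb-quantized component exercises it roughly as below. The Flux repo and 4-bit config are just an example choice:

```python
# Illustrative sketch: exercising the guard above via DiffusionPipeline.to().
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", transformer=transformer)

pipe.to(torch.float16)  # logs the "conversion to ... is not supported" warning; weights stay 4-bit
pipe.to("cuda")         # the 4-bit transformer is moved too, but only on transformers > 4.44.0
```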

Test results are the same as the nightly run: https://github.com/huggingface/diffusers/actions/runs/12758480172/job/35560601164

```
RUN_SLOW=1 pytest -v -s tests/quantization/bnb/
===================================================================================== short test summary info =====================================================================================
FAILED tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitTests::test_generate_quality_dequantize - NotImplementedError: Only row-major format inputs are supported, but got format `col32`
FAILED tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitTests::test_quality - AssertionError: False is not true
===================================================================== 2 failed, 42 passed, 26 warnings in 1601.02s (0:26:41) ======================================================================
```

Fixes #10395

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@sayakpaul

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +94 to +95:

```python
if set(device_map.values()) == {"cpu"} and bnb_multibackend_is_enabled:
    pass
```
Member:

Because bnb is supported on Intel CPUs?

if "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values():
if set(device_map.values()) == {"cpu"} and bnb_multibackend_is_enabled:
pass
elif "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values():
Member:

The common piece of code between the two utilities could be clubbed into a small function and reused?

Previously we didn't do this because it was relatively small and was better off inline.
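
For illustration, one way the duplicated guard could be pulled into a shared helper; the function name and signature here are invented, not part of the PR:

```python
# Hypothetical refactor sketch; helper name and signature are invented for illustration.
def _bnb_device_map_needs_cpu_offload(device_map, device_map_without_no_convert, bnb_multibackend_is_enabled):
    # A CPU-only device_map is fine when the multi-backend bnb build is active.
    if set(device_map.values()) == {"cpu"} and bnb_multibackend_is_enabled:
        return False
    # Otherwise, any CPU/disk placement still needs the fp32 CPU offload path.
    return "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values()
```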

if "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values():
if set(device_map.values()) == {"cpu"} and bnb_multibackend_is_enabled:
pass
elif "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common piece of code between the two utilities could be clubbed into a small function and reused?

Previously we didn't do because it was relatively small and was better off in-line.

```diff
@@ -183,7 +194,7 @@ def dequantize_bnb_weight(weight: "torch.nn.Parameter", state=None):
     if state.CxB is None:
         state.CxB, state.SB = bnb.functional.transform(weight.data, to_order=state.formatB)
     out32, Sout32 = bnb.functional.igemmlt(im, state.CxB, Sim, state.SB)
-    return bnb.functional.mm_dequant(out32, Sout32, SCim, state.SCB, bias=None).t()
+    return bnb.functional.mm_dequant(out32, Sout32, SCim, state.SCB, bias=None).t().to(dtype)
```
Member:

Note: #10401

Collaborator Author:

Will rebase after that PR has merged.

```diff
@@ -304,3 +318,80 @@ def _check_bnb_status(module) -> Union[bool, bool]:
         and getattr(module, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES
     )
     return is_loaded_in_4bit_bnb or is_loaded_in_8bit_bnb, is_loaded_in_4bit_bnb, is_loaded_in_8bit_bnb
+
+
+def _validate_bnb_multi_backend_availability(raise_exception):
```
Member:

@matthewdouglas I wonder if it makes sense to have these as utility functions in bitsandbytes so that they can be reused in transformers and diffusers (and any other libraries)?
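
For context, a rough sketch of the kind of shared helper being suggested; the `supported_torch_devices` attribute and the function itself are hypothetical here, not a confirmed bitsandbytes API:

```python
# Hypothetical upstream helper; nothing here is a confirmed bitsandbytes API.
import torch


def validate_multi_backend_availability(raise_exception: bool = False) -> bool:
    import bitsandbytes as bnb

    # Assumed attribute for illustration: the torch device types this bnb build supports.
    supported = getattr(bnb, "supported_torch_devices", {"cuda"})
    available = {"cuda"} if torch.cuda.is_available() else {"cpu"}
    if available & supported:
        return True
    if raise_exception:
        raise RuntimeError(f"Available devices {available} are not supported by this bitsandbytes build ({supported}).")
    return False
```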

```python
    return True


@lru_cache
```
Member:

We usually don't use `lru_cache` in import_utils.py. Any specific reason?

Collaborator Author:

Copied from transformers; not sure of the context.
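
For background, `lru_cache` simply memoizes the probe so the (potentially slow) import and backend detection run once per process; a minimal standalone sketch with an invented function name:

```python
from functools import lru_cache


@lru_cache
def _bnb_probe() -> bool:
    # Illustrative probe: the first call pays for the import, later calls hit the cache.
    try:
        import bitsandbytes  # noqa: F401
    except (ImportError, RuntimeError):
        return False
    return True


_bnb_probe()  # performs the import
_bnb_probe()  # returns the cached result without re-importing
```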

@sayakpaul (Member) left a comment:

Left some comments but this is already very good!

> May need to look at:

Anything specific? I'm not seeing anything CUDA-specific.

@sayakpaul (Member):

> Test results are the same as the nightly run

Were the tests run on the aws-g6e-xlarge-plus runner? If so, `tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitTests::test_quality` should have passed. Will take a look.
