[Quantization] enable multi-backend bitsandbytes #10574
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
if set(device_map.values()) == {"cpu"} and bnb_multibackend_is_enabled:
    pass
Because bnb is supported on Intel CPUs?
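For context, a minimal sketch of what this guard is doing, written as a standalone function for illustration (the function name and error message are assumptions, not the PR's actual code): a CPU-only device_map is accepted when the multi-backend bitsandbytes build is enabled, while cpu/disk offload of quantized modules is otherwise rejected.

```python
# Illustrative sketch only; names and messages are assumptions, not the PR's code.
def _validate_quantized_device_map(device_map: dict, bnb_multibackend_is_enabled: bool) -> None:
    placements = set(device_map.values())
    # A purely-CPU placement is fine when the multi-backend bnb build
    # (which supports CPU, e.g. on Intel hardware) is available.
    if placements == {"cpu"} and bnb_multibackend_is_enabled:
        return
    if "cpu" in placements or "disk" in placements:
        raise ValueError(
            "Some modules are dispatched to cpu or disk, which is not supported "
            "for bitsandbytes-quantized models with this bitsandbytes build."
        )
```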
if "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values(): | ||
if set(device_map.values()) == {"cpu"} and bnb_multibackend_is_enabled: | ||
pass | ||
elif "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values(): |
The common piece of code between the two utilities could be clubbed into a small function and reused?
Previously we didn't do this because it was relatively small and was better kept inline.
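If the duplicated check were factored out as suggested, it might look roughly like the sketch below; the helper name and signature are hypothetical, not part of the PR.

```python
# Hypothetical helper; the name and exact signature are assumptions.
def _has_cpu_or_disk_offload(device_map_without_no_convert: dict) -> bool:
    # True if any module would be placed on CPU or offloaded to disk.
    values = device_map_without_no_convert.values()
    return "cpu" in values or "disk" in values
```

Both the 4-bit and 8-bit validation paths could then call this helper instead of repeating the condition.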
if "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values(): | ||
if set(device_map.values()) == {"cpu"} and bnb_multibackend_is_enabled: | ||
pass | ||
elif "cpu" in device_map_without_no_convert.values() or "disk" in device_map_without_no_convert.values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common piece of code between the two utilities could be clubbed into a small function and reused?
Previously we didn't do because it was relatively small and was better off in-line.
@@ -183,7 +194,7 @@ def dequantize_bnb_weight(weight: "torch.nn.Parameter", state=None):
     if state.CxB is None:
         state.CxB, state.SB = bnb.functional.transform(weight.data, to_order=state.formatB)
     out32, Sout32 = bnb.functional.igemmlt(im, state.CxB, Sim, state.SB)
-    return bnb.functional.mm_dequant(out32, Sout32, SCim, state.SCB, bias=None).t()
+    return bnb.functional.mm_dequant(out32, Sout32, SCim, state.SCB, bias=None).t().to(dtype)
Note: #10401
Will rebase after that PR has merged.
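For reference, the change above only appends a cast after dequantization. A self-contained illustration of the idea, independent of the bitsandbytes int8 API (the helper name is hypothetical and not part of the PR):

```python
import torch


def _restore_dtype(dequantized: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # The int8 matmul/dequant path may yield float32; cast back to the dtype
    # the caller asked for (e.g. torch.float16) so downstream ops match.
    return dequantized if dequantized.dtype == dtype else dequantized.to(dtype)
```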
@@ -304,3 +318,80 @@ def _check_bnb_status(module) -> Union[bool, bool]:
         and getattr(module, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES
     )
     return is_loaded_in_4bit_bnb or is_loaded_in_8bit_bnb, is_loaded_in_4bit_bnb, is_loaded_in_8bit_bnb
+
+
+def _validate_bnb_multi_backend_availability(raise_exception):
@matthewdouglas I wonder if it makes sense to have these as utility functions in bitsandbytes so that they can be reused in transformers and diffusers (and any other libraries)?
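As a rough sketch of what such a shared utility could look like; attribute names such as bnb.supported_torch_devices are assumptions here and may not match the real bitsandbytes API.

```python
import importlib.util

import torch


def _validate_bnb_multi_backend_availability(raise_exception: bool = False) -> bool:
    # Sketch only: checks that a bitsandbytes build is installed and that at
    # least one device it claims to support is present on this machine.
    if importlib.util.find_spec("bitsandbytes") is None:
        if raise_exception:
            raise ImportError("bitsandbytes is not installed.")
        return False

    import bitsandbytes as bnb

    # Assumed attribute: multi-backend builds advertising their supported
    # torch device types (e.g. {"cuda", "cpu", "xpu"}).
    supported = set(getattr(bnb, "supported_torch_devices", {"cuda"}))
    available = {"cpu"} | ({"cuda"} if torch.cuda.is_available() else set())

    if supported & available:
        return True
    if raise_exception:
        raise RuntimeError(
            f"None of the available devices {available} are supported by this "
            f"bitsandbytes build (supports {supported})."
        )
    return False
```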
    return True


@lru_cache
We usually don't use lru_cache in import_utils.py. Any specific reasons?
Copied from transformers, not sure on the context.
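For what it's worth, the pattern being copied is just caching the result of the availability probe so the import and device checks run only once per process; roughly (the function name is illustrative, and the validator is the one sketched above):

```python
from functools import lru_cache


@lru_cache
def is_bitsandbytes_multi_backend_available() -> bool:
    # Cached so repeated calls (e.g. from every model/pipeline load) don't
    # re-run the import and device probing.
    return _validate_bnb_multi_backend_availability(raise_exception=False)
```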
Left some comments but this is already very good!
> May need to look at

Anything specific? Not seeing anything CUDA-specific.
Were the tests run on the
What does this PR do?
Mainly copied from the transformers PR. May need to look at
diffusers/src/diffusers/pipelines/pipeline_utils.py
Lines 450 to 467 in fbff43a
Test results are the same as nightly: https://github.com/huggingface/diffusers/actions/runs/12758480172/job/35560601164
Fixes #10395
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@sayakpaul