Safe serialization #116

dacorvo · 2024-03-12T16:45:19Z

This extends the QModuleMixin class to support two new serialization modes, in addition to the default Pickle mode:

- PyTorch weights_only serialization (fixes #93, "Incompatible with torch.load(...,weights_only=True)"),
- safetensors serialization (fixes #100, "Incompatible with safetensors serialization").
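To illustrate why stricter loaders reject custom classes, here is a minimal sketch using only the standard library. It is an analogy, not torch's implementation: the allowlist `SAFE_GLOBALS`, the `RestrictedUnpickler` class and the `can_load` helper are hypothetical names, and `fractions.Fraction` merely stands in for any non-primitive type found in a checkpoint.

```python
import collections
import fractions
import io
import pickle

# Hypothetical allowlist: the only globals the loader agrees to resolve.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def can_load(obj) -> bool:
    """Pickle obj, then report whether the restricted loader accepts it."""
    payload = pickle.dumps(obj)
    try:
        RestrictedUnpickler(io.BytesIO(payload)).load()
        return True
    except pickle.UnpicklingError:
        return False


# Primitive values (lists, floats, strings) need no global lookup at all.
flat_entry = {"weight._data": [1, 2], "weight._scale": 0.5, "weight._qtype": "qint8"}
# A non-primitive value forces a lookup that the allowlist rejects.
opaque_entry = {"weight": fractions.Fraction(1, 2)}

print(can_load(flat_entry), can_load(opaque_entry))
```

A state dictionary made only of plain values passes such a loader, which is the property the two new modes rely on.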

The metadata of flattened tensors is serialized using only strings, which can be converted back to generic Python types through Abstract Syntax Tree evaluation. The only exception is qtype, for which we use a custom helper. This makes QModuleMixin serialization compatible with the weights_only parameter of torch.load. Note that this does NOT make QTensor itself compatible with safe serialization: we don't override the pickle code used when saving a QTensor directly, and that code uses its own representation of flattened tensors based on non-primitive types.
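The string round trip described above can be sketched with the standard `ast` module. This is a minimal illustration, not the code from this pull request: the metadata keys (`size`, `stride`, `axis`), the helper names, and the use of plain tuples for shapes are all assumptions.

```python
import ast


def serialize_meta(meta: dict) -> dict:
    """Convert every metadata value to its string representation."""
    return {key: str(value) for key, value in meta.items()}


def deserialize_meta(meta: dict) -> dict:
    """Parse each string back into a Python literal (tuple, int, None, ...)."""
    return {key: ast.literal_eval(value) for key, value in meta.items()}


def is_literal(text: str) -> bool:
    """Report whether a string can be parsed as a Python literal."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False


# Hypothetical metadata for one flattened tensor.
meta = {"size": (16, 32), "stride": (32, 1), "axis": None}
as_strings = serialize_meta(meta)
restored = deserialize_meta(as_strings)

# A bare type name such as "qint8" is an identifier, not a literal,
# so it cannot be recovered this way and needs a dedicated lookup.
print(restored == meta, is_literal("qint8"))
```

Because `ast.literal_eval` only accepts literals, it never executes arbitrary code, which fits the goal of safe loading.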

dacorvo force-pushed the flattened_state_dict branch from 3076298 to 0267158 March 12, 2024 16:47

dacorvo requested review from younesbelkada and SunMarc March 12, 2024 16:48

dacorvo force-pushed the flattened_state_dict branch 2 times, most recently from 81c79a9 to 54b62ab March 12, 2024 16:58

dacorvo added 8 commits March 12, 2024 18:07

test(mlp): fix device memory test

23f013e

refactor(QModuleMixin): homogeneize parameters

6f031e4

refactor(QModuleMixin): qweight is now a property

137f133

The docstring also explains why this method is required.

refactor(QModuleMixin): remove state_dict hook

23a800c

Now that activations are always quantized per-tensor, the shape of the input and output scales is always a scalar.

feat(qtype): add helper to get a qtype by name

64f0167

This will help serialization.

feat(mnist): add float8 activations to example

092aec3

fix(QModuleMixin): add quantization parameters to state_dict

42e9514

We make sure the parameters are serialized as strings to be compatible with the stricter serialization formats.

fix(PackedTensor): typo in flatten

8ca8eab

dacorvo force-pushed the flattened_state_dict branch from 54b62ab to 52a9455 March 12, 2024 17:07

dacorvo added 5 commits March 13, 2024 08:59

feat(QModuleMixin): flatten tensors on serialization

c71a61f

This allows supporting serializers other than Pickle. It also removes the need to force assignment when loading a state_dict.
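The idea of flattening can be sketched without any tensor library. In this toy version, a list and a float stand in for the data and scale tensors, and the class name, key suffixes and helper functions are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ToyQuantizedWeight:
    """Stand-in for a composite quantized tensor (hypothetical)."""

    data: list    # plays the role of the integer data tensor
    scale: float  # plays the role of the scale tensor
    qtype: str    # type name, kept as a plain string


def flatten_into(state_dict: dict, prefix: str, weight: ToyQuantizedWeight) -> None:
    """Store each inner component under its own prefixed key."""
    state_dict[prefix + "_data"] = weight.data
    state_dict[prefix + "_scale"] = weight.scale
    state_dict[prefix + "_qtype"] = weight.qtype


def unflatten_from(state_dict: dict, prefix: str) -> ToyQuantizedWeight:
    """Rebuild the composite object from its prefixed components."""
    return ToyQuantizedWeight(
        data=state_dict[prefix + "_data"],
        scale=state_dict[prefix + "_scale"],
        qtype=state_dict[prefix + "_qtype"],
    )


weight = ToyQuantizedWeight(data=[-3, 7, 12], scale=0.05, qtype="qint8")
state_dict = {}
flatten_into(state_dict, "fc1.weight.", weight)
rebuilt = unflatten_from(state_dict, "fc1.weight.")
print(sorted(state_dict), rebuilt == weight)
```

The flat dictionary holds no custom objects, so any serializer that handles plain entries can write it.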

test(mlp): add safetensors serialization

27cc806

fix(examples): remove explicit state_dict assignments

9a4e7b9

doc: update README

55e15b0

dacorvo force-pushed the flattened_state_dict branch from 52a9455 to 55e15b0 March 13, 2024 08:10

dacorvo merged commit 661336d into main Mar 13, 2024
3 checks passed

dacorvo deleted the flattened_state_dict branch March 13, 2024 08:14

SunMarc mentioned this pull request Mar 14, 2024

[Quantization] Quanto quantizer huggingface/transformers#29023

Merged

3 tasks
