May I ask what the actual compressed model size is, given that this is a partial binarization approach and some 8-bit parameters remain inside each weight matrix? Can the model be compressed using techniques like bitpacking?
Regarding your inquiry: for the 1-bit weights, we indeed use a packed format. As for the 8-bit parameters within each weight matrix, given their sparsity (the density of nonzeros is low), conventional formats like CSR may not be the most suitable. We are currently exploring a modified run-length encoding (RLE) to achieve an efficient compression ratio for the 8-bit sparse data.
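For reference, here is a minimal sketch of what packing 1-bit weights could look like with NumPy; the function names are hypothetical and this is not the repository's actual code:

```python
import numpy as np

# Hypothetical illustration of 1-bit weight packing, not the repo's implementation.
def pack_binary_weights(w_sign: np.ndarray) -> np.ndarray:
    """Pack a {-1, +1} weight matrix into a byte array, 8 weights per byte."""
    bits = (w_sign.ravel() > 0).astype(np.uint8)  # map -1 -> 0, +1 -> 1
    return np.packbits(bits)

def unpack_binary_weights(packed: np.ndarray, shape: tuple) -> np.ndarray:
    """Recover the {-1, +1} weight matrix from the packed bytes."""
    n = int(np.prod(shape))
    bits = np.unpackbits(packed)[:n]  # drop padding bits from the last byte
    return (bits.astype(np.int8) * 2 - 1).reshape(shape)
```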
In our modified RLE, each nonzero 8-bit data point is represented by a pair of values: the actual 8-bit data and the count of consecutive zeros that precede it. For example,
original sequence: 0 0 0 0 0 0 5 0 0 1.
RLE representation: (5, 6) (1, 2).
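As a sketch, the scheme above could be implemented as follows; this is illustrative code assuming (value, leading-zero-count) pairs, not the authors' implementation:

```python
def rle_encode(seq):
    """Encode a sparse sequence as (value, leading-zero-count) pairs."""
    pairs, zeros = [], 0
    for x in seq:
        if x == 0:
            zeros += 1
        else:
            # A real 12-bit format would cap the count at 15 (4 bits) and
            # need an escape pair for longer zero runs.
            pairs.append((x, zeros))
            zeros = 0
    return pairs, zeros  # trailing zeros returned so decoding restores length

def rle_decode(pairs, trailing_zeros):
    """Rebuild the original sequence from the pairs."""
    seq = []
    for value, zeros in pairs:
        seq.extend([0] * zeros)
        seq.append(value)
    return seq + [0] * trailing_zeros

pairs, tail = rle_encode([0, 0, 0, 0, 0, 0, 5, 0, 0, 1])
print(pairs)  # [(5, 6), (1, 2)]
```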
In terms of storage cost, the RLE representation stores each (value, count) pair in 12 bits: 8 bits for the value and 4 bits for the zero count.
For a 10% outlier ratio, every weight costs 1 bit for its binarized value, and each outlier additionally costs one RLE pair. If we quantize the outliers to 8 bits, the average cost per weight is 1 + (8+4)*0.1 = 2.2 bits (compression ratio = 1 - 2.2/16 = 86.3% relative to FP16). If we quantize the outliers to 4 bits instead, the average cost drops to 1 + (4+4)*0.1 = 1.8 bits (compression ratio = 1 - 1.8/16 = 88.8%).
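The same arithmetic as a quick sanity check (the function name and the FP16 baseline are assumptions for illustration):

```python
def avg_bits_per_weight(outlier_ratio, value_bits, count_bits=4, binary_bits=1):
    """Average storage per weight: 1 binary bit plus an RLE pair for outliers."""
    bits = binary_bits + (value_bits + count_bits) * outlier_ratio
    return bits, 1 - bits / 16  # compression ratio vs. a 16-bit baseline

print(avg_bits_per_weight(0.1, 8))  # ~ (2.2, 0.8625) -> ~86.3% vs. FP16
print(avg_bits_per_weight(0.1, 4))  # ~ (1.8, 0.8875) -> ~88.8% vs. FP16
```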
Really solid work!