May I ask what the actual compressed model size is, given that this is a partial binarization approach and some 8-bit parameters remain inside each weight matrix? Can the model be compressed using techniques like bitpacking?
Regarding your inquiry: for the 1-bit weights, we indeed use a packed format. As for the 8-bit parameters within each weight matrix, given their sparsity (the density of nonzeros is low), conventional formats like CSR may not be the most suitable. We are currently exploring a modified run-length encoding (RLE) to achieve an efficient compression ratio for the 8-bit sparse data.
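For reference, here is a minimal sketch of what packing 1-bit weights could look like with NumPy; the function names are hypothetical and this is not the repository's actual code:

```python
import numpy as np

# Hypothetical illustration of 1-bit weight packing, not the repo's implementation.
def pack_binary_weights(w_sign: np.ndarray) -> np.ndarray:
    """Pack a {-1, +1} weight matrix into a byte array, 8 weights per byte."""
    bits = (w_sign.ravel() > 0).astype(np.uint8)  # map -1 -> 0, +1 -> 1
    return np.packbits(bits)

def unpack_binary_weights(packed: np.ndarray, shape: tuple) -> np.ndarray:
    """Recover the {-1, +1} weight matrix from the packed bytes."""
    n = int(np.prod(shape))
    bits = np.unpackbits(packed)[:n]  # drop padding bits from the last byte
    return (bits.astype(np.int8) * 2 - 1).reshape(shape)
```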
In our modified RLE, each nonzero 8-bit data point is represented by a pair of values: the actual 8-bit data and the count of consecutive zeros that precede it. For example,
original sequence: 0 0 0 0 0 0 5 0 0 1.
RLE representation: (5, 6) (1, 2).
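As a sketch, the scheme above could be implemented as follows; this is illustrative code assuming (value, leading-zero-count) pairs, not the authors' implementation:

```python
def rle_encode(seq):
    """Encode a sparse sequence as (value, leading-zero-count) pairs."""
    pairs, zeros = [], 0
    for x in seq:
        if x == 0:
            zeros += 1
        else:
            # A real 12-bit format would cap the count at 15 (4 bits) and
            # need an escape pair for longer zero runs.
            pairs.append((x, zeros))
            zeros = 0
    return pairs, zeros  # trailing zeros returned so decoding restores length

def rle_decode(pairs, trailing_zeros):
    """Rebuild the original sequence from the pairs."""
    seq = []
    for value, zeros in pairs:
        seq.extend([0] * zeros)
        seq.append(value)
    return seq + [0] * trailing_zeros

pairs, tail = rle_encode([0, 0, 0, 0, 0, 0, 5, 0, 0, 1])
print(pairs)  # [(5, 6), (1, 2)]
```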
In terms of storage cost, the RLE representation stores each (value, count) pair in 12 bits: 8 bits for the value and 4 bits for the zero count.
For a 10% outlier ratio, every weight costs 1 bit for its binarized value, and each outlier additionally costs one RLE pair. If we quantize the outliers to 8 bits, the average cost per weight is 1 + (8+4)*0.1 = 2.2 bits (compression ratio = 1 - 2.2/16 = 86.3% relative to FP16). If we quantize the outliers to 4 bits instead, the average cost drops to 1 + (4+4)*0.1 = 1.8 bits (compression ratio = 1 - 1.8/16 = 88.8%).
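The same arithmetic as a quick sanity check (the function name and the FP16 baseline are assumptions for illustration):

```python
def avg_bits_per_weight(outlier_ratio, value_bits, count_bits=4, binary_bits=1):
    """Average storage per weight: 1 binary bit plus an RLE pair for outliers."""
    bits = binary_bits + (value_bits + count_bits) * outlier_ratio
    return bits, 1 - bits / 16  # compression ratio vs. a 16-bit baseline

print(avg_bits_per_weight(0.1, 8))  # ~ (2.2, 0.8625) -> ~86.3% vs. FP16
print(avg_bits_per_weight(0.1, 4))  # ~ (1.8, 0.8875) -> ~88.8% vs. FP16
```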
Really solid work!