LLM optimization documentation fixes and updates. #28212
Conversation
@kblaszczak-intel, @tsavina, can you take a look?
- For instance the 7 billion parameter Llama 2 model can be reduced
- from about 25GB to 4GB using 4-bit weight compression.
+ For instance the 8 billion parameter Llama 3 model can be reduced
+ from about 16.1 GB to 4.8 GB using 4-bit weight quantization on top of bfloat16 model.
Suggested change:
- from about 16.1 GB to 4.8 GB using 4-bit weight quantization on top of bfloat16 model.
+ from about 16.1 GB to 4.8 GB using 4-bit weight quantization on top of a bfloat16 model.
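The figures in the updated text can be sanity-checked with quick arithmetic. This is a sketch, not part of the PR; it assumes decimal GB and attributes the gap between the raw 4.0 GB and the quoted 4.8 GB to quantization overhead (per-group scales/zero points and layers kept at higher precision).

```python
# Back-of-the-envelope check of the sizes quoted for Llama 3 8B.
params = 8e9  # 8 billion parameters

bf16_gb = params * 2 / 1e9    # bfloat16 = 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 4-bit = 0.5 bytes per parameter

print(f"bfloat16: ~{bf16_gb:.1f} GB")  # ~16.0 GB, close to the quoted 16.1 GB
print(f"int4:     ~{int4_gb:.1f} GB")  # ~4.0 GB before quantization overhead
```

The remaining ~0.8 GB in the quoted 4.8 GB figure is consistent with per-group quantization metadata and a fraction of layers left uncompressed.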
compression may result in more accuracy reduction than with larger models.
Therefore, weight compression is recommended for use with LLMs only.
LLMs and other GenAI models that require
Suggested change:
- LLMs and other GenAI models that require
+ LLMs and other generative AI models that require
NNCF allows stacking the supported optimization methods. For example, AWQ, Scale Estimation
and GPTQ methods can be enabled all together to achieve better accuracy.
Suggested change:
- NNCF allows stacking the supported optimization methods. For example, AWQ, Scale Estimation
- and GPTQ methods can be enabled all together to achieve better accuracy.
+ NNCF enables you to stack the supported optimization methods. For example, AWQ, Scale Estimation
+ and GPTQ methods may be enabled all together to achieve better accuracy.
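The stacking described above can be sketched in code. This is an illustrative sketch, not part of the PR: the `nncf.compress_weights` parameter names (`awq`, `scale_estimation`, `gptq`, `mode`, `ratio`, `group_size`, `dataset`) follow the NNCF API as the author understands it and should be verified against your installed NNCF version; `ov_model` and `calibration_dataset` are placeholder inputs.

```python
def compress_llm_weights(ov_model, calibration_dataset):
    """Sketch: stack AWQ, Scale Estimation and GPTQ on top of 4-bit
    weight compression. Assumes the nncf.compress_weights keyword
    arguments shown below; check your NNCF version before use."""
    import nncf  # imported lazily so the sketch reads without NNCF installed

    return nncf.compress_weights(
        ov_model,
        mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weights
        ratio=1.0,                 # compress all eligible layers to 4-bit
        group_size=128,            # per-group quantization granularity
        awq=True,                  # activation-aware weight rescaling
        scale_estimation=True,     # refine scales on calibration data
        gptq=True,                 # GPTQ error-compensating weight updates
        dataset=calibration_dataset,  # required by the data-aware methods
    )
```

All three data-aware methods share the same calibration dataset, which is why they compose: each refines the same 4-bit quantization rather than re-quantizing the model.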
No description provided.