
LLM optimization documentation fixes and updates. #28212

Merged 4 commits into openvinotoolkit:master from ak/llm_docs_fixes on Jan 31, 2025

Conversation

AlexKoff88 (Contributor)

No description provided.

@AlexKoff88 AlexKoff88 requested a review from a team as a code owner December 26, 2024 12:32
@AlexKoff88 AlexKoff88 requested review from kblaszczak-intel and tsavina and removed request for a team December 26, 2024 12:32
@github-actions github-actions bot added the category: docs OpenVINO documentation label Dec 26, 2024
@AlexKoff88 (Contributor, Author)

@kblaszczak-intel, @tsavina, can you take a look?

@AlexKoff88 AlexKoff88 enabled auto-merge January 31, 2025 06:34
- For instance the 7 billion parameter Llama 2 model can be reduced
- from about 25GB to 4GB using 4-bit weight compression.
+ For instance the 8 billion parameter Llama 3 model can be reduced
+ from about 16.1 GB to 4.8 GB using 4-bit weight quantization on top of bfloat16 model.
Contributor

Suggested change:
- from about 16.1 GB to 4.8 GB using 4-bit weight quantization on top of bfloat16 model.
+ from about 16.1 GB to 4.8 GB using 4-bit weight quantization on top of a bfloat16 model.
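For readers wanting to see how a size reduction like the one quoted in this hunk is obtained, here is a minimal sketch using NNCF's weight-compression API. The model ID and the group_size/ratio values are illustrative assumptions, not values taken from this PR or the documentation under review.

```python
# Hypothetical sketch: 4-bit weight-only compression of an LLM with NNCF.
# Model ID and compression parameters are illustrative assumptions.
import torch
import nncf
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # assumed model, matching the doc's example
    torch_dtype=torch.bfloat16,
)

# INT4 symmetric weight-only quantization; group_size and ratio are
# common choices, not values mandated by the documentation.
model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,   # weights quantized in groups of 128 channels
    ratio=0.8,        # ~80% of layers to 4-bit, the rest kept at 8-bit
)
```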

compression may result in more accuracy reduction than with larger models.
Therefore, weight compression is recommended for use with LLMs only.

LLMs and other GenAI models that require
Contributor

Suggested change:
- LLMs and other GenAI models that require
+ LLMs and other generative AI models that require

Comment on lines +140 to +141
NNCF allows stacking the supported optimization methods. For example, AWQ, Scale Estimation
and GPTQ methods can be enabled all together to achieve better accuracy.
Contributor

Suggested change:
- NNCF allows stacking the supported optimization methods. For example, AWQ, Scale Estimation
- and GPTQ methods can be enabled all together to achieve better accuracy.
+ NNCF enables you to stack the supported optimization methods. For example, AWQ, Scale Estimation
+ and GPTQ methods may be enabled all together to achieve better accuracy.
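As a sketch of the stacking discussed in this hunk, NNCF's compress_weights exposes boolean flags for these methods. The calibration data and parameter values below are assumptions for illustration, not taken from the PR:

```python
# Hypothetical sketch: stacking AWQ, Scale Estimation and GPTQ in a single
# compress_weights call. `model` and `calibration_samples` are assumed to be
# defined elsewhere (e.g. as in the earlier sketch).
import nncf

calibration = nncf.Dataset(calibration_samples)  # iterable of model inputs

model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=64,          # illustrative value
    awq=True,               # Activation-aware Weight Quantization
    scale_estimation=True,  # refine quantization scales on calibration data
    gptq=True,              # GPTQ error-compensating weight updates
    dataset=calibration,    # required when data-aware methods are enabled
)
```

Stacking the data-aware methods trades longer compression time for better accuracy of the resulting 4-bit model.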

@AlexKoff88 AlexKoff88 added this pull request to the merge queue Jan 31, 2025
Merged via the queue into openvinotoolkit:master with commit 897ea48 Jan 31, 2025
147 checks passed
@AlexKoff88 AlexKoff88 deleted the ak/llm_docs_fixes branch January 31, 2025 13:07
Labels: category: docs OpenVINO documentation
2 participants