[MODULE] A module on quantization #169
base: main
Conversation
Hi @michaelshekasta. Sorry to go quiet on this. I've been wrapped up in an agents course for HF Learn this week. I will review it tomorrow.
@burtenshaw a gentle reminder
@michaelshekasta This is a great start. I've implemented a more typical structure. I would suggest that you now follow on with the next stage:

Once you're ready, I'll review and complete the module's prose.
I would suggest moving on to the notebooks and implementing two very simple ones where you:
- use llama.cpp (see the sketch below)
- use a CPU inference library of your choice

I will take a pass at the existing prose in fundamentals and update that.
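For the llama.cpp notebook, a minimal sketch could look like this (assuming the `llama-cpp-python` bindings; the model path and prompt are placeholders, not files in this repo):

```python
# Minimal sketch of running a GGUF-quantized model via llama-cpp-python.
# Assumes: pip install llama-cpp-python, plus a quantized GGUF file on disk
# (the path below is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="models/model-q4_k_m.gguf",  # hypothetical quantized model file
    n_ctx=2048,    # context window
    n_threads=4,   # CPU threads to use
)

output = llm(
    "Explain quantization in one sentence.",
    max_tokens=64,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```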
Need to look for more resources

## Exercise Notebooks
I'm unsure about what exactly we should include here. Below are a few options, along with my humble thoughts:
If you look at the other modules you'll see a table with example notebooks. In this module we will need two: one on GGUF and one on CPU inference (a rough sketch of the CPU path follows after the table).
| Title | Description | Exercise | Link | Colab |
|-------|-------------|----------|------|-------|
| Quantization with LlamaCPP | Description | Exercise | [link](./notebooks/example.ipynb) | <a target="_blank" href="link"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| CPU Inference (Intel or MLX) | Description | Exercise | [link](./notebooks/example.ipynb) | <a target="_blank" href="link"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
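For the CPU inference notebook, the idea could be sketched with PyTorch's dynamic int8 quantization (a hedged example, not the module's final content; the checkpoint name is an illustrative assumption):

```python
# Sketch of CPU inference with dynamic int8 quantization in PyTorch.
# The model checkpoint is an illustrative choice, not prescribed by the module.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M"  # assumption: any small causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Replace nn.Linear layers with int8 dynamic-quantized versions
# for faster CPU matrix multiplications.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Quantization lets us", return_tensors="pt")
with torch.no_grad():
    out = quantized.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```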
This table is sufficient. You can remove the mention of exercise notebooks in the sub-pages and replace it with links.
I’d like to propose a new module aimed at optimizing language models for efficient CPU-based inference, reducing reliance on GPUs. The module covers three key areas: quantization techniques, the GGUF model format, and utilizing Intel and MLX accelerators for optimized inference.
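For context, the core idea behind the quantization area could be illustrated in a few lines of numpy (a sketch of simple symmetric int8 quantization, not the module's final content):

```python
# Sketch of symmetric int8 quantization: map float weights to [-127, 127]
# with a single per-tensor scale, then dequantize to inspect rounding error.
import numpy as np

w = np.random.randn(8).astype(np.float32)   # toy "weights"
scale = np.abs(w).max() / 127.0             # one scale for the whole tensor
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale        # dequantized approximation

print("max abs error:", np.abs(w - w_hat).max())
```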
What do you think?