
[MODULE] A module on quantization #169

Draft
michaelshekasta wants to merge 13 commits into main
Conversation

@michaelshekasta michaelshekasta commented Jan 12, 2025

I’d like to propose a new module aimed at optimizing language models for efficient CPU-based inference, reducing reliance on GPUs. The module covers three key areas: quantization techniques, the GGUF model format, and utilizing Intel and MLX accelerators for optimized inference.

What do you think?
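As a sketch of the kind of technique the quantization section would cover, here is minimal symmetric (absmax) int8 quantization in pure Python. This is an illustrative toy, not the module's actual notebook code; real quantizers such as llama.cpp work block-wise on tensors with per-block scales.

```python
# Minimal sketch of symmetric (absmax) int8 quantization.
# Pure Python for clarity; real quantizers operate on tensors block-wise.

def quantize_absmax(weights):
    """Map floats onto int8 [-127, 127] using the max absolute value as scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 values and the stored scale."""
    return [qi * scale for qi in q]

weights = [0.12, -0.5, 0.33, 0.99, -0.77]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step of the original.
```

The round trip loses at most `scale / 2` per weight, which is the core accuracy-vs-size trade-off the module would explain.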

@michaelshekasta michaelshekasta marked this pull request as draft January 12, 2025 15:03
@burtenshaw
Collaborator

Hi @michaelshekasta . Sorry to go quiet on this. I've been wrapped up on an agents course for HF learn this week. I will review it tomorrow.

@michaelshekasta
Author

@burtenshaw a gentle reminder

@burtenshaw burtenshaw changed the title Draft!! Quantization [MODULE] A module on quantization Jan 16, 2025
@burtenshaw
Collaborator

@michaelshekasta This is a great start. I've implemented a more typical structure. I would suggest that you now move on to the next stage:

  • find references for each section of the module.
  • add them to the references section of the markdown pages.
  • add bullet-point notes to each section of the page covering the key topics.
  • highlight sections that you don't understand or need help with.

Once you're ready, I'll review and complete the module's prose.

@burtenshaw burtenshaw left a comment


I would suggest moving on to the notebooks and implementing two very simple ones where you:

  1. use llama.cpp
  2. use a CPU inference framework of your choice

I will take a pass at the existing prose in fundamentals and update that.
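As a sketch of how the GGUF notebook might open, here is a stdlib-only reader for the GGUF file header. The field layout is assumed from the GGUF spec (little-endian: 4-byte magic `b"GGUF"`, uint32 version, uint64 tensor count, uint64 metadata key/value count) and should be verified against the current spec before use; the demo builds a fake header so no real model file is needed.

```python
import struct

def read_gguf_header(buf):
    """Parse the assumed GGUF header layout from a bytes buffer."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Build a fake 24-byte header to demonstrate (counts are arbitrary).
fake = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
header = read_gguf_header(fake)
```

In the notebook, the same call on the first bytes of a real `.gguf` file would let learners confirm the model's format version before loading it with llama.cpp.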

Need to look for more resources

## Exercise Notebooks
I'm unsure about what exactly we should include here. Below are a few options, along with my humble thoughts:

If you look at the other modules you'll see a table with example notebooks. In this module we will need two: one on GGUF and one on CPU inference.

| Title | Description | Exercise | Link | Colab |
|-------|-------------|----------|------|-------|
| Quantization with LlamaCPP | Description| Exercise| [link](./notebooks/example.ipynb) | <a target="_blank" href="link"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| CPU Inference (Intel or MLX) | Description| Exercise| [link](./notebooks/example.ipynb) | <a target="_blank" href="link"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |

This table is sufficient. You can remove the mentions of exercise notebooks in the sub-pages and replace them with links.
