Add Tensor Parallel support for ALL models #34789
Comments
If it's okay, I want to take Gemma and Gemma2. @ArthurZucker
@ArthurZucker I will add support for Granite. Thanks
Hey @ArthurZucker! I am going to work on Mistral.
Hey @ArthurZucker! I am going to start working on Qwen2.
For tensor parallelism, when reshaping the hidden states into heads, why is num_heads set to -1? This means the heads themselves are distributed across devices, which is fine. But an alternative approach could be to keep num_heads the same per device and let head_dim be the dynamic (-1) dimension instead. Is there a problem with the latter?
I think it might affect the rotary embeddings, so it's better to split along the heads.
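To make that trade-off concrete, here is a minimal, hypothetical sketch (plain PyTorch, not the library's actual implementation) of why sharding whole heads keeps rotary embeddings intact: the cos/sin tables rotate feature pairs *within* head_dim, so as long as each rank holds complete heads the rotation is identical to the single-GPU case.

```python
# Hypothetical sketch, not transformers' actual code: why the per-rank reshape
# uses -1 for the number of heads while keeping head_dim fixed.
import torch

def split_heads_per_rank(qkv: torch.Tensor, head_dim: int) -> torch.Tensor:
    # qkv: (batch, seq_len, local_hidden), where local_hidden is this rank's
    # shard of num_heads * head_dim after a column-parallel projection.
    # The -1 resolves to num_heads // tp_size, so each rank holds whole heads.
    batch, seq_len, _ = qkv.shape
    return qkv.view(batch, seq_len, -1, head_dim).transpose(1, 2)

def apply_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Rotary embeddings rotate pairs of features within head_dim. Because
    # head_dim is unchanged on every rank, cos/sin tables of shape
    # (seq_len, head_dim) apply identically with or without tensor parallelism.
    # If head_dim itself were sharded instead, each rank would only see part of
    # the rotation frequencies and the result would no longer match.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)
    return x * cos + rotated * sin
```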
Just opening this to add support for all models following #34184
Let's bring support to all models! 🤗
It would be great to add support for more architectures such as
... and many more
For anyone who wants to contribute, just open a PR and link it to this issue, and ping me for a review!! 🤗
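As a rough illustration of what such a PR tends to involve (following the pattern introduced in #34184; the exact attribute name, keys, and supported values shown here are an assumption and should be checked against the current code of the model you are porting), the model's config declares a tensor-parallel plan mapping projection layers to column-wise or row-wise sharding:

```python
# Hypothetical sketch of a tensor-parallel plan for a decoder-style model,
# in the spirit of #34184. Verify the attribute name, module paths, and
# accepted values against the model you are actually porting.
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",  # shard output features (whole heads) across ranks
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",  # shard input features, all-reduce the output
    "layers.*.mlp.gate_proj": "colwise",
    "layers.*.mlp.up_proj": "colwise",
    "layers.*.mlp.down_proj": "rowwise",
}
```

Column-wise layers split the output dimension (so each rank computes a subset of heads or intermediate features) and the matching row-wise layer splits the input dimension and reduces the partial results, which is why the attention and MLP blocks pair colwise projections with a final rowwise one.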