Does it support the GGUF format model of Qwen2-VL-2B-Instruct? #1895
Comments
Yes, the latest commit integrates the llama.cpp changes for Qwen2-VL support. Edit: although I'm having some trouble processing/running inference on images.
Hello, is the "latest commit" referring to llama-cpp-python version 0.3.6? How should the chat_handler be passed to llm = Llama()? I looked at the source code and didn't find any support for Qwen2-VL-2B.
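For reference, existing multimodal models in llama-cpp-python are wired up through a chat handler plus a separate vision-projector (mmproj) GGUF. Below is a minimal sketch of that pattern using the documented Llava15ChatHandler as a stand-in, since a dedicated Qwen2-VL handler is exactly what appears to be missing; the file paths are placeholders:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The vision projector (mmproj) GGUF is loaded by the chat handler,
# separately from the language-model GGUF. Paths are placeholders.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-Qwen2-VL-2B-Instruct.gguf")

llm = Llama(
    model_path="Qwen2-VL-2B-Instruct-Q4_K_M.gguf",  # placeholder filename
    chat_handler=chat_handler,
    n_ctx=4096,  # multimodal prompts need extra context for image tokens
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```

Whether this actually produces correct output for Qwen2-VL depends on the prompt format the handler renders, which is the open question in this thread.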
To my understanding, the llama.cpp library doesn't, and never has, supported a multimodal inference server with Qwen2-VL, so I'm curious how you made it work at all?
@MoRocety There's been support for a few months (ggerganov/llama.cpp#10361), but the llama-cpp-python prompt format needs some tweaking.
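For anyone experimenting with that tweaking: Qwen2-VL uses a ChatML-style template in which the image is delimited by dedicated vision tokens. A sketch of what the rendered prompt is expected to look like, based on the Qwen2-VL tokenizer config (how llama-cpp-python would substitute the image embeddings is not confirmed):

```python
# ChatML-style template used by Qwen2-VL. <|image_pad|> marks where the
# embeddings from the vision projector are injected. This is a sketch of
# the rendered text, not a confirmed llama-cpp-python format string.
QWEN2_VL_PROMPT = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```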
I think they mention in there that llama-server doesn't support multimodal yet, so again I'm curious how llama-cpp-python makes it work.
Is your feature request related to a problem? Please describe.
I would like to know whether the current llama-cpp-python supports the GGUF format model of Qwen2-VL-2B-Instruct.
Describe the solution you'd like
Support for the GGUF format model of Qwen2-VL-2B-Instruct.
Describe alternatives you've considered
Does it support the GGUF format model of Qwen2-VL-2B-Instruct?
Additional context
Add any other context or screenshots about the feature request here.
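One quick way to probe this yourself: the language-model part of the GGUF can be loaded directly with Llama(); if the llama.cpp bundled in your installed version predates Qwen2-VL support, the load fails with an unknown-architecture error. A minimal check (the model path is a placeholder):

```python
import llama_cpp
from llama_cpp import Llama

print(llama_cpp.__version__)  # Qwen2-VL support landed in llama.cpp via PR #10361

try:
    # Loading only the language-model GGUF exercises the architecture
    # check, without any vision support.
    llm = Llama(model_path="Qwen2-VL-2B-Instruct-Q4_K_M.gguf", n_ctx=512)
    print("model loaded:", llm.metadata.get("general.architecture"))
except ValueError as e:
    print("load failed:", e)
```

Note that a successful load only means the text backbone is recognized; image inference additionally needs the vision projector and a compatible chat handler, as discussed in the comments above.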