
Does it support the gguf format model of Qwen2-VL-2B-Instruct #1895

Open
helloHKTK opened this issue Jan 9, 2025 · 5 comments

@helloHKTK

Is your feature request related to a problem? Please describe.
I would like to know whether the current version of llama-cpp-python supports the GGUF-format model of Qwen2-VL-2B-Instruct.

Describe the solution you'd like
Support for loading and running the GGUF-format model of Qwen2-VL-2B-Instruct.

Describe alternatives you've considered
Same question as above: confirmation of whether the GGUF-format Qwen2-VL-2B-Instruct model is supported.


rob-field1 commented Jan 9, 2025

Yes, the latest commit integrates the llama.cpp changes for Qwen2-VL support, e.g.:

    llm = Llama(
        model_path="models/Qwen2-VL-7B-Instruct-Q6_K_L.gguf",
        n_gpu_layers=30,
        n_threads=6,
        seed=42,
        n_ctx=7500,
        flash_attn=True,
    )

Model metadata: {'general.name': 'Qwen2 VL 7B Instruct', 'general.architecture': 'qwen2vl', ...

edit: although I'm having some trouble processing/running inference on images.
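
For context, image inputs in llama-cpp-python are normally routed through create_chat_completion with a vision chat handler attached to the Llama instance. A minimal sketch of such a call, assuming a handler is already wired in and using a placeholder image URL, looks like this:

    # Assumes `llm` was constructed with a vision chat handler attached;
    # the image URL below is a placeholder.
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You describe images accurately."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                    {"type": "text", "text": "What is shown in this image?"},
                ],
            },
        ],
    )
    print(response["choices"][0]["message"]["content"])

Without a handler that knows how to encode image tokens for this architecture, the image part of the message will not be processed correctly, which may be related to the trouble described above.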

@fanfansoft

(quoting @rob-field1's comment above)

Hello, is the 'latest commit' referring to llama-cpp-python version 0.3.6? How should the chat_handler be referenced in llm = Llama()? I looked at the source code and didn't find any support for qwen2_vl_2B.
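
For reference, the usual way a vision chat handler is wired into Llama() in llama-cpp-python is sketched below. Llava15ChatHandler is used purely to illustrate the pattern (a Qwen2-VL-specific handler is not confirmed to exist in 0.3.6), and the projector file name is a placeholder:

    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler  # stand-in; not Qwen2-VL-specific

    # Vision handlers take the separate CLIP/mmproj projector file.
    chat_handler = Llava15ChatHandler(clip_model_path="models/mmproj-placeholder.gguf")

    llm = Llama(
        model_path="models/Qwen2-VL-7B-Instruct-Q6_K_L.gguf",  # text weights, as in the comment above
        chat_handler=chat_handler,  # injects image embeddings into the prompt
        n_ctx=4096,                 # vision tokens consume context, so leave headroom
        n_gpu_layers=30,
    )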

@MoRocety

(quoting @rob-field1's comment above)

To my understanding, llama.cpp itself has never supported multimodal inference with qwen2vl (in the server in particular), so I'm curious how you made it work at all?

@rob-field1

@MoRocety There's been support for a few months (ggerganov/llama.cpp#10361), but the llama-cpp-python prompt format needs some tweaking.
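
For anyone attempting that tweaking: as I understand the Qwen2-VL chat template, it combines ChatML-style markers with dedicated vision tokens, so a manually assembled single-image prompt would look roughly like the sketch below. The exact token placement is an assumption to verify against the model's tokenizer configuration:

    # Rough shape of a Qwen2-VL prompt with one image (unverified sketch;
    # check the model's chat template for the exact layout).
    prompt = (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        "Describe this image.<|im_end|>\n"
        "<|im_start|>assistant\n"
    )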

@MoRocety

(quoting @rob-field1's reply above)

I think they mention in there that llama-server doesn't support multimodal yet, so again I'm curious how llama-cpp-python makes it work?
