
Does it support the gguf format model of Qwen2-VL-2B-Instruct #1895

Open
helloHKTK opened this issue Jan 9, 2025 · 5 comments

@helloHKTK

Is your feature request related to a problem? Please describe.
I would like to know whether the current version of llama-cpp-python supports the GGUF-format model of Qwen2-VL-2B-Instruct.

Describe the solution you'd like
Support for loading and running the GGUF-format model of Qwen2-VL-2B-Instruct.

Describe alternatives you've considered
Same question as above: confirmation of whether the GGUF-format Qwen2-VL-2B-Instruct model is supported.


rob-field1 commented Jan 9, 2025

Yes, the latest commit integrates the llama.cpp changes for Qwen2-VL support, e.g.:

    llm = Llama(
        model_path="models/Qwen2-VL-7B-Instruct-Q6_K_L.gguf",
        n_gpu_layers=30,
        n_threads=6,
        seed=42,
        n_ctx=7500,
        flash_attn=True,
    )

Model metadata: {'general.name': 'Qwen2 VL 7B Instruct', 'general.architecture': 'qwen2vl', ...

edit: although I'm having some trouble processing/running inference on images.
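
For context, image inputs in llama-cpp-python are normally routed through create_chat_completion with a vision chat handler attached to the Llama instance. A minimal sketch of such a call, assuming a handler is already wired in and using a placeholder image URL, looks like this:

    # Assumes `llm` was constructed with a vision chat handler attached;
    # the image URL below is a placeholder.
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You describe images accurately."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                    {"type": "text", "text": "What is shown in this image?"},
                ],
            },
        ],
    )
    print(response["choices"][0]["message"]["content"])

Without a handler that knows how to encode image tokens for this architecture, the image part of the message will not be processed correctly, which may be related to the trouble described above.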

@fanfansoft

(quoting @rob-field1's comment above)

Hello, is the 'latest commit' referring to llama-cpp-python version 0.3.6? How should the chat_handler be referenced in llm = Llama()? I looked at the source code and didn't find any support for qwen2_vl_2B.
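
For reference, the usual way a vision chat handler is wired into Llama() in llama-cpp-python is sketched below. Llava15ChatHandler is used purely to illustrate the pattern (a Qwen2-VL-specific handler is not confirmed to exist in 0.3.6), and the projector file name is a placeholder:

    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler  # stand-in; not Qwen2-VL-specific

    # Vision handlers take the separate CLIP/mmproj projector file.
    chat_handler = Llava15ChatHandler(clip_model_path="models/mmproj-placeholder.gguf")

    llm = Llama(
        model_path="models/Qwen2-VL-7B-Instruct-Q6_K_L.gguf",  # text weights, as in the comment above
        chat_handler=chat_handler,  # injects image embeddings into the prompt
        n_ctx=4096,                 # vision tokens consume context, so leave headroom
        n_gpu_layers=30,
    )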

@MoRocety

(quoting @rob-field1's comment above)

To my understanding, llama.cpp itself has never supported multimodal inference with qwen2vl (in the server in particular), so I'm curious how you made it work at all?

@rob-field1

@MoRocety There's been support for a few months (ggerganov/llama.cpp#10361), but the llama-cpp-python prompt format needs some tweaking.
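
For anyone attempting that tweaking: as I understand the Qwen2-VL chat template, it combines ChatML-style markers with dedicated vision tokens, so a manually assembled single-image prompt would look roughly like the sketch below. The exact token placement is an assumption to verify against the model's tokenizer configuration:

    # Rough shape of a Qwen2-VL prompt with one image (unverified sketch;
    # check the model's chat template for the exact layout).
    prompt = (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        "Describe this image.<|im_end|>\n"
        "<|im_start|>assistant\n"
    )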

@MoRocety

(quoting @rob-field1's reply above)

I think they mention in there that llama-server doesn't support multimodal yet, so again I'm curious how llama-cpp-python makes it work?
