Send a request to a text-only model with a payload containing both text and image content. Here is a curl command to reproduce it. It sends an image as image_url and "Describe this image in one sentence." as text.
curl -X POST \
-H 'Content-Type: application/json' \
-H 'authorization: Bearer <HF TOKEN>' \
-d '{ "model": "mistralai/Mistral-Nemo-Instruct-2407", "messages": [ { "role": "user", "content": [ {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}, {"type": "text", "text": "Describe this image in one sentence."} ] } ] }' \
https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/v1/chat/completions
{"object":"chat.completion","id":"","created":1727279137,"model":"mistralai/Mistral-Nemo-Instruct-2407","system_fingerprint":"2.2.1-dev0-sha-a0b6a24","choices":[{"index":0,"message":{"role":"assistant","content":"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":57,"completion_tokens":19,"total_tokens":76}}
Expected behavior
Currently TGI returns successfully with the sentence "The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay.". It seems that, since the model cannot handle images, the image URL is passed directly to the model as text. Since the URL contains Statue-of-Liberty-Island-New-York-Bay.jpg, the answer looks correct but is not generated from the image itself.
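To illustrate the suspected mechanism, here is a minimal Python sketch. It is not TGI's actual code: `flatten_content` is a hypothetical helper that assumes a text-only pipeline joins every content chunk into a single prompt string, so the URL (and the filename inside it) reaches the model as ordinary text.

```python
def flatten_content(content):
    """Hypothetical flattening: join every chunk into one prompt string."""
    parts = []
    for chunk in content:
        if chunk["type"] == "text":
            parts.append(chunk["text"])
        elif chunk["type"] == "image_url":
            # Assumption: with no image encoder available, the URL itself
            # is inserted into the prompt as plain text.
            parts.append(chunk["image_url"]["url"])
    return "\n".join(parts)


# Same content as in the curl payload above.
content = [
    {
        "type": "image_url",
        "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/"
                   "Statue-of-Liberty-Island-New-York-Bay.jpg"
        },
    },
    {"type": "text", "text": "Describe this image in one sentence."},
]

prompt = flatten_content(content)
# The filename is now visible to the model as text.
print("Statue-of-Liberty-Island-New-York-Bay.jpg" in prompt)
```

If something like this happens server-side, the model can "describe" the image from the filename alone, which would explain the plausible-looking answer.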
{"object":"chat.completion","id":"","created":1727279137,"model":"mistralai/Mistral-Nemo-Instruct-2407","system_fingerprint":"2.2.1-dev0-sha-a0b6a24","choices":[{"index":0,"message":{"role":"assistant","content":"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":57,"completion_tokens":19,"total_tokens":76}}%
In such a case I would expect either a 400 Bad Request or a 422 Unprocessable Entity.
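A sketch of the kind of check I have in mind, written in Python for illustration only. `check_messages` and its `supports_images` flag are hypothetical names, not part of TGI; the idea is simply to reject `image_url` chunks before they reach a text-only model.

```python
def check_messages(messages, supports_images):
    """Return (status_code, error); error is None when the payload is accepted."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # Plain string content never carries an image.
            continue
        for chunk in content or []:
            if chunk.get("type") == "image_url" and not supports_images:
                # 422: the payload is well-formed JSON, but this model
                # cannot process it.
                return 422, "model does not support image inputs"
    return 200, None


messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

status, error = check_messages(messages, supports_images=False)
print(status, error)
```

Whether 400 or 422 is the better status code is a design choice; the sketch uses 422 because the request body parses correctly and only its content is unsupported.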
I also tried with a base64-encoded image URL, and the request fails (max tokens exceeded), since the full base64 string seems to be tokenized.
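A rough back-of-the-envelope check of why this blows past the token limit. Base64 emits 4 characters for every 3 input bytes, so the string grows with the image size. The 200 kB image size below is an arbitrary assumption for illustration, and `data_url_length` is a hypothetical helper.

```python
import base64


def data_url_length(n_bytes, mime="image/jpeg"):
    """Length in characters of a base64 data URL for n_bytes of image data."""
    prefix = f"data:{mime};base64,"
    # Padded base64 produces 4 output characters per 3 input bytes, rounded up.
    encoded_len = 4 * ((n_bytes + 2) // 3)
    return len(prefix) + encoded_len


# Stand-in for a 200 kB image (assumed size, content does not matter here).
raw = bytes(200_000)
data_url = "data:image/jpeg;base64," + base64.b64encode(raw).decode("ascii")

# The formula matches the real encoder.
assert len(data_url) == data_url_length(len(raw))
print(len(data_url))
```

Under that assumption the data URL is well over 250,000 characters. If all of it is tokenized as text, the prompt is orders of magnitude larger than the 57 prompt tokens reported for the plain URL request above.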
(noticed this error while working on huggingface/huggingface_hub#2556)
System Info
Using TGI through the Inference API (e.g. mistralai/Mistral-Nemo-Instruct-2407). At the time I opened this issue, /info returns