Passing an image_url to a text-only model should fail explicitly #2565

Open
4 tasks
Wauplin opened this issue Sep 25, 2024 · 0 comments

(noticed this error while working on huggingface/huggingface_hub#2556)

System Info

Using TGI through the Inference API (e.g. mistralai/Mistral-Nemo-Instruct-2407). At the time of opening this issue, /info returns:

{
"model_id": "mistralai/Mistral-Nemo-Instruct-2407",
"model_sha": "e17a136e1dcba9c63ad771f2c85c1c312c563e6b",
"model_pipeline_tag": "text-generation",
"max_concurrent_requests": 128,
"max_best_of": 2,
"max_stop_sequences": 4,
"max_input_tokens": 16000,
"max_total_tokens": 32768,
"validation_workers": 2,
"max_client_batch_size": 4,
"router": "text-generation-router",
"version": "2.2.1-dev0",
"sha": "a0b6a2434503afa5da5f25fa47a3e4589c80941c",
"docker_label": "sha-a0b6a24"
}

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Send a request to a text-only model with a payload whose content contains both a text chunk and an image chunk. Here is a curl command to reproduce it. It sends an image as image_url and "Describe this image in one sentence." as text.

curl -X POST \
  -H 'Content-Type: application/json' \
  -H 'authorization: Bearer <HF TOKEN>' \
  -d '{
    "model": "mistralai/Mistral-Nemo-Instruct-2407",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
          {"type": "text", "text": "Describe this image in one sentence."}
        ]
      }
    ]
  }' \
  https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/v1/chat/completions
{"object":"chat.completion","id":"","created":1727279137,"model":"mistralai/Mistral-Nemo-Instruct-2407","system_fingerprint":"2.2.1-dev0-sha-a0b6a24","choices":[{"index":0,"message":{"role":"assistant","content":"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":57,"completion_tokens":19,"total_tokens":76}}

Expected behavior

Currently TGI returns successfully with the sentence "The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay." It seems that, since the model is not capable of handling images, the image URL is passed to the model directly as text. Because the URL contains Statue-of-Liberty-Island-New-York-Bay.jpg, the answer looks correct but is not generated from the image itself.

{"object":"chat.completion","id":"","created":1727279137,"model":"mistralai/Mistral-Nemo-Instruct-2407","system_fingerprint":"2.2.1-dev0-sha-a0b6a24","choices":[{"index":0,"message":{"role":"assistant","content":"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":57,"completion_tokens":19,"total_tokens":76}}

In such a case I would expect either a 400 Bad Request or a 422 Unprocessable Entity.
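To make the expected behavior concrete, here is a minimal sketch of the kind of check I have in mind. It is not TGI code: `validate_chat_payload`, `find_image_chunks`, `UnsupportedContentError` and the `VISION_PIPELINE_TAGS` set are hypothetical names, and using `model_pipeline_tag` from /info to decide whether a model is text-only is an assumption on my side.

```python
from typing import Any

# Assumption: pipeline tags that would indicate image support.
# This set is illustrative and not taken from TGI.
VISION_PIPELINE_TAGS = {"image-text-to-text"}


class UnsupportedContentError(ValueError):
    """Hypothetical error that a server could map to HTTP 422."""

    status_code = 422


def find_image_chunks(messages: list[dict[str, Any]]) -> list[tuple[int, int]]:
    """Return (message_index, chunk_index) for every image_url chunk."""
    hits = []
    for i, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            # Plain string (or missing) content cannot carry an image chunk.
            continue
        for j, chunk in enumerate(content):
            if isinstance(chunk, dict) and chunk.get("type") == "image_url":
                hits.append((i, j))
    return hits


def validate_chat_payload(payload: dict[str, Any], pipeline_tag: str) -> None:
    """Raise instead of silently passing image URLs to a text-only model."""
    hits = find_image_chunks(payload.get("messages", []))
    if hits and pipeline_tag not in VISION_PIPELINE_TAGS:
        raise UnsupportedContentError(
            f"image_url content at {hits} is not supported by a model "
            f"with pipeline tag {pipeline_tag!r}"
        )
```

With the payload from the reproduction and the `text-generation` tag returned by /info, this would raise instead of returning a completion.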

I also tried with a base64-encoded image URL, and the request fails (max tokens exceeded), since the full base64 string seems to be tokenized.
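A rough back-of-the-envelope sketch of why the base64 case blows past the limit, assuming the encoded string really is tokenized as plain text. The 300 kB image size and the 4 characters per token ratio are assumptions chosen for illustration (the real ratio for base64 text is likely less favorable); only `max_input_tokens` comes from the /info output above.

```python
import base64
import math

# From the /info output in this issue.
MAX_INPUT_TOKENS = 16000


def base64_length(num_bytes: int) -> int:
    """Length of the padded base64 encoding of num_bytes raw bytes."""
    return 4 * math.ceil(num_bytes / 3)


def estimated_tokens(num_chars: int, chars_per_token: float) -> int:
    """Rough token estimate; chars_per_token is an assumed ratio."""
    return math.ceil(num_chars / chars_per_token)


# Assumption: a 300 kB image, filled with zero bytes as a stand-in.
raw = bytes(300_000)
encoded = base64.b64encode(raw)

# The closed-form length matches what the standard library produces.
assert len(encoded) == base64_length(len(raw))

tokens = estimated_tokens(len(encoded), chars_per_token=4)
print(f"{len(encoded)} base64 chars, roughly {tokens} tokens or more")
print("exceeds max_input_tokens:", tokens > MAX_INPUT_TOKENS)
```

Even with a generous ratio, a modest image ends up far above `max_input_tokens`, which is consistent with the error I got.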
