Fix TGI (Text Generation Inference) Endpoint Inference and TGI JSON Grammar Generation #502

Open · cpcdoy wants to merge 2 commits into main
Conversation

@cpcdoy commented Jan 15, 2025

Description

While implementing a custom task with lighteval, I needed constrained grammar generation with TGI, but the TGI integration turned out to be out of date and no longer working.

Fixes for TGI Endpoint Inference

  • The /info route of TGI 3.0.1 doesn't always return the fields lighteval expects, such as model_dtype, so it now defaults to None when missing (see the sketch after this list):
$ curl http://localhost:8080/info
{"model_id":"unsloth/Qwen2.5-0.5B-Instruct","model_sha":"6a7b5090fc11df0706c796b7ba76762d7beb688b","model_pipeline_tag":"text-generation","max_concurrent_requests":128,"max_best_of":2,"max_stop_sequences":4,"max_input_tokens":32767,"max_total_tokens":32768,"validation_workers":2,"max_client_batch_size":4,"router":"text-generation-router","version":"3.0.1","sha":"bb9095aae339579fbf3b4e7be3909932de26a7ee","docker_label":"sha-bb9095a"}
  • TGI's AsyncClient exposes a generate function that expects individual keyword arguments, not a parameters structure (see the sketch after this list).
    • I've set the do_sample, return_full_text and watermark parameters to False by default: they come from huggingface_hub, which accepts None as a default, but TGI rejects None for them.
      • Question for a maintainer: should they be set this way by default? I don't see them being passed to _async_process_request anyway, so maybe this should be fixed in another PR. Same for adapter_id for LoRA heads.
  • ModelClient has been fixed to take config: TGIModelConfig by default instead of individual named parameters (see the config sketch after the tgi.yaml example below).
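A minimal sketch of the first two fixes, assuming a TGI server at http://localhost:8080; resolve_model_info and greedy_generate are illustrative names, not lighteval's actual helpers:

import asyncio

import requests
from text_generation import AsyncClient


def resolve_model_info(address: str) -> dict:
    # TGI 3.0.1's /info may omit fields such as model_dtype (see the
    # curl output above), so use .get() to default missing fields to
    # None instead of raising a KeyError.
    info = requests.get(f"{address}/info", timeout=10).json()
    return {
        "model_id": info.get("model_id"),
        "model_sha": info.get("model_sha"),
        "model_dtype": info.get("model_dtype"),
    }


async def greedy_generate(client: AsyncClient, prompt: str) -> str:
    # AsyncClient.generate takes individual keyword arguments rather
    # than a single parameters object; do_sample, return_full_text and
    # watermark are pinned to False because TGI rejects None for them.
    response = await client.generate(
        prompt,
        max_new_tokens=128,
        do_sample=False,
        return_full_text=False,
        watermark=False,
    )
    return response.generated_text


client = AsyncClient("http://localhost:8080")
print(resolve_model_info("http://localhost:8080"))
print(asyncio.run(greedy_generate(client, "Hello")))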

Fixes for TGI JSON Grammar Generation

  • Updated text_generation to 0.7.0
  • Added support for the grammar field to enable JSON grammar generation (see the sketch below)
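A hedged sketch of grammar-constrained generation against the same server, assuming text_generation 0.7.0 exposes Grammar and GrammarType in text_generation.types (which the version bump above is meant to enable); the schema here is a simplified stand-in for the entity/classification schema visible in the TGI logs below:

import asyncio

from text_generation import AsyncClient
from text_generation.types import Grammar, GrammarType

# Simplified stand-in schema; the full entities schema from the TGI
# logs below would be passed the same way.
schema = {
    "type": "object",
    "properties": {
        "entity": {"type": "string"},
        "classification": {
            "type": "string",
            "enum": ["merchant", "bank", "individual", "date", "location", "unknown"],
        },
    },
    "required": ["entity", "classification"],
}


async def main() -> None:
    client = AsyncClient("http://localhost:8080")
    response = await client.generate(
        "Classify the entity in: 'Transfer received from Example Bank.'",
        max_new_tokens=128,
        do_sample=False,
        return_full_text=False,
        watermark=False,
        # The new grammar field: TGI constrains decoding so the output
        # is valid JSON matching the schema.
        grammar=Grammar(type=GrammarType.Json, value=schema),
    )
    print(response.generated_text)


asyncio.run(main())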

Environment

Command

uv run lighteval endpoint tgi tgi.yaml "custom|...|0|0" --custom-tasks "ner_eval.py" --output-dir "results" --max-samples 10 --override-batch-size 1 --use-chat-template --save-details --no-public-run

Dependencies

dependencies = [
    "datasets>=3.2.0",
    "huggingface-hub>=0.27.1",
    "lighteval[tgi]>=0.7.0",
    "numpy>=1.26.4",
    "pandas>=2.2.3",
    "pydantic>=1.10.21",
    "text-generation==0.6.0",
    "torch>=2.4.1",
    "torchvision>=0.19.1",
]

[tool.uv.sources]
lighteval = { path = "../../../../lighteval", editable = true } # This branch

model_config_path argument for TGI

tgi.yaml:

model:
  instance:
    inference_server_address: "http://localhost:8080"
    inference_server_auth: null
    model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
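For completeness, a minimal sketch of how this YAML maps onto the fixed ModelClient call; the import path is an assumption (lighteval's TGI module has moved between releases), and only the three fields above are assumed on TGIModelConfig:

import yaml

# Assumed import path; adjust to wherever ModelClient and
# TGIModelConfig live in your lighteval version.
from lighteval.models.tgi_model import ModelClient, TGIModelConfig

with open("tgi.yaml") as f:
    instance = yaml.safe_load(f)["model"]["instance"]

config = TGIModelConfig(
    inference_server_address=instance["inference_server_address"],
    inference_server_auth=instance["inference_server_auth"],
    model_id=instance["model_id"],
)

# The fix in this PR: ModelClient takes the config object directly
# rather than individual named parameters.
model = ModelClient(config=config)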

Test Results

Both the endpoint inference and the JSON grammar generation work, as the logs below show.

TGI Logs with JSON Grammar Generation

2025-01-15T17:09:34.811955Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3060"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(128), return_full_text: Some(false), stop: ["\n\n", "<|im_end|>"], truncate: None, watermark: false, details: true, decoder_input_details: true, seed: None, top_n_tokens: None, grammar: Some(Json(Object {"type": String("object"), "properties": Object {"entities": Object {"type": String("array"), "items": Object {"type": String("object"), "properties": Object {"entity": Object {"type": String("string")}, "classification": Object {"type": String("string"), "enum": Array [String("merchant"), String("bank"), String("individual"), String("date"), String("location"), String("unknown")]}}, "required": Array [String("entity"), String("classification")]}}}, "required": Array [String("entities")]})), adapter_id: None } total_time="428.587752ms" validation_time="716.935µs" queue_time="82.504µs" inference_time="427.788413ms" time_per_token="25.164024ms" seed="None"}: text_generation_router::server: router/src/server.rs:422: Success

Lighteval Logs

(py3.11.3) cpcdoy@cpcdoy-desktop:~/projects/.../llm_tasks_eval$ uv run lighteval endpoint tgi tgi.yaml "custom|...|0|0" --custom-tasks "ner_eval.py" --output-dir "results" --max-samples 10 --override-batch-size 1 --use-chat-template --save-details --no-public-run
warning: `VIRTUAL_ENV=/home/cpcdoy/py3.11.3` does not match the project environment path `.venv` and will be ignored
[2025-01-15 15:11:24,861] [    INFO]: PyTorch version 2.4.1 available. (config.py:54)
[2025-01-15 15:11:28,418] [ WARNING]: --max_samples WAS SET. THESE NUMBERS ARE ONLY PARTIAL AND SHOULD NOT BE USED FOR COMPARISON UNLESS YOU KNOW WHAT YOU ARE DOING. (pipeline.py:132)
[2025-01-15 15:11:28,418] [    INFO]: --- LOADING MODEL --- (pipeline.py:168)
[2025-01-15 15:11:28,418] [    INFO]: Load model from inference server: http://localhost:8080 (model_loader.py:110)
[2025-01-15 15:11:28,846] [    INFO]: --- LOADING TASKS --- (pipeline.py:195)
[2025-01-15 15:11:28,858] [ WARNING]: If you want to use extended_tasks, make sure you installed their dependencies using `pip install -e .[extended_tasks]`. (registry.py:136)
[2025-01-15 15:11:28,858] [    INFO]: Found 1 custom tasks in /home/cpcdoy/.cache/huggingface/modules/datasets_modules/datasets/ner_eval/1739d6fd80c40f11df64fba54bf39bd05b1b1408659c4325f28f0ca9ee2a04b0/ner_eval.py (registry.py:141)
[2025-01-15 15:11:28,861] [    INFO]: ... default (lighteval_task.py:187)
[2025-01-15 15:11:28,861] [ WARNING]: Careful, the task ... is using evaluation data to build the few shot examples. (lighteval_task.py:261)
[2025-01-15 15:11:28,898] [    INFO]: --- INIT SEEDS --- (pipeline.py:224)
[2025-01-15 15:11:28,899] [    INFO]: --- RUNNING MODEL --- (pipeline.py:267)
[2025-01-15 15:11:28,899] [    INFO]: Running RequestType.GREEDY_UNTIL requests (pipeline.py:271)
[2025-01-15 15:11:28,903] [ WARNING]: You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring. (data.py:260)
Splits: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.90s/it]
[2025-01-15 15:11:33,800] [    INFO]: --- COMPUTING METRICS --- (pipeline.py:299)                                                                  
[2025-01-15 15:11:33,802] [    INFO]: --- DISPLAYING RESULTS --- (pipeline.py:342)
|            Task             |Version|        Metric         |Value|   |Stderr|
|-----------------------------|------:|-----------------------|----:|---|-----:|
...

[2025-01-15 15:11:33,824] [    INFO]: --- SAVING AND PUSHING RESULTS --- (pipeline.py:332)
[2025-01-15 15:11:33,825] [    INFO]: Saving experiment tracker (evaluation_tracker.py:154)
[2025-01-15 15:11:33,848] [    INFO]: Saving results to ... (evaluation_tracker.py:208)
[2025-01-15 15:11:33,851] [    INFO]: Saving details to ... (evaluation_tracker.py:216)
Creating parquet from Arrow format: 100%|████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 82.46ba/s]

Note: I have anonymized parts of the logs.

@cpcdoy changed the title from Fix TGI (Text Generation Inference) Endpoint Inference to Fix TGI (Text Generation Inference) Endpoint Inference and TGI JSON Grammar Generation on Jan 15, 2025
@cpcdoy (Author) commented Jan 15, 2025

Updated the PR to add support for JSON grammar-constrained generation for TGI.
