OpenAI Client format + chat template for a single call #2644

Open
1 of 4 tasks
vitalyshalumov opened this issue Oct 14, 2024 · 1 comment
Comments


System Info

latest docker

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Hello,
Can you please tell me how to implement the following pieces of functionality, combined:

  1. I'm interested in the OpenAI Client format:

    prompt = [
        {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
        {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"},
    ]

  2. I want to make sure that the chat template of the served model is applied.

  3. I don't want a chat: I want each call with a prompt to start from a clean history, to avoid token overflow.
    Thank you!

Expected behavior

An answer for each prompt, independent of the previous answer, but with the OpenAI client API.


Johnno1011 commented Oct 18, 2024

You could still use the OpenAI client's chat.completions.create but reset the chat history each time? For example:

from openai import OpenAI
from openai.types.chat import ChatCompletion

# Point the client at TGI's OpenAI-compatible endpoint
client = OpenAI(base_url="TGI_URL", api_key="TGI")

def generate(prompt: str) -> ChatCompletion:
    # Rebuild the message list on every call so no history carries over
    messages = [
        {
            "role": "system",
            "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986.",
        },
        {"role": "user", "content": prompt},
    ]
    return client.chat.completions.create(messages=messages, model="TGI")
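
Since the message list is rebuilt on every call, each request starts from a clean history. A quick usage sketch (the prompts are just placeholders):

# Two independent calls; nothing from the first answer leaks into the second
first = generate("Any fun things to do in New York?")
second = generate("What about Los Angeles?")
print(first.choices[0].message.content)
print(second.choices[0].message.content)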

Alternatively, you could apply the chat template outside of TGI like this:

from openai import OpenAI
from openai.types import Completion
from transformers import AutoTokenizer

client = OpenAI(base_url="TGI_URL", api_key="TGI")

def generate(prompt: str) -> Completion:
    messages = [
        {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
        {"role": "user", "content": prompt},
    ]

    # Load the tokenizer locally and apply the chat template outside of TGI
    tokenizer = AutoTokenizer.from_pretrained("your_model_of_choice")
    templated = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    # call the standard completions route with the already-templated prompt
    return client.completions.create(prompt=templated, model="TGI")
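
If you want to sanity-check what the template produces, you can print the templated string before sending it; a minimal sketch, where "your_model_of_choice" stands in for whatever model TGI is serving:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your_model_of_choice")
messages = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot."},
    {"role": "user", "content": "Any fun things to do in New York?"},
]
# Print the exact string the completions route would receive
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))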

Using the tokenizer locally is pretty fast computationally, so I wouldn't worry about the overhead.
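
One small tweak if you end up calling generate() in a loop: hoist the tokenizer load out of the function so from_pretrained runs once rather than on every call, e.g.:

# Load once at module scope and reuse it across generate() calls
tokenizer = AutoTokenizer.from_pretrained("your_model_of_choice")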

If these examples don't help, could you share some more details about what you're trying to achieve and I'll try to help :)
