
Python: history is not extended with tool call related messages automatically #9408

Closed
bbence84 opened this issue Oct 23, 2024 · 11 comments
Labels: ai connector (Anything related to AI connectors) · bug (Something isn't working) · python (Pull requests for the Python Semantic Kernel)

@bbence84

Describe the bug

I'm not sure whether this is a bug, but it is at least a gap. I have moved away from using the Planners and rely on "vanilla" function calling. The tools I define can be chained: the LLM comes up with a plan for chaining them and calls them one after another quite reliably.
My problem is that my tool functions usually return JSON containing, e.g., the IDs of business objects that were created, and a follow-up tool has a parameter that should receive this technical ID. The LLM's textual response based on the function return value usually does not contain the ID (rightly so, as it is a technical detail). As a result, follow-up functions have no way to learn the ID, because the ChatHistory does not contain the tool calls.
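
To make the scenario concrete, here is a minimal sketch (the plugin and function names are made up for illustration) of two tools where the second depends on a technical ID returned by the first:

    import json
    from semantic_kernel.functions import kernel_function

    class OrderPlugin:
        """Two chainable tools: the second needs the technical ID returned by the first."""

        @kernel_function(name="create_order", description="Creates an order and returns its technical ID.")
        def create_order(self, product: str) -> str:
            # The ID is a technical detail the model's textual answer usually omits.
            return json.dumps({"order_id": "ORD-4711", "product": product})

        @kernel_function(name="ship_order", description="Ships an order given its technical ID.")
        def ship_order(self, order_id: str) -> str:
            return json.dumps({"order_id": order_id, "status": "shipped"})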

Referencing issues #8071 and #8020.

To Reproduce

As recommended, I tested this with the following simpler example:
https://github.com/microsoft/semantic-kernel/blob/main/python/samples/concepts/filtering/auto_function_invoke_filters.py
Here, too, printing the ChatHistory shows no tool calls or tool call results:

    chat_hist_json = history.serialize()
    print(f"Chat history: {chat_hist_json}")    

Expected behavior

The chat history should automatically contain FunctionCallContent and FunctionResultContent, since that is needed when tool functions are chained and one tool uses the result of a previous tool call.

Platform

  • semantic-kernel==1.11.0


@bbence84 bbence84 added the bug Something isn't working label Oct 23, 2024
@markwallace-microsoft markwallace-microsoft added python Pull requests for the Python Semantic Kernel triage labels Oct 23, 2024
@moonbox3 moonbox3 self-assigned this Oct 24, 2024
@moonbox3 moonbox3 removed the triage label Oct 24, 2024
@moonbox3
Contributor

Hi @bbence84, thanks for continuing the conversation here. As I mentioned in our previous thread in #8071, all of the relevant calls are contained in the chat history carried in the result metadata. After each kernel invocation you have access to the underlying content types (whether FunctionCallContent or FunctionResultContent), even if you're not using a filter.

For the caller's ChatHistory, that's their responsibility to manage: the caller populates their chat history with whatever information is required to maintain context (or not).

Can you please help me understand what the gap is?

@bbence84
Author

bbence84 commented Nov 1, 2024

Sorry for the late response and thanks for the reply.
I think I know what confused me: I was using kernel.invoke_stream, and I could not find these content types in the messages returned by kernel.invoke_stream. The result of kernel.invoke indeed contains what you mentioned, but I can't seem to get the same from invoke_stream.
Am I overlooking something? :)
Thanks!

@bbence84
Author

bbence84 commented Nov 5, 2024

> Hi @bbence84, thanks for continuing the conversation here. As I mentioned in our previous thread in #8071, all of the relevant calls are contained in the chat history carried in the result metadata. After each kernel invocation you have access to the underlying content types (whether FunctionCallContent or FunctionResultContent), even if you're not using a filter.
>
> For the caller's ChatHistory, that's their responsibility to manage: the caller populates their chat history with whatever information is required to maintain context (or not).
>
> Can you please help me understand what the gap is?

You are right, I understand now that the caller needs to fill the ChatHistory. However, as I mentioned above, there seems to be a difference between what kernel.invoke and kernel.invoke_stream return. The stream-based invoke does not seem to contain the tool-call-related messages. For the kernel.invoke version I can do the following:
chat_history.add_message(answer.metadata['messages'].messages[-2])
chat_history.add_message(answer.metadata['messages'].messages[-1])
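
(In context, that looks roughly like the sketch below; `chat_function`, `history`, and `user_input` are assumptions based on the concept samples.)

    from semantic_kernel.functions import KernelArguments

    async def chat_once(kernel, chat_function, history, user_input: str) -> str:
        # Non-streaming: the result metadata carries the intermediate messages, so the
        # function call and its result can be copied into the caller-managed ChatHistory.
        answer = await kernel.invoke(
            chat_function, KernelArguments(chat_history=history, user_input=user_input)
        )
        # The last two metadata messages are the assistant message holding the
        # FunctionCallContent and the tool message holding the FunctionResultContent.
        history.add_message(answer.metadata["messages"].messages[-2])
        history.add_message(answer.metadata["messages"].messages[-1])
        history.add_assistant_message(str(answer))
        return str(answer)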

But this is not available for invoke_stream.

Am I missing something? Thanks!

@moonbox3
Contributor

moonbox3 commented Nov 5, 2024

Thanks for the ping on this. Will have a look tomorrow morning and get back to you.

@moonbox3
Contributor

moonbox3 commented Nov 6, 2024

Hi @bbence84, I've been looking into this more. Sorry for the delay.

In the chat_gpt_api_function_calling.py concept sample, I added the following lines to combine the streaming chunks we receive from the LLM:

if result_content:
    # requires `from functools import reduce`
    # merge the streamed chunks into one message so the content types can be inspected
    streaming_chat_message = reduce(lambda first, second: first + second, result_content)
    return "".join([str(content) for content in result_content])

I can see that we get the FunctionCallContent and then the result as StreamingTextContent:

(screenshot: the merged message shows a FunctionCallContent item followed by StreamingTextContent chunks)

We yield those two content types because they are what we receive when streaming from the model. The gap is that we aren't yielding the FunctionResultContent. We will add this work to our backlog so we can expose the streaming FunctionResultContent; that would then allow you to add the proper content types to the chat history that you manage.
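
For callers who want to see what the streaming path yields today, a rough sketch (assuming `kernel.invoke_stream` yields lists of `StreamingChatMessageContent` chunks, as in the concept sample) could look like this:

    from functools import reduce
    from semantic_kernel.contents import FunctionCallContent, StreamingChatMessageContent

    async def chat_once_streaming(kernel, chat_function, arguments) -> StreamingChatMessageContent:
        # Collect every chunk yielded while auto function calling runs.
        chunks: list[StreamingChatMessageContent] = []
        async for message_list in kernel.invoke_stream(chat_function, arguments=arguments):
            for msg in message_list:
                chunks.append(msg)
        # Merge the chunks and inspect the content items that came back.
        merged = reduce(lambda first, second: first + second, chunks)
        for item in merged.items:
            kind = "function call" if isinstance(item, FunctionCallContent) else type(item).__name__
            print(kind, item)
        return merged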

@bbence84
Author

Thanks a lot! :) I was beginning to doubt my understanding of how this is supposed to work, so thanks for confirming that it does not behave the same way with streaming.

@bbence84
Author

@moonbox3 sorry for pinging you, but is yielding FunctionResultContent already on the backlog for an upcoming sprint?
The problem is that with streaming I am working around this by adding the FunctionResultContent as a regular (hidden) assistant message; with that, the result of the function call can later be used by the agent. However, the agent sometimes gets confused on subsequent function calls and outputs the function calls as plain text instead of actually calling the function -- it starts to assume that the way to call functions is to write them out in the response rather than returning a function_call response.
Without streaming it works, as you said, but I would like to have both streaming and FunctionResultContent available in the result.
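
For reference, the workaround looks roughly like this (a sketch; `history` is the caller-managed ChatHistory and `tool_result` stands for whatever the invoked tool returned):

    import json

    # Workaround sketch: surface the function result to the model as a plain
    # (hidden) assistant message, since FunctionResultContent is not yielded
    # when streaming.
    tool_result = {"order_id": "ORD-4711", "status": "created"}  # illustrative value
    history.add_assistant_message(f"[tool result, internal] {json.dumps(tool_result)}")
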
Thanks!

@moonbox3
Contributor

moonbox3 commented Dec 4, 2024

Hi @bbence84, apologies for the delay. Let me have a look at this in the next few days. We should be able to get it in soon.

@moonbox3 moonbox3 added the ai connector Anything related to AI connectors label Dec 4, 2024
@moonbox3
Contributor

I am looking at this now -- it's a bit more complicated than I anticipated, but we should get this in next week.

@moonbox3
Contributor

@ymuichiro @bbence84 We're going to get this fix into main, but while working on it we noticed one complexity in how we yield content back to the caller: right now, during auto function calling, we yield all messages back without any indication of which invocation attempt they belong to.

If we look at the following code from chat_completion_client_base.py:

https://github.com/microsoft/semantic-kernel/blob/4650d27938ebbe8c1fa076fac6b78b6b184ef037/python/semantic_kernel/connectors/ai/chat_completion_client_base.py#L264C13-L274C35

# Auto invoke loop
with use_span(self._start_auto_function_invocation_activity(kernel, settings), end_on_exit=True) as _:
    for request_index in range(settings.function_choice_behavior.maximum_auto_invoke_attempts):
        # Hold the messages, if there are more than one response, it will not be used, so we flatten
        all_messages: list["StreamingChatMessageContent"] = []
        function_call_returned = False
        async for messages in self._inner_get_streaming_chat_message_contents(chat_history, settings):
            for msg in messages:
                if msg is not None:
                    all_messages.append(msg)
                    if any(isinstance(item, FunctionCallContent) for item in msg.items):
                        function_call_returned = True
            yield messages

Depending on the behavior of auto function calling, the request_index iterates up to maximum_auto_invoke_attempts. Today the caller doesn't know which auto-invoke attempt they're currently on, so simply handing back all yielded messages can be confusing. In a new PR we will add the request_index (perhaps under a different name) to make it easier to know which streaming message chunks to concatenate, which should reduce the confusion down the line.
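
Purely as an illustration of what that would enable (the attribute name below is hypothetical, since the final name has not been decided), a caller could then group chunks per attempt before concatenating them:

    from collections import defaultdict
    from functools import reduce

    def group_chunks_by_attempt(chunks):
        # Hypothetical: assume each streamed chunk carries the auto-invoke attempt it
        # belongs to (the attribute name "request_index" is made up for illustration).
        grouped = defaultdict(list)
        for chunk in chunks:
            grouped[getattr(chunk, "request_index", 0)].append(chunk)
        # Concatenate the chunks of each attempt into one message.
        return {
            attempt: reduce(lambda first, second: first + second, msgs)
            for attempt, msgs in grouped.items()
        }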

github-merge-queue bot pushed a commit that referenced this issue Dec 18, 2024
…. Update tests. (#9974)

### Motivation and Context

Currently, if using SK's Python streaming chat completion path, we only
yield the following content types: `StreamingChatMessageContent` and
`FunctionCallContent`. We are not yielding `FunctionResultContent` which
is valuable for some use cases.


### Description

This PR updates the code to yield `FunctionResultContent` if it exists
in the streaming chat completion path. When we merge the function call
result content together into a `StreamingChatMessageContent` type, we
check if that message has items (which are of type
`FunctionResultContent`) and if so, we yield them. The filter path still
works because once they're yielded, we break out of the function calling
loop.

We need to include the `ai_model_id`, if it exists, from the current
PromptExecutionSettings because when performing a `reduce` operation to
add two streaming chat message chunks together, the
`StreamingChatMessageContent` that holds the function results will break
if `ai_model_id` is not set (the error is thrown in the `__add__`
override of `StreamingChatMessageContent`).

Some unit tests that cover function calling were also updated: during
the tests, the JSON function args were breaking in the `json.loads`
call because they contained single quotes instead of double quotes. We
now sanitize the args, just in case, so we don't break there.
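
As a side note, the single-quote issue is easy to reproduce; something along these lines (illustrative only, the actual sanitization in the PR may differ) is what the fix guards against:

    import json

    raw_args = "{'city': 'Budapest'}"        # single-quoted args make json.loads raise
    sanitized = raw_args.replace("'", '"')   # naive sanitization, illustrative only
    print(json.loads(sanitized))             # {'city': 'Budapest'}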

This PR fixes:
- #9408
- #9968 


### Contribution Checklist


- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄

@moonbox3
Contributor

Completed as #9974.
