
[Bug] If the multi-turn conversation history contains tool_calls, the tool_calls arguments in the response get wrapped in an extra layer of string #3058

Open
3 tasks done
ExenVitor opened this issue Jan 21, 2025 · 7 comments
@ExenVitor

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Model: Qwen2.5-32B-Instruct-AWQ
I tried keeping messages with tool_calls in the multi-turn conversation history, and the tool_calls arguments in the new response ended up wrapped in an extra layer of string.
I reproduced this following the Qwen2.5 demo in the Tools Calling documentation; the result is as follows:

History Messages:

[{'content': "Today is 2024-11-14, What's the temperature in San Francisco "
             'now?',
  'role': 'user'},
 ChatCompletionMessage(content='', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{"location": "San Francisco, California, USA"}', name='get_current_temperature'), type='function')]),
 {'content': {'location': 'San Francisco, California, USA',
              'temperature': 26.1,
              'unit': 'celsius'},
  'name': 'get_current_temperature',
  'role': 'tool',
  'tool_call_id': '0'},
 ChatCompletionMessage(content='The current temperature in San Francisco, California, USA is 26.1 degrees Celsius.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None),
 {'content': "Today is 2024-11-14, What's the temperature in Beijing now?",
  'role': 'user'}]

Response:

ChatCompletion(id='648', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='"{\\"location\\": \\"Beijing, China\\"}"', name='get_current_temperature'), type='function')]))], created=1737428106, model='/models/Qwen2.5-32B-Instruct-AWQ-turbomind', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=25, prompt_tokens=530, total_tokens=555, completion_tokens_details=None, prompt_tokens_details=None))

Note that arguments='"{\\"location\\": \\"Beijing, China\\"}"', which makes the arguments fail to parse when the tool is called afterwards.

I looked at the source code and found that lmdeploy applies an extra json.dumps to the tool_calls arguments when assembling the prompt template. Besides Qwen2.5, InternLM2 does the same thing. Is this a bug?
https://github.com/InternLM/lmdeploy/blob/3f8b079224d109aaa1b512867203a47ea600aa7d/lmdeploy/model.py#L1066C11-L1077C34
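
For illustration, a minimal sketch (hypothetical, not lmdeploy's actual code path) of how re-serializing an already-serialized arguments string produces the nested value seen above:

import json

# arguments as delivered by the OpenAI-compatible API: already a JSON string
arguments = '{"location": "San Francisco, California, USA"}'

# If the server serializes it again while building the prompt template,
# the model is shown a double-encoded value:
double_encoded = json.dumps(arguments)
print(double_encoded)
# "{\"location\": \"San Francisco, California, USA\"}"

# A single json.loads then yields a str, not the expected dict
print(type(json.loads(double_encoded)))  # <class 'str'>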

Reproduction

from openai import OpenAI
import json
from pprint import pprint

def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }


def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }

def get_function_by_name(name):
    if name == "get_current_temperature":
        return get_current_temperature
    if name == "get_temperature_date":
        return get_temperature_date

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_current_temperature',
        'description': 'Get current temperature at a location.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': [
                        'celsius',
                        'fahrenheit'
                    ],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': [
                'location'
            ]
        }
    }
}, {
    'type': 'function',
    'function': {
        'name': 'get_temperature_date',
        'description': 'Get temperature at a location and date.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'date': {
                    'type': 'string',
                    'description': 'The date to get the temperature for, in the format \'Year-Month-Day\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': [
                        'celsius',
                        'fahrenheit'
                    ],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': [
                'location',
                'date'
            ]
        }
    }
}]
messages = [{'role': 'user', 'content': 'Today is 2024-11-14, What\'s the temperature in San Francisco now?'}]
client = OpenAI(api_key='YOUR_API_KEY', base_url='http://192.168.16.11:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    tools=tools)
print(response.choices[0].message.tool_calls)
messages.append(response.choices[0].message)

for tool_call in response.choices[0].message.tool_calls:
    tool_call_args = json.loads(tool_call.function.arguments)
    tool_call_result =  get_function_by_name(tool_call.function.name)(**tool_call_args)
    messages.append({
        'role': 'tool',
        'name': tool_call.function.name,
        'content': tool_call_result,
        'tool_call_id': tool_call.id
    })

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    tools=tools)
pprint(response.choices[0].message.content)
messages.append(response.choices[0].message)

messages.append({'role': 'user', 'content': 'Today is 2024-11-14, What\'s the temperature in Beijing now?'})

pprint("--------MESSAGES----------")
pprint(messages)
pprint("--------MESSAGES----------")

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    tools=tools)
pprint(response)

Environment

sys.platform: linux
Python: 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4090 D
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.4.1+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1  (built against CUDA 12.4)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.19.1+cu121
LMDeploy: 0.6.5+af0fcf2
transformers: 4.47.1
gradio: 5.9.1
fastapi: 0.115.6
pydantic: 2.10.4
triton: 3.0.0
NVIDIA Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-7     0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

@AllentDan
Collaborator

It should have no impact on function calling, since every JSON content should be consumed via json.loads. The double backslashes are removed by json.loads.

(Pdb) response.choices[0].message.tool_calls[0].function.arguments
'"{\\"location\\": \\"Beijing, China\\", \\"unit\\": \\"celsius\\"}"'
(Pdb) cc = response.choices[0].message.tool_calls[0].function.arguments
(Pdb) import json
(Pdb) json.loads(cc)
'{"location": "Beijing, China", "unit": "celsius"}'

@ExenVitor
Author

What I'm actually trying to point out is that, per the OpenAI API definition, the arguments field is already typed as str, so when the server applies json.dumps to it again while assembling the chat template, the model ends up returning results of the form json.dumps(json.dumps({"xxx": "xxx"})) in subsequent responses.
When using a third-party framework such as LangChain, which defines Message with a Pydantic model, the framework runs json.loads on arguments and expects a dict, which then raises a ValidationError.
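
For illustration, a minimal sketch (using a hypothetical Pydantic model, not LangChain's actual classes) of why a client that expects json.loads(arguments) to yield a dict fails on the double-encoded value:

import json
from pydantic import BaseModel, ValidationError

# Hypothetical model in the spirit of what such frameworks do:
# tool-call arguments are expected to deserialize into a dict.
class ParsedToolCall(BaseModel):
    name: str
    args: dict

double_encoded = '"{\\"location\\": \\"Beijing, China\\"}"'

try:
    ParsedToolCall(name='get_current_temperature',
                   args=json.loads(double_encoded))  # json.loads -> str, not dict
except ValidationError as err:
    print(err)  # args: Input should be a valid dictionary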

@AllentDan
Collaborator

Doesn't json.loads strip the extra backslashes and give you a dict? Or could you give an example where the returned content cannot be read with json.loads?

@ExenVitor
Author

It should have no impact on function calling, since every JSON content should be consumed via json.loads. The double backslashes are removed by json.loads.

(Pdb) response.choices[0].message.tool_calls[0].function.arguments
'"{\\"location\\": \\"Beijing, China\\", \\"unit\\": \\"celsius\\"}"'
(Pdb) cc = response.choices[0].message.tool_calls[0].function.arguments
(Pdb) import json
(Pdb) json.loads(cc)
'{"location": "Beijing, China", "unit": "celsius"}'

Here '{"location": "Beijing, China", "unit": "celsius"}' is still a str.
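
For illustration, with the reported behavior a second json.loads is needed before a dict is obtained:

import json

cc = '"{\\"location\\": \\"Beijing, China\\", \\"unit\\": \\"celsius\\"}"'
first = json.loads(cc)      # '{"location": "Beijing, China", "unit": "celsius"}' -- still a str
second = json.loads(first)  # {'location': 'Beijing, China', 'unit': 'celsius'} -- now a dict
print(type(first), type(second))  # <class 'str'> <class 'dict'>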

@3mpt

3mpt commented Jan 22, 2025

I ran into the same problem. After looking into it myself, it's because the format generated by internlm does not match what's expected, which causes Pydantic to raise an error.

[Image]

[Image]

These two screenshots clearly show the difference in the returned format. I switched to deploying with vLLM and the problem was resolved.

But now that InternLM3 is out, it is not yet supported by vLLM.

@AllentDan
Collaborator

AllentDan commented Jan 22, 2025

Here '{"location": "Beijing, China", "unit": "celsius"}' is still a str.

That's because the original content is itself something that remains a str after being decoded from JSON. In theory, running json.dumps once more after json.loads would restore the original content.

@ExenVitor
Author

Here '{"location": "Beijing, China", "unit": "celsius"}' is still a str.

That's because the original content is itself something that remains a str after being decoded from JSON. In theory, running json.dumps once more after json.loads would restore the original content.

The question here isn't whether the original content can be recovered. By the OpenAI API design, for a response json.loads(arguments) should yield a dict, and that is clearly what third-party clients expect as well.

However, lmdeploy returns json.dumps(json.dumps(arguments)). The reason is that lmdeploy's implementation assumes that messages.tool_calls.function.arguments in a /chat/completions request is a dict, and it unconditionally applies json.dumps(arguments) when assembling the prompt template. The effect is equivalent to giving the model few-shot examples that teach it the returned format should be json.dumps(json.dumps(arguments)).

According to the OpenAI API Reference, messages.tool_calls.function.arguments is a str. That means both the example above and third-party client implementations already make sure, on the client side, that arguments (dict) is converted to arguments (str) with a single json.dumps(arguments). Applying json.dumps to arguments (str) once more after lmdeploy receives the request is unnecessary, and it is the cause of this behavior.
[Image: screenshot of the OpenAI API Reference]
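
For illustration only, a minimal sketch (a hypothetical helper, not lmdeploy's actual code or an accepted fix) of the kind of guard that would avoid re-serializing an arguments value that is already a string:

import json

def format_tool_call_arguments(arguments):
    # Keep an already-serialized OpenAI-style arguments string as-is;
    # only apply json.dumps when a dict was supplied directly.
    if isinstance(arguments, str):
        return arguments
    return json.dumps(arguments)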
