Human Vs. Agent: A Comprehensive Analysis of Task-Oriented Conversations with LLMs

Data

We provide all the data files under the real_human_user and LLM_agent_user folder.

The conversation dataset is divided into several parts according to the task and the user. Each conversation is stored in a .json file under the corresponding task folder.

The format of the .json file is:

{
    "task": "The name of the task for this conversation.",
    "preference_id": "The ID of the user profile used in this conversation.",
    "task_context_id": "The ID of the task description used in this conversation.",
    "preference": "Preference used in this conversation.",
    "preference_zh": "Preference used in this conversation translated in Chinese.",
    "task_context": "The scenario-specific task description used in this conversation",
    "task_context_zh": "The scenario-specific task description used in this conversation translated in Chinese.",
    "history": [
        {
            "role": "user",
            "content": "Some text generated by user simulator LLM.",
            "content_zh": "Some text generated by user simulator LLM translated in Chinese.",
            "intent": "The intent annotation of the user simulator LLM.",
            "intent_zh": "The intent annotation of the user simulator LLM translated in Chinese.",
        },
        {
            "role": "assistant",
            "content": "Some text generated by assistant LLM.",
            "content_zh": "Some text generated by assistant LLM translated in Chinese.",
            "hallucination": {
                "hallucination": "Whether the assistant hallucinates in this turn.",
                "memo": "The memo of the hallucination.",
            },
        },
        // ...
    ],
    "conflict": false,
    "preference_summary": "The summary of the user simulator's preference extracted from the conversation.",
    "rating": {
        // Some rating information.
    },
    // Some other information.
}

User Simulator Framework

We provide the code for the framework in the RecUserSim folder.

Check the readme.md file in the RecUserSim folder for more details.

File Structure

real_human_user: The conversation data constructed from real human users.
LLM_agent_user: The conversation data constructed by the LLM agent users.
RecUserSim: The code for the user simulator framework.
README.md: This file.
user_behavior: Automatically label method for user behavior analysis.
user_profiling: The code for our profile identification experiments.
statistics: The code for the statistics of the dataset.

Some Statistics

Simulation Quality across Different Conversation Turns

Turns	# Conv.	Preference Alignment	Role-Playing Completeness
1	1	1.0000	1.0000
2	316	1.2089	1.2911
3	575	1.3496	1.6417
4	452	1.6018	1.6416
5	279	1.7097	1.6093
6	138	1.8043	1.7246
7	69	1.7536	1.7101
8	15	1.8667	1.7333
9	6	1.6667	1.3333
10	5	1.6000	1.2000

Annotation Criteria

Detail Level

2 (High): In-depth responses with detailed explanations (e.g., a recipe including specific ingredients and step-by-step instructions).
0 (Low): Vague or brief responses (e.g., listing destination names without descriptions in a travel plan).
1 (Medium): Responses between these extremes.

Diversity

Low (0): Only one or two options provided per query.
Medium (1): Some queries have three or more options, while others are limited.
High (2): All queries include three or more diverse options.

Practical Utility

Not usable (0): Unrealistic plans (e.g., an overly tight travel schedule or learning plans with unreasonably fast progress).
Somewhat usable (1): Partially applicable suggestions (e.g., gift ideas that provide inspiration but are difficult to implement).
Highly usable (2): Practical and actionable plans (e.g., a Python learning plan with achievable milestones).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Human Vs. Agent: A Comprehensive Analysis of Task-Oriented Conversations with LLMs

Data

User Simulator Framework

File Structure

Some Statistics

Simulation Quality across Different Conversation Turns

Annotation Criteria

Detail Level

Diversity

Practical Utility

About

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
LLM_agent_user		LLM_agent_user
RecUserSim		RecUserSim
real_human_user		real_human_user
statistics		statistics
user_behavior		user_behavior
user_profiling		user_profiling
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

License

wzf2000/RecLLMSim

Folders and files

Latest commit

History

Repository files navigation

Human Vs. Agent: A Comprehensive Analysis of Task-Oriented Conversations with LLMs

Data

User Simulator Framework

File Structure

Some Statistics

Simulation Quality across Different Conversation Turns

Annotation Criteria

Detail Level

Diversity

Practical Utility

About

Topics

Resources

License

Stars

Watchers

Forks

Languages