Official code and data repository of MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

Zhenwen-NLP/MathChat

MathChat

Welcome to the repository! It has two parts: the four JSONL files of our proposed MathChat benchmark, and the MathChat_sync dataset for fine-tuning.

The first part is the benchmark files:

File Descriptions

1. follow_up.jsonl

This file contains entries for the follow-up question-answering task. Each line is a JSON object with three keys:

  • question: Sourced from the GSM8k testing set.
  • answer: Corresponding answer from the GSM8k testing set.
  • followup: Includes two rounds of follow-up questions and reference answers, formatted as a conversation between a user (A:) and an assistant (B:).
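
The three-key layout described above can be read with the standard library alone. The following is a minimal sketch, not code from this repository: the helper name `read_jsonl` is hypothetical, and it assumes each non-empty line is one standalone JSON object.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list of dicts, one per non-empty line of a JSONL file."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumes follow_up.jsonl sits in the current working directory.
    path = Path("follow_up.jsonl")
    if path.exists():
        records = read_jsonl(path)
        print(len(records), "entries")
        print(sorted(records[0].keys()))
```

Reading line by line keeps memory use proportional to the parsed records rather than to the raw file plus the records, which matters little here but carries over to larger files.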

2. error_correction.jsonl

This file is designed for error correction tasks. Each line is a JSON object with three keys:

  • question: Sourced from the GSM8k testing set.
  • answer: Corresponding answer from the GSM8k testing set.
  • error_correction: Contains a conversation between a user (A:) and an assistant (B:), which includes the original question, an incorrect answer, and the process of correcting the error.
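
To feed such a conversation to a chat model, it usually helps to split it into individual turns. The sketch below assumes the conversation is stored as a single string in which every turn starts on its own line with `A:` or `B:`; that storage format is an assumption and should be checked against the actual file. The helper name `split_turns` is hypothetical.

```python
import re

# Matches a speaker marker ("A:" or "B:") at the start of a line.
TURN_MARKER = re.compile(r"^(A|B):\s*", flags=re.MULTILINE)


def split_turns(conversation):
    """Split 'A: ... B: ...' text into (role, text) pairs.

    A is mapped to "user" and B to "assistant", following the README.
    """
    # With one capture group, re.split returns
    # [prefix, speaker1, text1, speaker2, text2, ...].
    parts = TURN_MARKER.split(conversation)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        role = "user" if speaker == "A" else "assistant"
        turns.append((role, text.strip()))
    return turns


if __name__ == "__main__":
    demo = "A: What is 2 + 2?\nB: 5\nA: Please check again.\nB: It is 4."
    for role, text in split_turns(demo):
        print(f"{role}: {text}")
```

If a turn's own text contains a line beginning with `A:` or `B:`, this splitter would cut it in two, so results are worth spot-checking on real entries.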

3. error_analysis.jsonl

This file also focuses on error correction but employs a different prompt strategy. Each line is a JSON object with three keys:

  • question: Sourced from the GSM8k testing set.
  • answer: Corresponding answer from the GSM8k testing set.
  • error_analysis: Includes a conversation between a user (A:) and an assistant (B:), where the model is prompted to judge the correctness of the answer on its own, without being told whether it is correct.

4. p2p_generation.jsonl

This file contains entries for problem generation tasks. Each line is a JSON object with three keys:

  • question: Sourced from the GSM8k testing set.
  • answer: Corresponding answer from the GSM8k testing set.
  • new_problem: A new problem generated by GPT-4 to serve as a reference answer.
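
Since all four files share `question` and `answer` and differ only in their third key, a quick sanity check per record is easy to write. This is a small sketch based solely on the key names listed in this README; the mapping `TASK_KEYS` and the helper `missing_keys` are hypothetical names introduced for illustration.

```python
# Task-specific key for each benchmark file, as listed in the README.
TASK_KEYS = {
    "follow_up.jsonl": "followup",
    "error_correction.jsonl": "error_correction",
    "error_analysis.jsonl": "error_analysis",
    "p2p_generation.jsonl": "new_problem",
}


def missing_keys(filename, record):
    """Return the sorted list of expected keys absent from one record."""
    expected = {"question", "answer", TASK_KEYS[filename]}
    return sorted(expected - set(record))


if __name__ == "__main__":
    record = {"question": "q", "answer": "a"}
    print(missing_keys("p2p_generation.jsonl", record))
```

An empty list means the record has every key the README describes for that file.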

The second part is the MathChat_sync dataset, which can be used to fine-tune your own LLMs.

Because the dataset is large, we host the file on Google Drive:

https://drive.google.com/file/d/1nkAXAL9EpmDiceoV_qv6M00Lj0LA7MKX/view?usp=sharing

We hope these files aid in your analysis and development efforts. For any questions or contributions, please feel free to open an issue or submit a pull request. Happy coding!
