Official code and data repository for the paper "MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions".
Welcome to the repository! This directory contains the four JSONL files that make up our proposed MathChat benchmark.
The first part consists of the benchmark files:
This file contains entries that facilitate follow-up questioning. Each line consists of three keys:
- question: Sourced from the GSM8k testing set.
- answer: Corresponding answer from the GSM8k testing set.
- followup: Includes two rounds of follow-up questions and reference answers, formatted as a conversation between a user (A:) and an assistant (B:).
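The conversation format described above can be split into individual turns. The following is a minimal sketch, assuming the `followup` value is a single string in which every turn begins on its own line with `A:` or `B:`; the helper name `split_turns` and the example transcript are made up for illustration and are not taken from the benchmark.

```python
import re


def split_turns(conversation: str) -> list[tuple[str, str]]:
    """Split a transcript into (speaker, text) pairs.

    Assumption: each turn starts at the beginning of a line with "A:" or "B:".
    """
    # With one capture group, re.split returns
    # [preamble, speaker, text, speaker, text, ...].
    parts = re.split(r"^(A|B):\s*", conversation, flags=re.MULTILINE)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        turns.append((speaker, text.strip()))
    return turns


# Invented transcript, used only to show the shape of the output.
example = (
    "A: What if she sold twice as many?\n"
    "B: Then she would earn $36.\n"
    "A: And half as many?\n"
    "B: $9."
)
turns = split_turns(example)
```

If the actual files store the conversation differently (for example, as a list of turns), this helper is not needed; inspect one record first.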
This file is designed for error correction tasks. Each line consists of three keys:
- question: Sourced from the GSM8k testing set.
- answer: Corresponding answer from the GSM8k testing set.
- error_correction: Contains a conversation between a user (A:) and an assistant (B:), which includes the original question, an incorrect answer, and the process of correcting the error.
This file also focuses on error correction but employs a different prompt strategy. Each line consists of three keys:
- question: Sourced from the GSM8k testing set.
- answer: Corresponding answer from the GSM8k testing set.
- error_analysis: Includes a conversation between a user (A:) and an assistant (B:), in which the model must judge for itself whether the answer is correct, without being told explicitly.
This file contains entries for problem generation tasks. Each line consists of three keys:
- question: Sourced from the GSM8k testing set.
- answer: Corresponding answer from the GSM8k testing set.
- new_problem: A new problem generated by GPT-4 to serve as a reference answer.
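Since every benchmark file shares the `question` and `answer` keys and adds one task-specific key, the files can be read with the same loader. The sketch below uses only the Python standard library; the function names `load_jsonl` and `task_key_of` are illustrative, not part of this repository, and the check assumes each record carries exactly one of the four task keys listed above.

```python
import json
from pathlib import Path

# Task-specific keys as described in this README.
TASK_KEYS = ("followup", "error_correction", "error_analysis", "new_problem")


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def task_key_of(record):
    """Return the task-specific key of a record, or raise if the shape is unexpected."""
    missing = {"question", "answer"} - record.keys()
    present = [key for key in TASK_KEYS if key in record]
    if missing or len(present) != 1:
        raise ValueError(f"unexpected keys: {sorted(record.keys())}")
    return present[0]
```

A typical use would be `records = load_jsonl(path_to_file)` followed by `task_key_of(records[0])`, where `path_to_file` points at one of the four benchmark files.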
The second part is the MathChat_sync dataset, which can be used to fine-tune your own LLMs.
Because the dataset is large, we host the file on Google Drive:
https://drive.google.com/file/d/1nkAXAL9EpmDiceoV_qv6M00Lj0LA7MKX/view?usp=sharing
We hope these files aid in your analysis and development efforts. For any questions or contributions, please feel free to open an issue or submit a pull request. Happy coding!