AutoMathIC: Automatic Mathematic In-Context Example Generation for LLM Using Multi-Modal Consistency

This repository contains the source code and experimental results for automatic In-Context example generation, which improves the math-solving capability of LLMs, as described in the following paper:

Paper: Automatic Mathematic In-Context Example Generation for LLM Using Multi-Modal Consistency

AutoMathIC is a framework that automatically generates high-quality In-Context examples to enhance LLMs' mathematical reasoning. In this implementation, AutoMathIC mutates an arithmetic question and selects a subset of the mutated questions to use as In-Context examples, based on consistency across multiple modalities. It is applied to four math problem datasets (ASDiv, SVAMP, GSM8k and MultiArith). The modalities used in this work are Chain-Of-Thought, Code and Mathematical Equation. Results of AutoMathIC are here. Supplemental artifacts for the results can be downloaded from here.
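The selection idea can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names `answers_agree` and `select_consistent`, the dictionary layout, the modality keys, the numeric tolerance and the sample questions are all assumptions made for the example. A mutated question is kept as a candidate In-Context example only when the answers from all three modalities agree.

```python
from math import isclose

# Hypothetical keys for the three modalities named above
# (Chain-Of-Thought, Code, Mathematical Equation).
MODALITIES = ("cot", "code", "eqn")


def answers_agree(answers, tol=1e-6):
    """Return True when every modality produced the same numeric answer."""
    values = [answers.get(m) for m in MODALITIES]
    if any(v is None for v in values):
        return False  # a missing answer counts as inconsistent
    first = values[0]
    return all(isclose(first, v, rel_tol=tol, abs_tol=tol) for v in values[1:])


def select_consistent(mutations):
    """Keep only the mutated questions whose modalities agree."""
    return [m for m in mutations if answers_agree(m["answers"])]


# Made-up data for illustration only.
mutations = [
    {"question": "Tom has 3 apples and buys 4 more. How many apples does he have?",
     "answers": {"cot": 7.0, "code": 7.0, "eqn": 7.0}},
    {"question": "Ann has 10 pens and gives away 6. How many pens remain?",
     "answers": {"cot": 4.0, "code": 5.0, "eqn": 4.0}},
]
selected = select_consistent(mutations)
print([m["question"] for m in selected])
```

In this sketch the second question is dropped because the code modality disagrees with the other two. The actual selection in the repository may weigh consistency differently.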

Prerequisites

This application is written for Python 3.9.17. All requirements are listed in requirements.txt and can be installed with pip using the following command:

pip install -r requirements.txt

Organization

This artifact repository consists of the following files and folders:

./src/python/*: Directory for source code in Python

./_results/*: Directory for results produced by running the source code

./_downloads/*: Directory for datasets used for running the source code

Usage

1. Mutation of target question for In-Context Examples

This step generates mutated math problems. Run it with the following commands:

cd AutoMathIC
# Set llm_model to either gpt3.5 (GPT-3.5) or gpt4omini (GPT-4o-mini)
llm_model='gpt3.5'
# SVAMP
python -m src.python.main \
      --run mutate_nl \
      --llm_model_name "${llm_model}" \
      --dataset_name 'svamp'
# ASDiv
python -m src.python.main \
      --run mutate_nl \
      --llm_model_name "${llm_model}" \
      --dataset_name 'asdiv'

# MultiArith
python -m src.python.main \
      --run mutate_nl \
      --llm_model_name "${llm_model}" \
      --dataset_name 'multiarith'

# GSM8k
python -m src.python.main \
      --run mutate_nl \
      --llm_model_name "${llm_model}" \
      --dataset_name 'gsm8k'
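The four invocations differ only in the dataset name, so they can be generated in a loop. The sketch below is a convenience wrapper, not part of the repository: `build_command` and `DATASETS` are hypothetical names, and only the flags shown in the commands above are used. It prints the commands rather than running them.

```python
import shlex

# Dataset names as they appear in the commands above.
DATASETS = ["svamp", "asdiv", "multiarith", "gsm8k"]


def build_command(run, llm_model, dataset):
    """Build the argument list for one invocation of src.python.main."""
    return [
        "python", "-m", "src.python.main",
        "--run", run,
        "--llm_model_name", llm_model,
        "--dataset_name", dataset,
    ]


commands = [build_command("mutate_nl", "gpt3.5", d) for d in DATASETS]
for cmd in commands:
    # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

The same helper covers the later steps by changing the `run` argument to `evaluate_mm_llm` or `genetic_fg_alg`.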

After running the commands, the output is written to {PROJ_DIR}/_results/nl2nl/{DATASET}/mutation/, where DATASET is the name of the math problem dataset (ASDiv, SVAMP, GSM8k or MultiArith). The following files are generated in the result directory:

_results/
|- nl2nl/
|  |- {DATASET}/
|  |  |- mutation/
|  |  |  |- mut-nl-{CKSUM}.json

Here, {CKSUM} is the checksum of each unique math problem. Each mut-nl-{CKSUM}.json file contains the original math problem and its mutated math problems.
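To inspect the generated mutations, the files can be collected by checksum. The sketch below assumes only the directory layout and file naming shown above. `load_mutations` is a hypothetical helper, the dataset folder name must be passed exactly as it appears on disk, and no assumption is made about the JSON schema inside each file.

```python
import json
from pathlib import Path


def load_mutations(proj_dir, dataset):
    """Map each checksum to the parsed content of its mut-nl-{CKSUM}.json file."""
    mutation_dir = Path(proj_dir) / "_results" / "nl2nl" / dataset / "mutation"
    loaded = {}
    for path in sorted(mutation_dir.glob("mut-nl-*.json")):
        cksum = path.stem[len("mut-nl-"):]  # text between the prefix and ".json"
        with path.open(encoding="utf-8") as fh:
            loaded[cksum] = json.load(fh)
    return loaded


# "svamp" is an assumed folder name; an absent directory yields an empty dict.
mutations = load_mutations(".", "svamp")
print(f"{len(mutations)} mutation files found")
```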

2. Generation of Multi-Modal LLM responses

This step obtains the LLM responses over multiple modalities. Run it with the following commands:

cd AutoMathIC
# Set llm_model to either gpt3.5 (GPT-3.5) or gpt4omini (GPT-4o-mini)
llm_model='gpt3.5'
# SVAMP
python -m src.python.main \
      --run evaluate_mm_llm \
      --llm_model_name "${llm_model}" \
      --dataset_name 'svamp'
# ASDiv
python -m src.python.main \
      --run evaluate_mm_llm \
      --llm_model_name "${llm_model}" \
      --dataset_name 'asdiv'

# MultiArith
python -m src.python.main \
      --run evaluate_mm_llm \
      --llm_model_name "${llm_model}" \
      --dataset_name 'multiarith'

# GSM8k
python -m src.python.main \
      --run evaluate_mm_llm \
      --llm_model_name "${llm_model}" \
      --dataset_name 'gsm8k'

After running the commands, the output is written to {PROJ_DIR}/_results/nl2nl/{DATASET}/evaluate_consistency/{LLM_MODEL}, where DATASET is the name of the math problem dataset (ASDiv, SVAMP, GSM8k or MultiArith) and LLM_MODEL is the name of the LLM. The following files are generated in the result directory:

_results/
|- nl2nl/
|  |- {DATASET}/
|  |  |- evaluate_consistency/
|  |  |  |- {LLM_MODEL}/
|  |  |  |  |- fg-eval-{CKSUM}.json
|  |  |  |  |- eval-{CKSUM}.json

Here, {CKSUM} is the checksum of each unique math problem. The eval-{CKSUM}.json and fg-eval-{CKSUM}.json files contain the LLM responses for the original math problem and its mutated math problems over the different modalities.
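A quick way to check that a run completed is to group the two file kinds by checksum. This sketch relies only on the file naming shown above. `pair_eval_files` and `missing_counterparts` are hypothetical helpers, and the expectation that every problem has both files is an assumption the README does not state.

```python
from pathlib import Path


def pair_eval_files(eval_dir):
    """Group eval-{CKSUM}.json and fg-eval-{CKSUM}.json files by checksum."""
    pairs = {}
    for path in sorted(Path(eval_dir).glob("*eval-*.json")):
        name = path.name
        if name.startswith("fg-eval-"):
            kind, prefix = "fg_eval", "fg-eval-"
        elif name.startswith("eval-"):
            kind, prefix = "eval", "eval-"
        else:
            continue  # unrelated file that happens to match the pattern
        cksum = name[len(prefix):-len(".json")]
        pairs.setdefault(cksum, {})[kind] = path
    return pairs


def missing_counterparts(pairs):
    """Return checksums that have only one of the two file kinds."""
    return sorted(c for c, files in pairs.items() if len(files) < 2)


# The dataset and model folder names below are assumed for illustration.
pairs = pair_eval_files("_results/nl2nl/svamp/evaluate_consistency/gpt3.5")
print(missing_counterparts(pairs))
```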

3. Optimization of LLM Responses Using Mutated In-Context Examples

This step selects In-Context examples among the mutated questions and generates the LLM responses using them. Run it with the following commands:

cd AutoMathIC
# Set llm_model to either gpt3.5 (GPT-3.5) or gpt4omini (GPT-4o-mini)
llm_model='gpt3.5'
# SVAMP
python -m src.python.main \
      --run genetic_fg_alg \
      --llm_model_name "${llm_model}" \
      --dataset_name 'svamp'
# ASDiv
python -m src.python.main \
      --run genetic_fg_alg \
      --llm_model_name "${llm_model}" \
      --dataset_name 'asdiv'

# MultiArith
python -m src.python.main \
      --run genetic_fg_alg \
      --llm_model_name "${llm_model}" \
      --dataset_name 'multiarith'

# GSM8k
python -m src.python.main \
      --run genetic_fg_alg \
      --llm_model_name "${llm_model}" \
      --dataset_name 'gsm8k'

After running the commands, the output is written to {PROJ_DIR}/_results/genetic_fg/{DATASET}/evaluate_consistency/{LLM_MODEL}, where DATASET is the name of the math problem dataset (ASDiv, SVAMP, GSM8k or MultiArith) and LLM_MODEL is the name of the LLM. The following files are generated in the result directory:

_results/
|- genetic_fg/
|  |- {DATASET}/
|  |  |- evaluate_consistency/
|  |  |  |- {LLM_MODEL}/
|  |  |  |  |- final_answers.json

The final_answers.json file contains the final LLM responses for the original math problems, generated using the selected mutations as In-Context examples.
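For readers unfamiliar with In-Context prompting, the sketch below shows one generic way that selected examples can be placed ahead of a target question. It is an illustration only: `build_prompt`, the "Q:/A:" layout and the sample questions are assumptions, and the prompt format used by the repository may differ.

```python
def build_prompt(examples, target_question):
    """Format selected (question, answer) pairs followed by the target question."""
    parts = []
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    # The target question comes last, with the answer left for the LLM to fill in.
    parts.append(f"Q: {target_question}\nA:")
    return "\n\n".join(parts)


# Made-up examples standing in for selected mutated questions.
examples = [
    ("Tom has 3 apples and buys 4 more. How many apples does he have?", "7"),
    ("Ann has 10 pens and gives away 6. How many pens remain?", "4"),
]
prompt = build_prompt(
    examples, "Sam has 5 books and buys 2 more. How many books does he have?"
)
print(prompt)
```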

Artifact

Supplemental artifacts for the results can be downloaded from here
