Add Sympy equivalence for MATH / GSM8K? #170

Open
lewtun opened this issue Apr 23, 2024 · 2 comments
@lewtun
Member

lewtun commented Apr 23, 2024

In the Minerva and Llemma papers, SymPy is used to check that predicted and gold answers are equivalent, e.g. that $1/\sqrt{3}$ and $\sqrt{3}/3$ are treated the same. From the Minerva paper:

> After applying this normalization function, we checked whether the formatted target and prediction strings are SymPy-equivalent. SymPy equivalence is determined by parsing the answers via `sympy.parsing.latex.parse_latex` and then checking whether subtracting the two resulting SymPy objects and applying `sympy.simplify` gives zero. We set a timeout of 5s when calling `sympy.simplify`, and labeled strings as nonequivalent if this timeout was exceeded.
>
> For MATH problems, SymPy equivalence improved overall accuracy by around 1%. See Table 6 for the accuracies in MATH with only exact string match vs. SymPy equivalence.
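
For reference, here is a rough sketch of what the check described in that quote could look like. It is not the Minerva implementation: the names `is_sympy_equivalent`, `time_limit`, and `SimplifyTimeout` are made up for illustration, the `parse` argument is an added hook so the parser can be swapped, and treating parse errors as "not equivalent" is an assumption on my side. The timeout uses `SIGALRM`, so it only works on Unix and in the main thread. Calling `parse_latex` also needs the optional ANTLR runtime that SymPy's LaTeX parser depends on.

```python
import signal
from contextlib import contextmanager

import sympy
from sympy.parsing.latex import parse_latex  # needs the antlr4 runtime when called


class SimplifyTimeout(Exception):
    """Raised when the time budget for sympy.simplify is exceeded (hypothetical helper)."""


@contextmanager
def time_limit(seconds):
    """Raise SimplifyTimeout if the wrapped block runs longer than `seconds`.

    Assumption: Unix only, main thread only (relies on SIGALRM).
    """
    def _raise(signum, frame):
        raise SimplifyTimeout()

    previous = signal.signal(signal.SIGALRM, _raise)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def is_sympy_equivalent(pred, gold, parse=parse_latex, timeout=5.0):
    """Return True if simplify(parse(pred) - parse(gold)) is zero.

    Timeouts count as non-equivalent, as in the quoted description.
    Parse or simplify errors also count as non-equivalent (an assumption).
    """
    try:
        difference = parse(pred) - parse(gold)
        with time_limit(timeout):
            return sympy.simplify(difference) == 0
    except SimplifyTimeout:
        return False
    except Exception:
        return False
```

A call such as `is_sympy_equivalent(r"\frac{1}{\sqrt{3}}", r"\frac{\sqrt{3}}{3}")` exercises the example from above. In the metric this would presumably run only after normalization and as a fallback when exact string match fails, so the 5s budget is not paid on every sample.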

Although the difference between Minerva & OpenAI models was only 1%, would it make sense to add sympy to the MATH metric for both correctness and potentially uncovering larger variation among open models?

@clefourrier
Member

Hi!
Yes, I think that's a very good idea.

@clefourrier
Member

cc @NathanHB since you've been working on this a bit?
