In the Minerva and Llemma papers, sympy is used to check equivalence of predicted and gold answers, e.g. ensuring $1/\sqrt{3}$ and $\sqrt{3}/3$ are treated the same. From the Minerva paper:
After applying this normalization function, we checked whether the formatted target and prediction strings are SymPy-equivalent. SymPy equivalence is determined by parsing the answers via sympy.parsing.latex.parse_latex and then checking whether subtracting the two resulting SymPy objects and applying sympy.simplify gives zero. We set a timeout of 5s when calling sympy.simplify, and labeled strings as nonequivalent if this timeout was exceeded.
For MATH problems, SymPy equivalence improved overall accuracy by around 1%. See Table 6 for MATH accuracies with exact string match only vs. SymPy equivalence.
Although the difference for Minerva & OpenAI models was only around 1%, would it make sense to add sympy to the MATH metric, both for correctness and for potentially uncovering larger variation among open models?
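For concreteness, here is a minimal sketch of the check described above. The helper name `sympy_equivalent` is hypothetical; the papers parse LaTeX via `sympy.parsing.latex.parse_latex` (which requires the antlr4 runtime), so this sketch uses `sympy.sympify` on plain expression strings instead to stay dependency-free:

```python
# Sketch of Minerva-style SymPy equivalence checking (helper name is
# hypothetical). The papers parse LaTeX with sympy.parsing.latex.parse_latex;
# sympify is substituted here so the example runs without the antlr4 runtime.
from sympy import simplify, sympify


def sympy_equivalent(pred: str, gold: str) -> bool:
    """Return True if pred - gold simplifies to zero.

    Minerva additionally wraps the simplify call in a 5 s timeout and
    labels the pair nonequivalent on timeout; that guard is omitted here.
    """
    try:
        diff = sympify(pred) - sympify(gold)
        return simplify(diff) == 0
    except Exception:
        # Unparseable answers fall back to nonequivalent.
        return False


# The example from this issue: 1/sqrt(3) and sqrt(3)/3 are the same value,
# so they should count as a match even though the strings differ.
print(sympy_equivalent("1/sqrt(3)", "sqrt(3)/3"))  # True
print(sympy_equivalent("1/2", "1/3"))              # False
```

An exact-string-match metric would score the first pair as wrong; the SymPy check recovers it, which is where the roughly 1% accuracy gain on MATH comes from.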