Merge pull request #1 from amazon-science/update_v1.1
add v1.1 changes and changelog
benathi authored Dec 27, 2022
2 parents 404f95c + 6b76f03 commit c035b0a
Showing 5 changed files with 148 additions and 141 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,7 @@
Changelog

## v1.1
* Bugfix:<br>
Fix the unit-test assertions of 47 problems in C#/TypeScript/Go, about 5% of all problems.<br>
Root cause: a canonical solution can mutate its input parameters as a side effect of execution, so the captured input no longer matches the input the solution was actually called with.<br>
We fix this by saving a separate copy of the function input before passing it for execution.
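The root cause in the changelog entry can be illustrated with a minimal Python sketch. This is not the repository's code: `canonical_solution`, `capture_buggy`, `capture_fixed`, and `make_assertion` are hypothetical names, and the sketch assumes the assertions were generated by running a canonical solution on an input and recording the resulting (input, output) pair.

```python
import copy


def canonical_solution(nums):
    # Hypothetical canonical solution that reverses its argument in place,
    # mutating the caller's list as a side effect, then returns the first item.
    nums.reverse()
    return nums[0]


def capture_buggy(fn, args):
    # Records the input only *after* execution: because fn mutated the list
    # inside `args`, the recorded input is the reversed list.
    output = fn(*args)
    return args, output


def capture_fixed(fn, args):
    # Saves a deep copy of the input *before* execution, so the recorded
    # input is unaffected by any in-place mutation performed by fn.
    saved_args = copy.deepcopy(args)
    output = fn(*args)
    return saved_args, output


def make_assertion(name, args, output):
    # Renders a captured (input, output) pair as a test assertion string.
    rendered_args = ", ".join(repr(a) for a in args)
    return f"assert {name}({rendered_args}) == {output!r}"


if __name__ == "__main__":
    buggy_args, buggy_out = capture_buggy(canonical_solution, ([1, 2, 3],))
    # Prints an assertion that fails: canonical_solution([3, 2, 1]) returns 1.
    print(make_assertion("canonical_solution", buggy_args, buggy_out))

    fixed_args, fixed_out = capture_fixed(canonical_solution, ([1, 2, 3],))
    # Prints an assertion that holds for the original input.
    print(make_assertion("canonical_solution", fixed_args, fixed_out))
```

Running it prints `assert canonical_solution([3, 2, 1]) == 3` for the buggy capture and `assert canonical_solution([1, 2, 3]) == 3` for the fixed one. The sketch uses `copy.deepcopy` rather than `copy.copy` because a shallow copy of the argument tuple would still share the mutable list with the solution.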
8 changes: 4 additions & 4 deletions README.md
@@ -1,6 +1,6 @@
# Execution-based evaluation of code in 10+ languages

-This repository contains code to perform execution-based multi-lingual evaluation of code generation capabilities and the corresponding data, namely, a multi-lingual benchmark MBXP, multi-lingual MathQA and multi-lingual HumanEval. Results and findings can be found in the paper [Multi-lingual Evaluation of Code Generation Models](https://arxiv.org/abs/2210.14868).
+This repository contains code to perform execution-based multi-lingual evaluation of code generation capabilities and the corresponding data, namely, a multi-lingual benchmark MBXP, multi-lingual MathQA and multi-lingual HumanEval. Results and findings can be found in the paper "Multi-lingual Evaluation of Code Generation Models" (https://arxiv.org/abs/2210.14868).


## Paper summary
@@ -85,8 +85,8 @@ You can check the programming-language dependency installation by running the ab
| Dataset | pass@1 |
|---------|--------|
| MBCPP | 79.60% |
-| MBCSP | 65.81% |
-| MBGP | 40.79% |
+| MBCSP | 63.63% |
+| MBGP | 39.19% |
| MBJP | 85.30% |
| MBJSP | 78.67% |
| MBKP | 63.77% |
@@ -96,7 +96,7 @@ You can check the programming-language dependency installation by running the ab
| MBRBP | 58.90% |
| MBSCP | 42.96% |
| MBSWP | 29.40% |
-| MBTSP | 89.67% |
+| MBTSP | 87.29% |



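The README rows that change are MBCSP, MBGP, and MBTSP, which correspond to C#, Go, and TypeScript, the three languages named in the changelog. A small Python sketch (the dictionary and function names are illustrative) computes how far each pass@1 score moved after the fix, in percentage points, using the before/after values from the diff:

```python
# pass@1 values (%) before and after the v1.1 fix, copied from the README diff.
BEFORE = {"MBCSP": 65.81, "MBGP": 40.79, "MBTSP": 89.67}
AFTER = {"MBCSP": 63.63, "MBGP": 39.19, "MBTSP": 87.29}


def pass_at_1_deltas(before, after):
    """Return the change in pass@1 (percentage points), rounded to 2 decimals."""
    return {name: round(after[name] - before[name], 2) for name in before}


if __name__ == "__main__":
    for name, delta in pass_at_1_deltas(BEFORE, AFTER).items():
        print(f"{name}: {delta:+.2f} pp")
```

This prints `MBCSP: -2.18 pp`, `MBGP: -1.60 pp`, and `MBTSP: -2.38 pp`: each of the three affected languages drops by roughly 1.6 to 2.4 points once the mismatched assertions are corrected.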