add support for run-bug-run runbugrun #39 WIP #166

cadddr · 2024-10-30T01:28:00Z

#39 WIP

setup.sh

andre15silva · 2024-12-03T08:28:26Z

Hi @cadddr !

There is currently a failure in the RunBugRun tests.

See https://github.com/ASSERT-KTH/repairbench-framework/actions/runs/12126690735/job/33810455158?pr=166#step:13:337

Seems to be an error in loading the dataframe.

cadddr · 2024-12-03T17:33:15Z

Since this isn’t a file-not-found error, could it be a version/deprecation issue with pandas? What version is being installed, so I can reproduce? Thanks

andre15silva · 2024-12-03T17:37:58Z

> What version is being installed, so I can reproduce?

When you run poetry install, it uses the version pinned in the poetry.lock file.

Right now that is 2.2.3.

cadddr · 2024-12-16T22:47:38Z

I also have pandas==2.2.3. The log shows a deprecation warning about passing a path string as the argument to read_json, so I wrapped the filename in a file stream. Hopefully this passes; otherwise I'm not sure how to debug this.
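
A minimal sketch of the "wrap into a file stream" idea, assuming the dataset is a plain JSON Lines file. It uses only the standard library rather than pd.read_json, and the helper name load_jsonl and the demo records are hypothetical. Opening the file explicitly means a missing file raises FileNotFoundError at the open() call.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file through an explicit file stream.

    Hypothetical helper, not part of the PR. open() raises
    FileNotFoundError right away if the file is missing.
    """
    records = []
    with open(path, "r", encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Demo on a throwaway file with made-up records.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.jsonl"
    demo.write_text('{"id": "a", "n": 1}\n{"id": "b", "n": 2}\n', encoding="utf-8")
    demo_records = load_jsonl(demo)
    try:
        load_jsonl(Path(tmp) / "missing.jsonl")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True
```

With pandas, the same open stream could presumably be handed to pd.read_json with lines=True; that argument is an assumption here, since the thread does not show the actual call.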

andre15silva · 2024-12-17T08:44:59Z

Now the problem seems to be related with a FileNotFound.

cadddr · 2024-12-18T00:16:37Z

> Now the problem seems to be related with a FileNotFound.

My path: /workspaces/elle-elle-aime/benchmarks/run_bug_run/python_valid0.jsonl

I double-checked that the commands in setup.sh download and unpack the file correctly:

mkdir benchmarks/run_bug_run
cd benchmarks/run_bug_run
wget https://github.com/giganticode/run_bug_run_data/releases/download/v0.0.1/python_valid0.jsonl.gz
wget https://github.com/giganticode/run_bug_run_data/releases/download/v0.0.1/tests_all.jsonl.gz

gzip -d python_valid0.jsonl.gz

Why is the working directory in the log, repairbench-framework/repairbench-framework, different from elle-elle-aime? Could this be the problem?
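
One way to rule out the working directory as a factor is to build the dataset path from a known root instead of the current directory. A minimal sketch, assuming the hypothetical helpers dataset_path and require_file; the layout benchmarks/run_bug_run/python_valid0.jsonl is the one quoted above, and the caller supplies the repository root.

```python
from pathlib import Path


def dataset_path(repo_root, name="python_valid0.jsonl"):
    """Return the absolute path of a RunBugRun data file.

    Hypothetical helper: the benchmarks/run_bug_run/ layout comes from
    the setup.sh commands above; repo_root is supplied by the caller.
    """
    return Path(repo_root).resolve() / "benchmarks" / "run_bug_run" / name


def require_file(path):
    """Fail early with a message that also reports the current directory."""
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found (cwd is {Path.cwd()})")
    return path


# Inside a module, the root would typically be derived from the module's own
# location (for example via Path(__file__).resolve()) rather than from the cwd.
example = dataset_path("/workspaces/elle-elle-aime")
print(example)
```

Because the error message includes both the resolved path and the current directory, a CI log would show directly whether the checkout directory name is what differs.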

andre15silva · 2024-12-18T11:02:01Z

Trying to fix the path, and rebased onto the latest master. Let's see if we can fix this.

andre15silva · 2024-12-18T13:28:27Z

Fixed the file not found problem by changing the benchmark directory to a submodule.

We now get another error, during the execution of a RunBugRun bug.

cadddr · 2024-12-27T21:55:09Z

Thanks for fixing the paths. Bug-related errors do not reproduce consistently, since we're taking 3 bugs from an unordered dict. After fixing the order (and running 20 bugs instead of 3), the first failure I get is on p02273_118997. The fixed solution doesn't pass because its output is not an exact string match for the expected output. (I mentioned earlier that inputs/outputs in RunBugRun are passed via standard I/O as strings.) Example:

print(result)
0 0
11.111111111111112 0.0
16.666666666666668 9.622504486493764
22.222222222222225 0.0
33.333333333333336 0.0
38.88888888888889 9.622504486493762
33.333333333333336 19.24500897298752
44.44444444444444 19.245008972987524
50.0 28.867513459481287
55.55555555555556 19.245008972987527
66.66666666666667 19.245008972987527
61.111111111111114 9.622504486493764
66.66666666666667 0.0
77.77777777777779 0.0
83.33333333333334 9.622504486493753
88.88888888888889 0.0
100 0

print(test_output)
0.00000000 0.00000000
11.11111111 0.00000000
16.66666667 9.62250449
22.22222222 0.00000000
33.33333333 0.00000000
38.88888889 9.62250449
33.33333333 19.24500897
44.44444444 19.24500897
50.00000000 28.86751346
55.55555556 19.24500897
66.66666667 19.24500897
61.11111111 9.62250449
66.66666667 0.00000000
77.77777778 0.00000000
83.33333333 9.62250449
88.88888889 0.00000000
100.00000000 0.00000000

andre15silva · 2024-12-28T16:59:04Z

Thanks for noticing the randomness bug! I fixed the issue for all benchmarks and rebased this PR with the latest commits from master.
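
The idea behind the reproducibility fix can be pictured as sorting by bug identifier before slicing. A minimal sketch, assuming bugs are held in a mapping keyed by identifier strings; first_n_bugs and every identifier except p02273_118997 are made up for illustration and are not the framework's actual code.

```python
from itertools import islice


def first_n_bugs(bugs, n):
    """Pick the n bugs with the smallest identifiers.

    Hypothetical helper. Sorting the identifiers first makes the
    selection independent of the container's iteration order.
    """
    return [bugs[bug_id] for bug_id in islice(sorted(bugs), n)]


# Two mappings with the same content but different insertion order.
run_a = {"p02273_118997": "bug-3", "p00001_000002": "bug-1", "p00100_000050": "bug-2"}
run_b = {"p00100_000050": "bug-2", "p02273_118997": "bug-3", "p00001_000002": "bug-1"}

print(first_n_bugs(run_a, 2))  # same selection as for run_b
```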

For the problem of comparing outputs, the straightforward solution would be to eval the strings and compare the values.
Are the outputs always floats or can they be of other types?
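
As an alternative to eval, which would execute whatever text the program printed, the tokens can be parsed with float() and compared with a tolerance. A minimal sketch: outputs_match and the tolerance 1e-6 are assumptions, not values taken from RunBugRun. Tokens that are not numbers fall back to exact string comparison, which would cover other output types.

```python
import math


def _tokens_match(actual, expected, abs_tol):
    """Compare two tokens as strings first, then numerically if both are floats."""
    if actual == expected:
        return True
    try:
        a, e = float(actual), float(expected)
    except ValueError:
        return False  # at least one token is not a number and the strings differ
    return math.isclose(a, e, rel_tol=0.0, abs_tol=abs_tol)


def outputs_match(actual, expected, abs_tol=1e-6):
    """Token-wise comparison of two program outputs (hypothetical helper)."""
    actual_lines = actual.strip().splitlines()
    expected_lines = expected.strip().splitlines()
    if len(actual_lines) != len(expected_lines):
        return False
    for a_line, e_line in zip(actual_lines, expected_lines):
        a_tokens, e_tokens = a_line.split(), e_line.split()
        if len(a_tokens) != len(e_tokens):
            return False
        if not all(_tokens_match(a, e, abs_tol) for a, e in zip(a_tokens, e_tokens)):
            return False
    return True


# A few lines taken from the p02273_118997 output quoted above.
result = "0 0\n16.666666666666668 9.622504486493764\n100 0\n"
test_output = "0.00000000 0.00000000\n16.66666667 9.62250449\n100.00000000 0.00000000\n"
print(outputs_match(result, test_output))
```

The expected output above is printed with 8 decimal places, so an absolute tolerance of 1e-6 is looser than the rounding error; a stricter value may be more appropriate if the task statements specify one.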

cadddr mentioned this pull request Oct 30, 2024

add support for run-bug-run runbugrun #39

Open

cadddr force-pushed the master branch from febe8e4 to 57b2c05 Compare November 13, 2024 02:05

andre15silva reviewed Nov 27, 2024

setup.sh (outdated)

andre15silva force-pushed the master branch from 1b2ebea to b9b8c6f Compare December 18, 2024 11:01

andre15silva force-pushed the master branch from b36ef87 to 766bcec Compare December 28, 2024 15:29

cadddr and others added 16 commits December 28, 2024 17:50

initial run bug run

a9e7dd1

run bug run tests and prompts

1b101dc

skip running tests if error

3a12eb1

clean up prompt tests format

7264252

cache test results on first run

c161b34

fix parsing of multiline test inputs/outputs

19b4169

prompt tests for run bug run

28cc9c0

upload cached test outputs; remove submodule; uncomment setup;

1622d83

hardcode buggy subdir

a481160

run black; fix tgz

57ca277

wrap filename into stream for pd.read_json

9945af3

update setup.sh

16a8cd8

update setup.sh

4c7a31e

remove run-bug-run dir

721efa5

add run-bug-run as submodule

6959970

fix setup?

b83f855

andre15silva and others added 3 commits December 28, 2024 17:50

fix setup?

88e92ba

sort bugs for test reproducibility

00a78fe

remove list() from get_bugs()

d411e0f

andre15silva force-pushed the master branch from 766bcec to d411e0f Compare December 28, 2024 16:53
