
add support for run-bug-run runbugrun #39 WIP #166

Open
wants to merge 19 commits into
base: master
from


@cadddr commented Oct 30, 2024

#39 WIP

Review thread on setup.sh (outdated, resolved)
@andre15silva (Member)

Hi @cadddr!

There is currently a failure in the RunBugRun tests.

See https://github.com/ASSERT-KTH/repairbench-framework/actions/runs/12126690735/job/33810455158?pr=166#step:13:337

It seems to be an error when loading the DataFrame.

@cadddr (Author) commented Dec 3, 2024 via email

@andre15silva (Member)

Which pandas version is being installed on your side, so I can reproduce this?

When you run poetry install, it installs the version pinned in the poetry.lock file.

Right now that is pandas 2.2.3.

@cadddr (Author) commented Dec 16, 2024

I also have pandas==2.2.3. The log shows a deprecation warning for passing a path string as the argument to read_json, so I wrapped it in a file stream. Hopefully this passes; otherwise, I'm not sure how to debug this.

@andre15silva (Member)

Now the problem seems to be related to a FileNotFound error.

@cadddr (Author) commented Dec 18, 2024


My path: /workspaces/elle-elle-aime/benchmarks/run_bug_run/python_valid0.jsonl

I double-checked that the commands in setup.sh download and unpack the file correctly:

mkdir benchmarks/run_bug_run
cd benchmarks/run_bug_run
wget https://github.com/giganticode/run_bug_run_data/releases/download/v0.0.1/python_valid0.jsonl.gz
wget https://github.com/giganticode/run_bug_run_data/releases/download/v0.0.1/tests_all.jsonl.gz

gzip -d python_valid0.jsonl.gz

Why is the working directory in the log repairbench-framework/repairbench-framework rather than elle-elle-aime? Could this be the problem?
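The working directory matters because a relative path is resolved against the current directory of the process, not against the repository. This is a minimal sketch of anchoring the path to a known root instead; benchmark_file and its root argument are hypothetical, and in practice root might be derived from Path(__file__).resolve() of a module whose position in the repository is known (an assumption about the layout).

```python
from pathlib import Path

# Relative path: resolved against os.getcwd() at the moment it is used, so it
# points elsewhere if the process starts in a different directory.
RELATIVE = Path("benchmarks/run_bug_run/python_valid0.jsonl")


def benchmark_file(root: Path, name: str) -> Path:
    """Return an absolute path to a RunBugRun file under a fixed repository root.

    The result does not depend on the current working directory.
    """
    return (Path(root) / "benchmarks" / "run_bug_run" / name).resolve()
```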

@andre15silva (Member)

I'm trying to fix the path, and I rebased onto the latest master. Let's see if we can fix this.

@andre15silva (Member)

I fixed the FileNotFound problem by turning the benchmark directory into a submodule.

We now get a different error while executing a RunBugRun bug.

@cadddr (Author) commented Dec 27, 2024

Thanks for fixing the paths. The bug-related errors don't reproduce consistently because we take 3 bugs from an unordered dict. After fixing the order (and running 20 bugs instead of 3), the first failure is on p02273_118997.

The fixed solution doesn't pass because its output is not an exact string match for the expected output. (As I mentioned earlier, RunBugRun passes inputs and outputs through standard I/O as strings.) Example:

print (result)
0 0
11.111111111111112 0.0
16.666666666666668 9.622504486493764
22.222222222222225 0.0
33.333333333333336 0.0
38.88888888888889 9.622504486493762
33.333333333333336 19.24500897298752
44.44444444444444 19.245008972987524
50.0 28.867513459481287
55.55555555555556 19.245008972987527
66.66666666666667 19.245008972987527
61.111111111111114 9.622504486493764
66.66666666666667 0.0
77.77777777777779 0.0
83.33333333333334 9.622504486493753
88.88888888888889 0.0
100 0

print (test_output)
0.00000000 0.00000000
11.11111111 0.00000000
16.66666667 9.62250449
22.22222222 0.00000000
33.33333333 0.00000000
38.88888889 9.62250449
33.33333333 19.24500897
44.44444444 19.24500897
50.00000000 28.86751346
55.55555556 19.24500897
66.66666667 19.24500897
61.11111111 9.62250449
66.66666667 0.00000000
77.77777778 0.00000000
83.33333333 9.62250449
88.88888889 0.00000000
100.00000000 0.00000000

@andre15silva (Member)

Thanks for noticing the randomness bug! I fixed the issue for all benchmarks and rebased this PR onto the latest commits from master.

For the problem of comparing outputs, the straightforward solution would be to eval the strings and compare the values.
Are the outputs always floats or can they be of other types?

Labels: None yet
Projects: None yet
2 participants