Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that database value of md5 is retracted when re-calculating hash. #76

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cmungall
Copy link
Member

Fixes #75
However, causes md5.time test to fail

@cmungall cmungall requested a review from ihh December 15, 2019 22:26
@cmungall
Copy link
Member Author

@ihh I'm not totally grokking the md5.time test which now fails, will try and look into this more later

@ihh
Copy link
Member

ihh commented Dec 15, 2019

I guess there are two possibilities: the test is now broken, or it was broken before and you just fixed it, but the "what it's meant to look like" file dates from when it was broken....

I'll look when I get a chance.....

@rulatir
Copy link

rulatir commented Apr 5, 2020

Boomp?

@rulatir
Copy link

rulatir commented May 29, 2020

Any chance this might be looked at again?

@ihh
Copy link
Member

ihh commented May 29, 2020

@cmungall The purpose of this test (test-159) was to check that biomake would recompute the MD5 hash if the file in which the hash was cached had an older timestamp than the file whose hash was supposedly cached there. The correct behavior in this case is to recompute the hash but NOT rebuild the file (it is not the file that is incorrect, but the hash).

This particular test-159 tries to check this behavior as follows:

  • the Makefile has rules that will echo the world "hello" to the file "hello", the word "world" to the file "world", and then the concatenation of the files "hello" and "world" to the file "hello_world"
  • the md5 checksums that the test copies into its working directory effectively assert that all this has already happened, even though the test has actually written "wrong" to "hello" and "wrong_wrong" to "hello_world"
  • the test then waits for a second and writes the word "wrong" to the file "world". So now the hash file (in .biomake/md5/world) has an older timestamp than the file it refers to.
  • Running biomake at this point should rebuild the hash file without rebuilding the "world" file, and then (because the hashfile for "hello_world" now indicates that it is out of date with respect to "world") it should rebuild "hello_world" without rebuilding anything else
  • The end result of this should, therefore, be that "hello_world" (still) contains the string "wrong wrong", which is the concatenation of the files "hello" and "world".

Internally, a call to update_md5_file to recompute the hash is triggered from read_md5_file when the hash file's timestamp is seen to be old. This occurs correctly. What seems to be happening that is INCORRECT is that the file "world" is then getting rebuilt anyway. This didn't matter before because the rebuild was (buggily) not updating the Prolog database's copy of the MD5 hash. So the bug was being masked. What happens as a result of your patch (which forces update_md5_file to retract the hash) is that the accidental rebuild of the target then causes "hello_world" to be rebuilt as well.

The problem is exacerbated by a poorly designed test. If the initial value of "hello_world" was not "wrong_wrong" but something else, then the problem would have been exposed earlier.

You can see debug messages to this effect turning on the --debug md5 option. E.g. to run this test manually from the top-level directory:

mkdir t/mytest
cd t/mytest
cp ../ref/md5.time/* .
echo wrong > hello
echo wrong_wrong > hello_world
sleep 1
cp ../target/md5.checksums/* .biomake/md5
sleep 1
echo wrong > world
/Users/yam/biomake/bin/biomake --debug md5 -H hello_world

Do this with and without your extra retract_md5_hash line, and you will see a line echo world >world in both cases, but only with your retract_md5_hash does this cause the hash to be recomputed.

I now need to figure out why update_md5_hash is causing a rebuild of the target.

@ihh
Copy link
Member

ihh commented May 29, 2020

The more I think about it, the more I think that rebuilding the target file actually is the correct behavior if its MD5 hash file turns out to be stale. It should only not rebuild the file if the MD5 hash, when recomputed, turns out to be the same as the cached value in the file. So I updated test-159 to reflect this scenario (recomputed hash is the same as what was already stored), and added a new test (test-160) for the converse situation when the recomputed hash is different from what was stored, triggering a rebuild.

@cmungall if you could add a test confirming that this fixes #75, that'd be great

In the meantime @rulatir I think this should address your needs, hopefully. Thanks for your persistence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dependency paths longer than 2 nodes unsupported?
3 participants