Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYNPY-1316] Updating to caching logic to take in MD5 key #1008

Merged
merged 8 commits into from
Nov 6, 2023

Conversation

BryanFauble
Copy link
Contributor

Problem:
In the .cacheMap file there is a key-value pair of path to file for a particular file handle ID and it's associated modified time, for example:
{"/home/bfauble/synapseTestFiles/bryansTestProject/temp/file_1.txt": "2023-11-03T21:46:48.000Z"}

In cases where we are downloading files into the same directory and a file with the same name is matching the current cache mechanism can incorrectly assume the file is already present in the directory and not attempt to download a new copy with ifcollision='overwrite.local

Solution:

  • When we are adding new items to the cache store the file MD5 at the time of caching too. When we are checking for a cache hit verify if the local file has a different modification time and (if present) that the md5's match. If neither pass treat this as a cache miss.
  • The way this is written will allow for backwards compatibility if a cache entry only has a timestamp - it will continue to remain in the timestamp format until there is a cache miss.

Testing:

  • All passing integration and unit tests
  • I also verified the before/after functionality of my .cacheMap when the new key is and is not present and the expected cache hit/miss occurs
  • I also verified changing the MD5 value in the .cacheMap correct sees it as a cache miss
  • I also verified that changing the file prompts a cache miss

@pep8speaks
Copy link

pep8speaks commented Nov 3, 2023

Hello @BryanFauble! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 364:89: E501 line too long (95 > 88 characters)

Line 561:89: E501 line too long (92 > 88 characters)
Line 582:5: F811 redefinition of unused 'test_cache_item_unmodified_modified_items_is_modified_timestamp' from line 552

Comment last updated at 2023-11-03 23:54:15 UTC

@BryanFauble BryanFauble marked this pull request as ready for review November 3, 2023 23:55
@BryanFauble BryanFauble requested a review from a team as a code owner November 3, 2023 23:55
Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 I'll take a more in depth look later, but the test cases and the code make sense. I had forgotten about the how we leverage the cachemap even when using those other parameters. Thanks @brucehoff for the catch!

We will only write to the cachemap if there is a cache miss correct?

@brucehoff
Copy link
Member

We will only write to the cachemap if there is a cache miss correct?

Yes: The cachemap records the state at the time of a file download (capturing the local time stamp). So writing only occurs when a file is downloaded (when there is a "cache miss".

@BryanFauble BryanFauble merged commit e283f88 into develop Nov 6, 2023
35 checks passed
@BryanFauble BryanFauble deleted the SYNPY-1316-caching-changes branch November 6, 2023 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants