Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing files in a more tolerant way #20

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

gatl
Copy link

@gatl gatl commented Jan 4, 2020

It is currently impossible to import files that are mixed with files that cannot be read by the current user. This patch will allow such situations to be handled gracefully and collect all the files that can be found, instead of throwing an exception and halting the process.

gatl added 4 commits January 1, 2020 00:41
It is possible that the user decides to import files that are mixed with other,
unrelated data. Some of those files may be unreadable by the user (e.g. files
owned by another user in a backup environment).
By catching said events the importation process will not fail and will proceed
by ignoring such files.
@Kalanyr
Copy link
Owner

Kalanyr commented Jan 5, 2020 via email

@gatl
Copy link
Author

gatl commented Jan 5, 2020

A rewrite? That is such a good news!

I made a list of things I would like to make different after taking a look at the code, but to implement them it would need a significant rewriting and before I could do that I would need to study the code in more detail just to ensure I would not break anything important. All that would add-up to about two months of work, just to future-proof the code.

As it stands, I got it to make what I needed, so all the rest is not so important, but I'm glad you are improving the code. If you don't mind, I'll leave a few thoughts for your consideration.

@Kalanyr
Copy link
Owner

Kalanyr commented Jan 5, 2020 via email

@gatl
Copy link
Author

gatl commented Jan 5, 2020

(Since this project has no issue tracking enabled, I'll leave my list in short in here):

  • Use JSON to store the manifest. This allows for other projects to read the manifest and closes the security hole that is the eval() being called on a data file.

  • Is the resume manifest really necessary? If we store a JSON object with the list of all games at the top of the manifest, and then add one JSON object entry (e.g. line) per game, we can always know where we left off and proceed. If we want to start over, just delete the manifest file! I think it is pretty clean.

  • When importing files, you don't need to compute the MD5 on all files. If the size of the file does not match the size of a file on your manifest, you can be pretty sure the MD5 won't match either. So filter for file size first and for MD5 second. You'll see big improvements on importing speed.

  • Still when importing, if the file size and the file name are the same, and no MD5 information is provided by GOG, then I perhaps I could trust that the file is the same. I would like to see this option added to make it easy to move/change my repository around.

  • Personally, I would consider dropping support for Python 2 (people who need it can continue to use the current version of gogrepo) to get a much cleaner, simpler and more modern codebase. Then the option of using Async IO could be put on the table. There would be no need to adjust the number of concurrent workers and the system would scale better with the bandwidth that is available. But that is just my opinion.

Thank you for your continuing effort. Hit me up when a new version is out or if you want want to discuss ideas with someone. :-)

@Kalanyr
Copy link
Owner

Kalanyr commented Jan 6, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants