File hashes #84
Comments
The general sequence of the data loader is:

1. Use the `.npz` file if it exists, and skip all the remaining steps.
2. If the `.npz` doesn't exist, check whether the `.hdf5` file exists.
3. If not, check whether the `.hdf5.gz` exists.
4. If not, download it.

However, we can't just rely on checking whether a file exists; we need to make sure it is the correct file. I've been making a few of these changes in PR #91.
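The fallback sequence above can be sketched as follows. This is a minimal illustration, not the repository's actual loader: the file names, `DATASET_URL`, and `EXPECTED_GZ_SHA256` are all hypothetical placeholders.

```python
import gzip
import hashlib
import shutil
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical values for illustration only.
DATASET_URL = "https://example.com/data.hdf5.gz"
EXPECTED_GZ_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def resolve_dataset(workdir: Path) -> Path:
    """Return a usable local file, preferring the most-processed form."""
    npz = workdir / "data.npz"
    hdf5 = workdir / "data.hdf5"
    gz = workdir / "data.hdf5.gz"
    if npz.exists():      # cached .npz: skip everything else
        return npz
    if hdf5.exists():     # already decompressed
        return hdf5
    if not gz.exists():   # last resort: download the archive
        urlretrieve(DATASET_URL, gz)
    # Verify before trusting the archive (guards against partial downloads).
    if sha256_of(gz) != EXPECTED_GZ_SHA256:
        raise ValueError(f"hash mismatch for {gz}")
    with gzip.open(gz, "rb") as src, hdf5.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    return hdf5
```

Note that the existence checks alone are not enough; the hash comparison is what catches an incorrect or partially generated file.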
This has been addressed in the linked PRs. Closing for now.
When we go to train, we load a curated dataset and create local files which we cache for future use. We need to store and compare the hash of these files to ensure we are not working with an incorrect or partially generated file. This should be trivial for the gzipped hdf5 files, whose hashes are known beforehand. For the npz files we generate locally, the hash is not known ahead of time, so we will probably need to generate a metadata file after creation that stores the hash.
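One way to handle the locally generated files is a sidecar metadata file written right after the `.npz` is created, then checked on the next run. A minimal sketch, assuming a simple JSON sidecar (the `.meta.json` naming is an assumption, not an existing convention in the repo):

```python
import hashlib
import json
from pathlib import Path


def write_hash_metadata(path: Path) -> Path:
    """Record the file's SHA-256 in a sidecar so later runs can verify it."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    meta = path.parent / (path.name + ".meta.json")  # e.g. data.npz.meta.json
    meta.write_text(json.dumps({"sha256": digest}))
    return meta


def verify_hash_metadata(path: Path) -> bool:
    """True only if the sidecar exists and matches the file's current hash."""
    meta = path.parent / (path.name + ".meta.json")
    if not path.exists() or not meta.exists():
        return False
    expected = json.loads(meta.read_text())["sha256"]
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected
```

A missing or stale sidecar then signals that the cached file should be regenerated rather than trusted.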