Dataset caching overhaul and additional datasets #91
At this point, training with this dataset still fails; some issues related to default properties need to be resolved. It currently raises an error because Q is not defined in ani2x (the logic needs to check whether "none" is defined; I have started changing the data structure for properties).
…ing_and_more_data
…e. atomic self energies now are defined with energy units (but returned without units, in our base unit system when used in removing self-energy). ANI2x now loads
… python version used to generate them causing issues.
…ing_and_more_data
Copying from the issue #84: The general sequence of loading data is
Ideally, we want to use the .npz file if it exists and skip everything else. If the .npz file doesn't exist, we check whether the .hdf5 file exists; if not, we check whether the .hdf5.gz file exists; if not, we download. However, we can't rely on file existence alone; we also need to make sure that it is the correct file. A few changes I've been making here:
Each Dataset can be initialized by setting "local_cache_dir", which specifies where the data files end up.
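The lookup order described above (.npz, then .hdf5, then .hdf5.gz, then download), combined with a "correct file" check, could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `find_cached_file`, the file names, and the use of SHA-256 checksums as the validity test are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import List, Optional, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_cached_file(
    local_cache_dir: str, candidates: List[Tuple[str, str]]
) -> Optional[Path]:
    """Return the first candidate that exists AND matches its checksum.

    `candidates` is ordered from most processed to least processed,
    e.g. the .npz file first, then .hdf5, then .hdf5.gz (hypothetical names).
    Returning None means nothing usable is cached and a download is needed.
    """
    cache = Path(local_cache_dir)
    for name, expected_checksum in candidates:
        path = cache / name
        # Existence alone is not enough: a stale or corrupt file is skipped.
        if path.is_file() and sha256_of(path) == expected_checksum:
            return path
    return None
```

A caller would pass the dataset's "local_cache_dir" together with the expected checksums, then resume the pipeline from whichever stage is returned (or download when the result is `None`).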
@wiederm the equivalent tests seem to be failing for sake on macOS with Python 3.10. I resolved these issues earlier by seeding the random number generator in equivalence_test_utils. It appears you commented this out (so the tests may not be identical each time). I think there might be an issue with using Euler angles, rather than quaternions, to generate the rotation.
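For reference, a seeded, quaternion-based random rotation might look like the sketch below. It is not the code in equivalence_test_utils; the function names are hypothetical, and it uses only the standard library to stay self-contained. Normalizing four independent Gaussian samples gives a unit quaternion drawn uniformly over rotations, and a fixed seed makes each run reproducible.

```python
import math
import random
from typing import List, Tuple


def random_unit_quaternion(rng: random.Random) -> Tuple[float, ...]:
    """Draw a unit quaternion (w, x, y, z) by normalizing 4 Gaussian samples."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:  # guard against a (vanishingly unlikely) zero vector
            return tuple(c / norm for c in q)


def quaternion_to_matrix(q: Tuple[float, ...]) -> List[List[float]]:
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def seeded_rotation(seed: int) -> List[List[float]]:
    """Return the same random rotation matrix every time for a given seed."""
    return quaternion_to_matrix(random_unit_quaternion(random.Random(seed)))
```

Using a local `random.Random(seed)` instance rather than the global generator keeps the rotation identical across test runs regardless of what other tests do with global random state.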
…ers to Apple Silicon
…s (e.g., limiting to max of 10 per record, for a total of 1000, for unit testing).
…d test_dataset.py tests
…sing to have my improved class that uses units.
…l datasets at the current moment).
…ts tested in test_models.py to qm9 and ani2x test sets.
OK, the tests are now set up to loop over multiple datasets. I had to cut down a few of the datasets for test_models; I think we were running out of memory in some of the tests on CI, since they were passing locally. (The point is to test the NNP, not the dataset; I still keep qm9 and ani2x so we have some variety.)
Description
This PR will focus on two key things:
Todos
Notable points that this PR has either accomplished or will accomplish.
Status