Curation routines for SPICE, QM9, and ANI1x are already in place; we still need to have a curation routine for ANI2x.
A dataset class (one that inherits from HDF5Dataset) has so far only been set up for QM9; equivalent classes need to be defined for the other datasets.
A few notes regarding SPICE:
Currently, we have curation schemes set up for both SPICE 1.1.4 (i.e., the data associated with the paper) and what I've just called "openff" spice. Openff spice is effectively the same data as in 1.1.4, but calculated at the openff level of theory and retrieved from QCArchive.
Since openff spice is being retrieved from QCArchive, we can easily associate the "source" with each entry (e.g., "SPICE PubChem Set 1 Single Points Dataset v1.2"). This may be very useful for future testing, since it lets us easily filter out subsets of the data.
The HDF5 file for SPICE 1.1.4 has configurations with very high forces filtered out. It might be good to also identify those molecules in openff_spice; we might not want to exclude them completely, but rather provide an attribute in the HDF5 file that allows us to remove them if desired.
We will also add a "DFT_total_force" quantity to the openff spice curation, rather than the gradient, so that we don't have to change the sign at training time and so that the gradient is not accidentally used in place of the force.
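To make the last two notes concrete, here is a rough sketch of how the force could be derived from the gradient and how high-force configurations could be flagged rather than dropped. Everything except the "DFT_total_force" name is an assumption made for illustration: the `dft_total_gradient` and `has_high_force` keys, the helper functions, and the threshold value are hypothetical and are not taken from the existing curation code or from the SPICE 1.1.4 filtering.

```python
import numpy as np

# Hypothetical cutoff, in the same units as the gradient. The actual value
# used when filtering SPICE 1.1.4 is not stated here.
MAX_FORCE_THRESHOLD = 1.0


def gradient_to_force(gradient: np.ndarray) -> np.ndarray:
    """Return the force, i.e. the negative of the energy gradient."""
    return -np.asarray(gradient, dtype=float)


def flag_high_force(force: np.ndarray, threshold: float = MAX_FORCE_THRESHOLD) -> np.ndarray:
    """Flag configurations whose largest per-atom force norm exceeds threshold.

    force has shape (n_configs, n_atoms, 3); the result has shape (n_configs,).
    """
    per_atom_norm = np.linalg.norm(force, axis=-1)  # (n_configs, n_atoms)
    return per_atom_norm.max(axis=-1) > threshold


# Toy record: two configurations of a two-atom molecule.
gradient = np.array([
    [[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]],
    [[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0]],
])
record = {"dft_total_gradient": gradient}  # hypothetical key name

# Store the force so the sign never has to be flipped at training time.
record["DFT_total_force"] = gradient_to_force(record["dft_total_gradient"])

# Keep the high-force configurations, but mark them so they can be removed later.
record["has_high_force"] = flag_high_force(record["DFT_total_force"])
```

The boolean mask could then be written next to the other per-configuration quantities in the HDF5 file, so that filtering becomes a choice made when loading the dataset rather than at curation time.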