Data sets, scripts, and analyses of many-body machine learning (mbML) potentials for water, acetonitrile, and methanol.
Our workflow takes advantage of both NumPy npz and exdir files along with open-source software developed by members of our group (i.e., mbgdml and reptar).
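For readers unfamiliar with npz files, the following is a minimal sketch of writing and reading one with NumPy. The key names (z, R, E, F), array shapes, and file name are assumptions made for illustration only; they are not the schema of the files in this repository, so inspect the keys of each real file before relying on them.

```python
import os
import tempfile

import numpy as np

# Hypothetical arrays standing in for a small data set. The key names
# ("z", "R", "E", "F") are assumptions, not this repository's schema.
n_structures, n_atoms = 5, 3
example = {
    "z": np.array([8, 1, 1]),                    # atomic numbers
    "R": np.zeros((n_structures, n_atoms, 3)),   # Cartesian coordinates
    "E": np.zeros(n_structures),                 # energies
    "F": np.zeros((n_structures, n_atoms, 3)),   # forces
}

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "example.npz")
    np.savez(npz_path, **example)

    # np.load returns a lazy NpzFile; each array is read when it is indexed.
    with np.load(npz_path) as data:
        shapes = {key: data[key].shape for key in data.files}

for key, shape in shapes.items():
    print(key, shape)
```

Listing `data.files` first is a cheap way to see what a given npz file contains before loading any large arrays.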
Below is a short explanation of what resides in each directory; directories contain additional README files where necessary.
- data: contains (almost) all data pertaining to the development and application of the mbML potentials considered here. This includes GFN2-xTB MD simulations for $n$-body sampling; MP2/def2-TZVP energies and forces of $n$-body structures and isomers; and MD simulations driven by MP2 and various mbML potentials. Due to GitHub file size limitations, the large MD trajectory files are archived on Zenodo.
- training-logs: contains GDML, GAP, and SchNet training scripts and logs for 1-, 2-, and 3-body models for the solvents considered here. The resulting models are archived on Zenodo.
- scripts: all Python scripts used to prepare the manuscript. This includes scripts to train models, run molecular dynamics simulations, convert file types, analyze model predictions, create plots, etc. All Python scripts that generate figures with matplotlib are labeled with a figure- prefix.
- analysis: mainly figures used for the results and discussion along with some postprocessing data.
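Because figure-generating scripts share the figure- prefix, they can be located programmatically. The sketch below assumes it is run from the repository root and that the scripts may sit in subdirectories of scripts; the helper name `find_figure_scripts` is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_figure_scripts(scripts_dir):
    """Return sorted paths of Python files whose names start with 'figure-'."""
    # rglob searches scripts_dir and all of its subdirectories.
    return sorted(Path(scripts_dir).rglob("figure-*.py"))


if __name__ == "__main__":
    # "scripts" is the directory described in the list above.
    for path in find_figure_scripts("scripts"):
        print(path)
```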
All information, data, and figures presented in our manuscript can be directly reproduced with the relevant code and data stored in this repository.
There are inherent limits on reproducibility, such as differing environments, computers, and the long-term hosting of these repositories.
We cannot do much about the last two, but we do provide a requirements.txt file that specifies the packages and versions we used with Python 3.10.4 on Ubuntu 20.04.
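One way to see how far a local environment departs from the pinned one is to compare installed package versions against requirements.txt. This is a minimal sketch that assumes the file uses simple `name==version` lines; other specifiers are skipped, and the helper names `parse_pins` and `find_mismatches` are hypothetical.

```python
import sys
from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text):
    """Parse 'name==version' lines; skip blanks, comments, and other specifiers."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        version = version.split(";", 1)[0].strip()  # drop environment markers
        pins[name.strip().lower()] = version
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    print("Running Python", ".".join(map(str, sys.version_info[:3])))
    if requirements.is_file():
        pins = parse_pins(requirements.read_text(encoding="utf-8"))
        for name, (pinned, installed) in sorted(find_mismatches(pins).items()):
            print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty report means every pinned package is installed at the pinned version; it says nothing about the operating system or hardware differences mentioned above.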
As mentioned above, we were not able to fit every file into this repository. Trained models and MD simulation data are archived separately on Zenodo. To use them, download and extract these archives in the same directory as this repository. You may need to adjust the relative paths to data in the provided scripts.
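If many paths need adjusting, one option is to resolve them against a single configurable base directory instead of editing each script. The sketch below is only an illustration: the environment variable `MBML_DATA_ROOT`, the helper `resolve_data_path`, and the example path are hypothetical and do not exist in this repository.

```python
import os
from pathlib import Path


def resolve_data_path(relative_path, base_dir=None):
    """Join a relative data path onto a configurable base directory.

    MBML_DATA_ROOT is a hypothetical environment variable used only here.
    """
    if base_dir is None:
        base_dir = os.environ.get("MBML_DATA_ROOT", ".")
    return (Path(base_dir).expanduser() / relative_path).resolve()


if __name__ == "__main__":
    # "md-archive/example.npz" is a placeholder, not a real file name.
    print(resolve_data_path("md-archive/example.npz", base_dir=".."))
```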
If you are trying to reproduce this work, we thank you for your service to the academic community! We tried to be as transparent about our data, code, and analyses as possible; however, please contact the corresponding authors with any questions or difficulties.
This repository supports the following work:
Maldonado, A. M.; Poltavsky, I.; Vassilev-Galindo, V.; Tkatchenko, A.; Keith, J. A. Modeling molecular ensembles with gradient-domain machine learning force fields. DOI: 10.26434/chemrxiv-2023-wdd1r
This work is licensed under a Creative Commons Attribution 4.0 International License.