
Question about how to evaluate a trained model #12

Open
huishan-cen opened this issue Jun 13, 2021 · 3 comments

Comments

@huishan-cen

Hi, I recently started trying this repo and found it really cool!
I have managed to run the example in examples/example_scripts/train_model/ on some data and would like to use the final model to evaluate some other molecules. I know that the neuralxc sc ... command can do the testing if I provide a testing.traj.
However, I'd like to use the neuralxc eval ... command so that I don't have to re-train the same model.
The --hdf5 argument requires the path to the hdf5 file, the baseline data, and the reference data. I assume the last one refers to a testing.traj like the one used with neuralxc sc ... in the example. However, I'm not sure what the first two files refer to or how to get them, and I couldn't find an example in the repo. Could you please give some advice or examples?

Moreover, I'm wondering how to set n_max and l_max as mentioned in the paper. I can't seem to find these options in the hyperparameters.json or the basis.json file.

huishan-cen changed the title from "Question about how to evaluate a model" to "Question about how to evaluate a trained model" on Jun 13, 2021
@semodi
Owner

semodi commented Jul 5, 2021

Hi,

The gist is that, once you have a fully trained model, you can use it in self-consistent calculations with PySCF to "evaluate some other molecules". The fastest/easiest way to do so would be to put all molecules you want to compute into a .xyz or .traj file and use neuralxc engine on it. Inside the configuration file, you need to add a keyword "nxc": "path_to_model" in the engine section to instruct PySCF to use your newly trained model.
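
Schematically, the config could look something like this, written here as a Python dict dumped to JSON. Only the "nxc" entry is the keyword described above; the file name and the other keys ("application", "xc", "basis") are illustrative placeholders, so adapt them to the config you already use for neuralxc sc:

    import json

    # Sketch only: "nxc" is the keyword described above; the remaining keys and
    # values are placeholders for a typical PySCF setup and may not match the
    # actual NeuralXC config schema exactly.
    config = {
        "engine": {
            "application": "pyscf",            # placeholder key: which QM engine to run
            "xc": "PBE",                       # placeholder key: baseline functional
            "basis": "def2-TZVP",              # placeholder key: orbital basis set
            "nxc": "/path/to/trained_model",   # point PySCF at the trained NeuralXC model
        }
    }

    with open("config_eval.json", "w") as f:
        json.dump(config, f, indent=4)

You would then run neuralxc engine with this config on the .xyz/.traj file containing your molecules.
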
I updated both the README file and the documentation in the last pull request. You should find a more detailed answer to your question in the Model deployment section.

As for n_max and l_max, these options are only relevant if one is using a polynomial basis, as we did in the paper. In the newer examples provided, we use a Gaussian (PySCF) basis set for which these options have no effect. Again, sorry that this wasn't addressed in the previous docs, but it should now be covered in the updated version here and here.
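
If it helps, here is a rough sketch of what a polynomial-basis specification with explicit n_max/l_max could look like, again as a Python dict dumped to JSON. The per-species layout and the key names ("n", "l", "r_o" standing in for n_max, l_max and the radial cutoff) are illustrative and may not match the current basis.json format exactly, so check the updated docs for the real layout:

    import json

    # Illustrative layout: "n"/"l" stand in for n_max/l_max from the paper and
    # "r_o" for the radial cutoff; verify the actual key names against the docs.
    basis = {
        "O": {"n": 4, "l": 3, "r_o": 2.0},
        "H": {"n": 4, "l": 2, "r_o": 1.5},
    }

    with open("basis_poly.json", "w") as f:
        json.dump(basis, f, indent=4)
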

Let me know if that answers your question!

@huishan-cen
Author

huishan-cen commented Jul 14, 2021

Hi,
Thanks for the updated docs. I have managed to use the trained models with PySCF now. A couple of other questions:

  1. The energies provided for water in the quickstart tutorials are all quite small. I assume these are target energies, i.e. the difference between CCSD(T) and DFT+PBE in eV, since the total energy of water should be around -76 hartree?
  2. I have been trying to extract the descriptors c and d with the code. When I use the neuralxc pre command, I can see that the output is a numpy array stored in a .npy file. I think these are the unsymmetrised descriptors c of the structures in the input .xyz; however, I can't figure out the ordering of the array, i.e. which indices correspond to which atom or which n, l, m, etc.
  3. Can you confirm that the polynomial basis projector is only available with SIESTA as the QM engine? I get the impression that when using PySCF as the QM engine, only the GTO basis projector is available.

Thanks again for answering my questions.

@semodi
Owner

semodi commented Oct 16, 2021

Hi, to answer your questions:

  1. For the water dataset, the monomer energies are in fact not taken from CCSD(T) calculations but from a highly accurate water monomer PES, which sets the energy of the water monomer in its equilibrium geometry to zero. Right now NeuralXC doesn't really care about absolute energies but tries to get the relative energies between different conformers of the same molecule right.
  2. The output is produced by three nested for loops like this (see also the loading sketch at the end of this comment):

         idx = 0
         for n in range(1, n_max + 1):        # radial index
             for l in range(n):               # angular momentum, l < n
                 for m in range(-l, l + 1):   # magnetic quantum number
                     # flat index idx corresponds to (n, l, m)
                     idx += 1

     This should clarify the ordering.
  3. Any basis can be used with both SIESTA and PySCF as the QM engine; however, keep in mind that the grid kind has to be set to the right value: a Euclidean grid for SIESTA and a radial grid for PySCF (see the sketch just below). If you encounter problems, please feel free to attach your error messages here so that I can help troubleshoot.
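
Schematically, the grid setting could look something like this (the key names shown are illustrative placeholders rather than the exact NeuralXC schema; the only firm point is the pairing of engine and grid kind, SIESTA with a Euclidean grid and PySCF with a radial one):

    import json

    # Illustrative only: key names are placeholders; the engine/grid pairing
    # (SIESTA -> euclidean, PySCF -> radial) is the part that matters.
    preprocessor = {
        "projector": {
            "grid": "radial"   # use "euclidean" when running with SIESTA
        }
    }

    with open("pre_config.json", "w") as f:
        json.dump(preprocessor, f, indent=4)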

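To make the ordering in point 2 concrete, here is a small loading sketch. The file name, the n_max value, and the assumption that per-atom blocks are simply concatenated in the order the atoms appear in the input .xyz are placeholders; only the (n, l, m) loop order is taken from point 2:

    import numpy as np

    # Placeholder file name and n_max; only the (n, l, m) loop order below is
    # the documented part. The per-atom block layout is assumed, not guaranteed.
    coeffs = np.load("pre_output.npy")
    n_max = 4

    nlm = []                                  # flat index -> (n, l, m) for one atom
    for n in range(1, n_max + 1):
        for l in range(n):
            for m in range(-l, l + 1):
                nlm.append((n, l, m))

    print(len(nlm), "coefficients per atom block")
    for idx, (n, l, m) in enumerate(nlm):
        print(idx, "->", (n, l, m))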