[QUESTION]: Small dataset for training #49

varys50 · 2024-02-21T21:48:19Z

What are you trying to do?
My training data is small (19 observations) and my lookup pool is only 1500 structures. How should I set the parameters in the config file to account for this?

davidegraff · 2024-02-21T22:41:05Z

In some small experiments that were never published, we found that a GP regressor with a Matérn-5/2 kernel on Morgan/pair fingerprints works pretty well in very low-sample regimes. We also found that a Tanimoto kernel worked even better; it's not in the repo, so you'd have to implement it yourself. One limitation of the GP (and the reason we never included its results in the original paper) is that it doesn't scale to very large pools without significant engineering and some approximations, but 1500 structures is small enough to generate predictions quickly. At that scale, I'd also recommend going with batch_size=1.
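For anyone wanting to try the Tanimoto idea, here is a minimal sketch of what such a GP could look like, assuming the fingerprints are already available as binary NumPy arrays. This is not code from the repo: the names `tanimoto_kernel` and `gp_predict`, the `noise` default, and the unit-variance prior with a constant mean are all assumptions made for illustration, and no hyperparameters are fitted.

```python
import numpy as np


def tanimoto_kernel(A, B):
    """Tanimoto similarity between rows of A (n, d) and rows of B (m, d).

    For binary fingerprints this is |a AND b| / |a OR b|.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                              # shared on-bits
    norm_a = (A * A).sum(axis=1)[:, None]        # on-bits in each row of A
    norm_b = (B * B).sum(axis=1)[None, :]        # on-bits in each row of B
    denom = norm_a + norm_b - inter              # size of the union
    safe = np.where(denom > 0, denom, 1.0)       # avoid division by zero
    # Assumption: two all-zero fingerprints are treated as identical (similarity 1).
    return np.where(denom > 0, inter / safe, 1.0)


def gp_predict(X_train, y_train, X_pool, noise=1e-2):
    """Exact GP posterior mean and variance with a Tanimoto kernel.

    Hypothetical helper: `noise` is an assumed observation-noise variance,
    and the prior mean is simply the mean of the training targets.
    """
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()
    K = tanimoto_kernel(X_train, X_train) + noise * np.eye(len(y_train))
    K_s = tanimoto_kernel(X_pool, X_train)       # (pool, train) cross-covariance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - (v * v).sum(axis=0)              # k(x, x) = 1 for Tanimoto
    return mean, np.maximum(var, 0.0)


# Toy data at the scale discussed here: 19 labelled points, 1500 in the pool.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(19, 256))
y_train = rng.normal(size=19)
X_pool = rng.integers(0, 2, size=(1500, 256))

mean, var = gp_predict(X_train, y_train, X_pool)
best = int(np.argmax(mean))                      # one candidate per round
```

With only 19 training points the Cholesky factorization is trivial, and scoring 1500 pool structures is a single small matrix product, which is why the exact GP is practical at this size. Selecting one candidate per round and refitting mirrors the batch_size=1 suggestion above.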

varys50 · 2024-02-21T22:59:38Z

Thanks! Would I need to modify the args.py file to allow for different kernels?

varys50 added the "question" label · 2024-02-21
