[QUESTION]: Small dataset for training #49

Open
varys50 opened this issue Feb 21, 2024 · 2 comments
Labels
question Further information is requested

Comments

varys50 commented Feb 21, 2024

What are you trying to do?
My training data is small (19 observations) and my lookup pool is only 1500 structures. How should I set up the various parameters in the config file to account for this?

varys50 added the question (Further information is requested) label Feb 21, 2024
davidegraff (Collaborator) commented

In some small experiments that were never published, we found that a GP regressor with a Matérn-5/2 kernel on Morgan/pair fingerprints works pretty well in very low-sample regimes. It's not in the repo, but if you're willing to implement it yourself, we also found that a Tanimoto kernel worked even better. One of the limitations of the GP (and the reason we never included its results in the original paper) is that it doesn't scale to very large pools without significant engineering and some approximations, but a pool of 1500 structures is more than small enough to generate predictions quickly. At that scale, I'd also recommend going with batch_size=1.
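
For anyone who wants to try the Tanimoto-kernel idea, here is a minimal sketch of what it could look like with plain NumPy. It is not code from this repository: `tanimoto_kernel`, `gp_posterior`, the `noise` value, and the UCB weight `beta` are all hypothetical choices for illustration, and the fingerprints below are random bits standing in for real Morgan fingerprints.

```python
import numpy as np


def tanimoto_kernel(A, B):
    """Tanimoto similarity between rows of binary matrices A (n, d) and B (m, d)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                              # number of shared on-bits
    norm_a = (A * A).sum(axis=1)[:, None]        # on-bits in each row of A
    norm_b = (B * B).sum(axis=1)[None, :]        # on-bits in each row of B
    union = norm_a + norm_b - inter
    # Two all-zero fingerprints have an empty union; treat them as identical.
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / safe_union, 1.0)


def gp_posterior(X_train, y_train, X_pool, noise=1e-2):
    """Exact GP posterior mean and variance on X_pool (assumed noise level)."""
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()                      # constant mean function
    K = tanimoto_kernel(X_train, X_train) + noise * np.eye(len(y_train))
    K_s = tanimoto_kernel(X_pool, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    # The Tanimoto kernel has k(x, x) = 1, so the prior variance is 1.
    var = np.clip(1.0 - (v * v).sum(axis=0), 0.0, None)
    return mean, var


# Synthetic stand-in data: 19 labelled points and a pool of 1500 structures.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(19, 2048))
y_train = rng.normal(size=19)
X_pool = rng.integers(0, 2, size=(1500, 2048))

mean, var = gp_posterior(X_train, y_train, X_pool)

# With batch_size=1, pick the single pool entry with the highest UCB score.
beta = 2.0                                       # assumed exploration weight
next_idx = int(np.argmax(mean + beta * np.sqrt(var)))
```

Because the GP here is exact, the cost is dominated by a 19x19 Cholesky factorization and a 1500x19 kernel matrix, which is why a pool of this size poses no scaling problem. After each acquisition you would append the new observation to the training set and refit.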

varys50 commented Feb 21, 2024

Thanks! Would I need to modify the args.py file to allow for different kernels?
