tmQM dataset splitting according to metal elements #329

Open
Tracked by #333
MarshallYan opened this issue Dec 20, 2024 · 1 comment

Comments

@MarshallYan
Collaborator

We would like a feature in the dataset module (read from dataset.toml) that splits the tmQM dataset by metal element, e.g., loading only the molecules with Pt as the metal center.

The feature should also support binary logical operators such as OR, e.g., loading the molecules that have either Pt or Cu as the metal center.
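To make the request concrete, here is a minimal sketch of the selection logic in plain Python. It is not the dataset module's actual API: the record layout (a dict with an `elements` list of element symbols) and the helper name `has_any_metal` are assumptions made for illustration, and "metal center" is approximated as "the molecule contains that metal".

```python
from typing import Iterable, Sequence


def has_any_metal(elements: Sequence[str], metals: Iterable[str]) -> bool:
    """Return True if the molecule contains at least one of the requested metals.

    Passing a single metal gives the "Pt only" case; passing several gives OR semantics.
    """
    return not set(metals).isdisjoint(elements)


# Hypothetical records; real tmQM entries would come from the dataset loader.
molecules = [
    {"name": "mol_a", "elements": ["Pt", "C", "H", "N"]},
    {"name": "mol_b", "elements": ["Cu", "C", "H", "O"]},
    {"name": "mol_c", "elements": ["Fe", "C", "H", "P"]},
]

# Load all molecules with Pt as the metal center.
pt_only = [m["name"] for m in molecules if has_any_metal(m["elements"], {"Pt"})]

# Load molecules with Pt OR Cu as the metal center.
pt_or_cu = [m["name"] for m in molecules if has_any_metal(m["elements"], {"Pt", "Cu"})]
```

Expressing the OR case as a set of metals keeps the configuration simple: a list of element symbols in dataset.toml would be enough to drive it, if that layout is what gets chosen.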

Further features can be discussed and added to this module later. We would like to split other datasets as well, but the details still need to be worked out.

@chrisiacovella
Member

As a workaround in the meantime, I'll add some extra datasets restricted to the following subsets:

  • Pd, Zn, Fe, Cu: 27685 molecules
  • Pd, Zn, Fe, Cu, excluding any molecules that contain elements not in the set C, H, P, S, O, N, F, Cl, Br: 18515 molecules
  • Pd, Zn, Fe, Cu, Ni, Pt, Ir, Rh, Cr, Ag: 60637 molecules
  • Pd, Zn, Fe, Cu, Ni, Pt, Ir, Rh, Cr, Ag, excluding any molecules that contain elements not in the set C, H, P, S, O, N, F, Cl, Br: 40866 molecules
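The four subsets above could be produced by one two-stage filter, sketched below. This is an illustration under assumptions, not the code used to generate the counts: the function name `keep_molecule`, the toy records, and the reading that "elements not in the set" means every element other than the selected metals must be in the allowed list are all mine.

```python
from typing import Iterable, Optional

METALS_4 = frozenset({"Pd", "Zn", "Fe", "Cu"})
METALS_10 = METALS_4 | {"Ni", "Pt", "Ir", "Rh", "Cr", "Ag"}
ALLOWED_OTHERS = frozenset({"C", "H", "P", "S", "O", "N", "F", "Cl", "Br"})


def keep_molecule(
    elements: Iterable[str],
    metals: Iterable[str],
    allowed_others: Optional[Iterable[str]] = None,
) -> bool:
    """Keep a molecule if it contains a selected metal and, optionally,
    no element outside the selected metals plus the allowed set."""
    present = set(elements)
    metals = set(metals)
    if present.isdisjoint(metals):
        return False  # none of the selected metals is present
    if allowed_others is None:
        return True  # metal-only subsets (first and third bullets)
    return present <= (metals | set(allowed_others))


# Hypothetical molecules, keyed by name, listing their element symbols.
molecules = {
    "pd_ok": ["Pd", "C", "H", "P", "Cl"],
    "pd_si": ["Pd", "C", "H", "Si"],  # Si is outside the allowed set
    "ni_ok": ["Ni", "C", "H", "N"],
}

four = [n for n, e in molecules.items() if keep_molecule(e, METALS_4)]
four_restricted = [n for n, e in molecules.items() if keep_molecule(e, METALS_4, ALLOWED_OTHERS)]
ten = [n for n, e in molecules.items() if keep_molecule(e, METALS_10)]
ten_restricted = [n for n, e in molecules.items() if keep_molecule(e, METALS_10, ALLOWED_OTHERS)]
```

Because both stages take plain sets of element symbols, the same function would also serve the dynamic data filter if the sets were read from configuration instead of being hard-coded.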

Adding the data filter is the more general and dynamic approach, and we should definitely implement it as well.
