cross validation is slow #322

Open
pdimens opened this issue Feb 3, 2022 · 3 comments

@pdimens
Contributor

pdimens commented Feb 3, 2022

Hello,

Forgive the generality of the title. I have a DAPC cross validation running on a dataset of approx 400 samples x 5000 loci. The cross validation is testing values (n.pca.max) between 50 and 300 in increments of 10. This process has been running on 30 cores for 77+ days on a Linux system as of today with no clear end in sight. Is there a way to make this faster? Perhaps there are alternatives I can try? The full call is:

xvalDapc(
    tab(data, NA.method = "mean"),
    grp = pop(data),
    n.da = 3,
    n.pca.max = seq(50,300,10),
    n.rep = 20,
    parallel = "multicore",
    ncpus = 30
)

@caitiecollins
Collaborator

Unfortunately, this sounds like you’re running into an error rather than a slow process.
I wouldn’t expect a single run of xvalDapc on a dataset of your size to take more than ~ 30 seconds.

The n.pca.max argument only expects a single value. I think this may be the source of your error. It’s designed to run DAPC with values of n.PCs selected from 1 to n.pca.max. So, if you choose n.pca.max=300, it should inherently explore the values in seq(50,300,10).

Could you try running the function with a single value for the n.pca.max argument (e.g., n.pca.max=300), to check if that produces results in a timely fashion?

You shouldn’t normally need to repeat the xvalDapc analysis with different values of n.pca.max. But if you have your own rationale for doing this, you would need to re-run the function in a loop, inputting only one value for n.pca.max in each iteration.
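
The loop described above could look something like the following minimal sketch. The helper name `xval_per_max` and the three candidate values are hypothetical and chosen only for illustration; the commented-out call reuses the arguments from the original post with a single `n.pca.max` per iteration, and assumes `data` is the same genind object. A stand-in function is used at the end so the sketch runs without adegenet installed.

```r
# Hypothetical helper: run one cross-validation per candidate n.pca.max value.
# `run_one` is any function that takes a single number and returns a result.
xval_per_max <- function(n_pca_max_values, run_one) {
  results <- lapply(n_pca_max_values, function(m) {
    stopifnot(length(m) == 1)  # each call receives exactly one value
    run_one(m)
  })
  names(results) <- as.character(n_pca_max_values)
  results
}

# Intended use with adegenet (not run here; `data` is assumed to be the
# genind object from the original call):
# library(adegenet)
# res <- xval_per_max(c(100, 200, 300), function(m) {
#   xvalDapc(
#     tab(data, NA.method = "mean"),
#     grp = pop(data),
#     n.da = 3,
#     n.pca.max = m,   # a single value per iteration
#     n.rep = 20,
#     parallel = "multicore",
#     ncpus = 30
#   )
# })

# Stand-in so the sketch runs without adegenet:
demo <- xval_per_max(c(100, 200, 300), function(m) m / 100)
```

The results come back as a list named by the `n.pca.max` value used, so each run can be inspected separately, e.g. `res[["300"]]`.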

@pdimens
Contributor Author

pdimens commented Feb 3, 2022

Wow, that explains a lot. I can't believe I waited this long to address this. Closing the issue, thank you!

@pdimens pdimens closed this as completed Feb 3, 2022
@pdimens
Contributor Author

pdimens commented Feb 3, 2022

I would recommend a sanity check for length(n.pca.max) == 1 to warn against something like this.
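
A check along these lines might be sketched as below. The function name `check_n_pca_max` is hypothetical and not part of adegenet; it is only meant to show what such a guard could do before the analysis starts. This sketch stops with an error, and `warning()` would be the softer alternative.

```r
# Hypothetical guard, not part of adegenet: reject anything other than
# a single numeric value for n.pca.max.
check_n_pca_max <- function(n.pca.max) {
  if (!is.numeric(n.pca.max) || length(n.pca.max) != 1L) {
    stop("n.pca.max must be a single number, but has length ",
         length(n.pca.max), call. = FALSE)
  }
  invisible(n.pca.max)
}

check_n_pca_max(300)                  # passes silently
# check_n_pca_max(seq(50, 300, 10))   # would stop with an error
```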
