cross validation is slow #322

pdimens · 2022-02-03T15:25:31Z

Hello,

Forgive the generality of the title. I have a DAPC cross-validation running on a dataset of approximately 400 samples x 5000 loci. The cross-validation is testing values of n.pca.max between 50 and 300 in increments of 10. As of today, this process has been running on 30 cores on a Linux system for 77+ days with no clear end in sight. Is there a way to make this faster, or are there alternatives I can try? The full call is:

xvalDapc(
    tab(data, NA.method = "mean"),
    grp = pop(data),
    n.da = 3,
    n.pca.max = seq(50,300,10),
    n.rep = 20,
    parallel = "multicore",
    ncpus = 30
)

caitiecollins · 2022-02-03T17:12:18Z

Unfortunately, this sounds like you’re running into an error rather than a slow process.
I wouldn’t expect a single run of xvalDapc on a dataset of your size to take more than ~ 30 seconds.

The n.pca.max argument only expects a single value. I think this may be the source of your error. It’s designed to run DAPC with values of n.PCs selected from 1 to n.pca.max. So, if you chose n.pca.max=300, it should inherently explore the values in seq(50,300,10).

Could you try running the function with a single value for the n.pca.max argument (e.g., n.pca.max=300), to check if that produces results in a timely fashion?

You shouldn’t normally need to repeat the xvalDapc analysis with different values of n.pca.max. But if you have your own rationale for doing this, you would need to re-run the function in a loop, inputting only one value for n.pca.max in each iteration.
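As a rough illustration of the loop described above, here is a minimal sketch. The helper name `xval_per_max` and its `xval_fun` parameter are hypothetical and not part of adegenet; `xval_fun` only exists so the loop can be tried with a stand-in function. The point is that each call receives exactly one value for n.pca.max.

```r
# Hypothetical helper (not part of adegenet): run cross-validation once per
# candidate upper bound, passing a single n.pca.max value to each call.
xval_per_max <- function(x, grp, maxima, xval_fun = adegenet::xvalDapc, ...) {
  results <- lapply(maxima, function(m) {
    # n.pca.max is a scalar here, never a vector
    xval_fun(x, grp = grp, n.pca.max = m, ...)
  })
  # Label each result with the upper bound that produced it
  names(results) <- paste0("n.pca.max=", maxima)
  results
}

# Possible usage with the objects from the original call (assumes `data` is
# the genind object from the question and adegenet is installed):
# res <- xval_per_max(tab(data, NA.method = "mean"), pop(data),
#                     maxima = c(100, 200, 300), n.da = 3, n.rep = 20)
```

In most cases a single call with n.pca.max = 300 should be enough, since that call already explores smaller numbers of PCs.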

pdimens · 2022-02-03T18:08:12Z

Wow, that explains a lot. I can't believe I waited this long to address this. Closing the issue, thank you!

pdimens · 2022-02-03T20:03:12Z

I would recommend adding a sanity check that length(n.pca.max) == 1, to warn against mistakes like this.
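Something along these lines is what I have in mind. This is only a sketch: the function name `check_n_pca_max` is hypothetical, and falling back to the maximum of the vector is one possible choice, not necessarily how the package handles it.

```r
# Hypothetical validation helper: make sure n.pca.max is a single number.
check_n_pca_max <- function(n.pca.max) {
  if (!is.numeric(n.pca.max) || length(n.pca.max) < 1L) {
    stop("n.pca.max must be a numeric value")
  }
  if (length(n.pca.max) > 1L) {
    # Assumed behaviour: warn and fall back to the largest value supplied
    warning("n.pca.max should be a single value; using max(n.pca.max)")
    n.pca.max <- max(n.pca.max)
  }
  n.pca.max
}
```

With this check, passing seq(50, 300, 10) would raise a warning and continue with 300 instead of silently misbehaving.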

pdimens closed this as completed Feb 3, 2022

pdimens reopened this Feb 3, 2022

pdimens mentioned this issue Feb 9, 2022

add sanity check to n.pca.max to default to max value if vector #323

Merged
