Speed up large sample/panel analysis #118
Comments
In advance of profvis, I know two of the more time-consuming processes here:
For the first one,
For the second one, I also notice that "non-optimal" panels score much more slowly than cytosel-recommended panels. This can be seen by comparing the speed of running the Lymph Node set without an uploaded list vs. the CD set. We may want to try a different scorer, but this won't be a trivial replacement and will likely require a significant code rewrite for compatibility.
The real crux here is R's single-threaded nature. Combined with deployment on a server with shared instances for multiple users, this limits how much "parallel" computing we want to do without 1. crashing everything completely, or 2. slowing down shared instances significantly. If we really need to add parallel computing and multiple threads, we will probably need to move away from shinyapps.
Ok thanks for the insights, this is really helpful. For Two questions
For the panel scoring, I've been really happy that scores approximately correspond to how good the panel is for that cell type, so don't want to break this too much. Could you push a version that uses
Looking at other solutions:
> system.time({nnet::multinom(y~., data = df, trace = FALSE, MaxNWts = 100000)})
   user  system elapsed
 23.014   0.132  23.235
> system.time({Rfast::multinom.reg(y,x)})
## gave up because it took so long
not so Rfast...will keep looking
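To make the timing comparison above easier to repeat, here is a minimal sketch that times `nnet::multinom` for a few panel sizes. It runs on synthetic random data, not on the cytosel scorer or the Lymph Node set; `time_fit`, `panel_sizes`, and the cell and marker counts are hypothetical names and values chosen only for illustration.

```r
library(nnet)

set.seed(1)
n_cells <- 600   # assumed number of cells (synthetic)
n_types <- 4     # assumed number of cell types (synthetic)
y <- factor(sample(paste0("type", seq_len(n_types)), n_cells, replace = TRUE))

# Hypothetical helper: fit a multinomial model on p random "markers"
# and return the elapsed time in seconds.
time_fit <- function(p) {
  x <- matrix(rnorm(n_cells * p), nrow = n_cells)
  df <- data.frame(y = y, x)
  unname(system.time(
    multinom(y ~ ., data = df, trace = FALSE, MaxNWts = 100000)
  )["elapsed"])
}

panel_sizes <- c(10, 40, 160)
timings <- data.frame(
  markers = panel_sizes,
  seconds = vapply(panel_sizes, time_fit, numeric(1))
)
print(timings)
```

Swapping another scorer into `time_fit` would give a like-for-like comparison on the same data before committing to a larger code change.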
cytosel_profvis_lymph_node_cd_1_200_all_genes.zip
Uploaded zip file of the profvis profile for Lymph Node with 100+ CD markers. As discussed, reducing the number of genes to profile to only protein coding genes can drastically speed up the marker finding process.
Analysis speed could be improved significantly by switching from scran
Using the lymph node Tabula Sapiens dataset (all cells) + the uploaded 100 CD list takes ~3-4 minutes to create the plots / analysis (after clicking "run analysis"), which I think is too long. Can profvis be run on this example to see what's taking so long? We then have some options:
If scoring
If UMAP
Other?
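As a rough illustration of the profiling step asked for above, the sketch below uses base R's sampling profiler (`Rprof` / `summaryRprof`), the same sampling mechanism that profvis visualizes. It is a dependency-free stand-in, not a profvis session. `slow_scoring`, `slow_umap_like`, and `run_analysis` are hypothetical placeholders and not cytosel functions.

```r
# Hypothetical stand-ins for the slow steps behind "run analysis".
slow_scoring <- function(n = 1500, p = 40) {
  x <- matrix(rnorm(n * p), nrow = n)
  hclust(dist(x))
}
slow_umap_like <- function(n = 300) {
  x <- matrix(rnorm(n * n), nrow = n)
  svd(x)
}
run_analysis <- function() {
  for (i in 1:3) slow_scoring()
  for (i in 1:3) slow_umap_like()
  invisible(NULL)
}

# Sample the call stack every 10 ms while the analysis runs.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
run_analysis()
Rprof(NULL)

# Rank functions by total time spent in them (including callees).
by_total <- summaryRprof(prof_file)$by.total
print(head(by_total, 10))
```

With profvis installed, wrapping the same call as `profvis::profvis(run_analysis())` should give the interactive flame graph instead of a table.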