hi,
I am running UnionCom to integrate two datasets, one with 5k cells and one with 50k cells. I gave it the full feature matrices and it was going slowly, so I decided to reduce the dimensionality to 10 and 15, respectively. But now it is even slower and not really making progress. I am wondering if you have an idea of the expected runtime, and whether you recommend doing any dimensionality reduction. I used NMF to reduce the dimensionality.
Hi, sorry,
To clarify: the case where I saw the algorithm making some progress had 2k and 20k cells, and that run used all the features. With 5k and 50k cells, things have not been moving forward. Just wanted to clarify that dimensionality reduction on the 5k and 50k datasets did not help either.
Hi,
For distance-based or kernel-based algorithms, it is difficult to scale to very large datasets (on the order of ~10^6 cells) because the computational complexity grows with the number of samples. We are still working on this. But I think 5k or 50k cells can be handled by UnionCom.
Here are some ideas:
1. Because UnionCom involves a lot of matrix operations, it can be accelerated substantially by an efficient GPU. If you have one, give it a try.
2. You can set the parameter "log_pd" to 1. This makes the program print progress at every step it runs, so you can see whether it is moving at all.
3. You can first randomly sample a subset of cells (e.g., 1k cells), run UnionCom on that, and see how efficient it is.
4. Besides, have you tried other dimensionality reduction methods, such as PCA? A rough sketch combining ideas 2-4 is shown below.
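Here is a minimal sketch putting those ideas together. It assumes the class-based API from the UnionCom README (`from unioncom.UnionCom import UnionCom`, then `fit_transform(dataset=[...])`); depending on your installed version, `log_pd` may need to be passed to `fit_transform` rather than the constructor, so please check against your install. The file names, the 1k subsample size, and the 30 PCA components are placeholders for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from unioncom.UnionCom import UnionCom   # import path as in the UnionCom README

# Two feature matrices (cells x features); file names are placeholders.
data1 = np.loadtxt("domain1.txt")   # e.g. the 5k-cell dataset
data2 = np.loadtxt("domain2.txt")   # e.g. the 50k-cell dataset

# Idea 3: randomly subsample ~1k cells per dataset to gauge runtime first.
rng = np.random.default_rng(0)
sub1 = data1[rng.choice(data1.shape[0], size=min(1000, data1.shape[0]), replace=False)]
sub2 = data2[rng.choice(data2.shape[0], size=min(1000, data2.shape[0]), replace=False)]

# Idea 4: try PCA instead of NMF for dimensionality reduction
# (30 components is an arbitrary choice for this sketch).
sub1 = PCA(n_components=30).fit_transform(sub1)
sub2 = PCA(n_components=30).fit_transform(sub2)

# Idea 2: log_pd=1 makes the program print progress at every step,
# so you can see whether the optimization is moving at all.
# Idea 1: UnionCom is built on PyTorch, so a visible CUDA GPU should speed it
# up considerably (whether it is picked up automatically is worth verifying).
uc = UnionCom(log_pd=1)
integrated = uc.fit_transform(dataset=[sub1, sub2])
```

If the 1k-cell subsample finishes quickly, you can scale the sample size up step by step to get a feel for how the runtime grows before committing to the full 5k/50k run.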
We have recently developed a new framework named Pamona for single-cell multi-omics integration, which is based on optimal transport and runs efficiently on CPU. If you are interested, you can give it a try, too: https://github.com/caokai1073/Pamona