Run time and dimensionality reduction #3

Open
sroyyors opened this issue Jan 4, 2021 · 2 comments

Comments


sroyyors commented Jan 4, 2021

Hi,
I am running UnionCom to integrate two datasets, one with 5k cells and one with 50k cells. With the full feature matrix it ran slowly, so I reduced the dimensionality to 10 and 15 for each dataset. But now it is even slower and does not appear to make any progress. Do you have a sense of the expected runtime, and do you recommend any dimensionality reduction? I used NMF to reduce the dimensionality.


sroyyors commented Jan 4, 2021

Hi, sorry,
To clarify: the case where I saw the algorithm making some progress had 2k and 20k cells, with all the features. When I went to 5k and 50k cells, it stopped making progress. I also wanted to clarify that dimensionality reduction on the 5k and 50k datasets did not help either.

caokai1073 (Owner) commented

Hi,
Distance-based and kernel-based algorithms are difficult to scale to very large datasets (up to ~10^6 cells) because their computational complexity depends on the number of samples. We are still working on this. That said, I think UnionCom can handle 5k or 50k cells.
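To make the dependence on sample count concrete, here is a rough back-of-the-envelope sketch. It assumes (this is an assumption, not something confirmed about UnionCom's internals) that a distance-based method materializes one dense n x n float32 distance matrix per dataset. The helper name `dense_matrix_gib` is hypothetical. Note that the estimate depends only on the number of cells, not on the number of features.

```python
def dense_matrix_gib(n_rows, n_cols, bytes_per_value=4):
    """Memory in GiB for a dense n_rows x n_cols matrix (float32 by default)."""
    return n_rows * n_cols * bytes_per_value / 2**30


# Assumed scenario: one dense cell-by-cell distance matrix per dataset.
for n_cells in (1_000, 5_000, 50_000, 1_000_000):
    gib = dense_matrix_gib(n_cells, n_cells)
    print(f"{n_cells:>9,} cells: {gib:,.2f} GiB for one n x n matrix")
```

Under this assumption, a 50k-cell dataset already needs roughly 9.3 GiB for a single dense matrix, and reducing the number of features would not shrink it.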

Here are some ideas:

  1. Because UnionCom involves many matrix operations, a capable GPU can accelerate it. If you have a GPU, give it a try.
  2. Set the parameter "log_pd" to "1". This forces the program to print once for every step it runs.
  3. First randomly sample a subset of cells (e.g., 1k cells), run UnionCom on it, and see how efficient it is.
  4. Have you tried other dimensionality reduction methods, such as PCA?
  5. We recently developed a new framework named Pamona for single-cell multi-omics integration, which is based on optimal transport. Pamona runs efficiently on a CPU. If you are interested, you can give it a try, too. (https://github.com/caokai1073/Pamona)
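Ideas 3 and 4 could be sketched roughly as follows, using only NumPy. The helper names `subsample_rows` and `pca_reduce` are hypothetical, the input matrices are synthetic stand-ins (cells x features), and the choice of 1k cells and 15 components is only illustrative. The call into UnionCom itself is deliberately left out; the reduced matrices would be passed to it in place of the full ones.

```python
import numpy as np


def subsample_rows(X, n_keep, rng):
    """Return a random subset of rows (cells) and the indices that were kept."""
    n_keep = min(n_keep, X.shape[0])
    idx = rng.choice(X.shape[0], size=n_keep, replace=False)
    return X[idx], idx


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


rng = np.random.default_rng(0)

# Synthetic stand-ins for the two datasets (assumed layout: cells x features).
data1 = rng.normal(size=(5_000, 50))
data2 = rng.normal(size=(50_000, 60))

# Idea 3: randomly sample about 1k cells from each dataset.
sub1, idx1 = subsample_rows(data1, 1_000, rng)
sub2, idx2 = subsample_rows(data2, 1_000, rng)

# Idea 4: reduce each subsample with PCA instead of NMF.
sub1_pca = pca_reduce(sub1, 15)
sub2_pca = pca_reduce(sub2, 15)

print(sub1_pca.shape, sub2_pca.shape)
```

Keeping the sampled indices (`idx1`, `idx2`) makes it possible to map results on the subsample back to the original cells. Timing a run on the subsample first gives a cheap estimate of how runtime grows before committing to the full 5k and 50k datasets.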
