-
Notifications
You must be signed in to change notification settings - Fork 265
ENH: avoid full dense matrix for parallel beta #2040
base: master
Are you sure you want to change the base?
ENH: avoid full dense matrix for parallel beta #2040
Conversation
The code looks fine, let's wait until we see the tests. Also we would need a mention in the ChangeLog and at least someone else to review this, I'm not the maintainer of the library and I'm not familiar with the intricacies of this pipeline. This will be super useful to have!!! |
@@ -34,7 +34,7 @@ | |||
import warnings | |||
warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found') | |||
|
|||
from numpy import asarray | |||
from numpy import vstack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably import numpy as np
, but this is also ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no strong feelings here, can do if you'd like but the iteration with jenkins is large
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, no strong feelings either ... Just ignore that comment.
Yoshiki Vázquez-Baeza
On Jun 4, 2015, at 9:45 PM, Daniel McDonald notifications@github.com wrote:
In qiime/beta_diversity.py:
@@ -34,7 +34,7 @@
import warnings
warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found')-from numpy import asarray
+from numpy import vstack
no strong feelings here, can do if you'd like but the iteration with jenkins is large—
Reply to this email directly or view it on GitHub.
Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1658/ |
@ElDeveloper, able to pull this down and see if it does help with the EMP table? Curious to see what happens in a real world scenario. |
I'm planning on doing some testing with this branch later today or next On (Jun-05-15| 8:11), Daniel McDonald wrote:
|
Excellent, thanks! On Fri, Jun 5, 2015 at 10:13 AM, Yoshiki Vázquez Baeza <
|
Thanks @wasade, I just tested this out and in came up with the error you see below: Traceback (most recent call last):
File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 152, in <module>
main()
File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 145, in main
opts.output_dir, opts.rows, full_tree=opts.full_tree)
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/qiime/beta_diversity.py", line 227, in single_file_beta
otumtx = otu_table.matrix_data.T.toarray()
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 940, in toarray
return self.tocoo(copy=False).toarray(order=order, out=out)
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/scipy/sparse/coo.py", line 274, in toarray
B = self._process_toarray_args(order, out)
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/scipy/sparse/base.py", line 793, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError |
Parallel, and unifrac? If so, I do not understand how this code works
|
Yes, parallel and UniFrac. Using 8GB. On (Jun-05-15|17:45), Daniel McDonald wrote:
|
I don't understand why that branch is being taken in that code
|
@ElDeveloper, that branch implies that |
@wasade, thanks for checking! |
@@ -208,17 +208,23 @@ def single_file_beta(input_path, metrics, tree_path, output_dir, | |||
sample_ids = otu_table.ids() | |||
observation_ids = otu_table.ids(axis='observation') | |||
for i in range(len(sample_ids)): | |||
samp_a = otu_table.data(sample_ids[rowidx]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wasade after looking at the code with @ElDeveloper, it turns out these lines inside the AttributeError
will never be executed for the unifrac distances, as line 204 do not raise any error.
IIRC, the current implementation of Unifrac in cogent is FastUnifrac. FastUnifrac is able to compute unifrac fast because it pre-computes a lot of data structures. that is why for Unifrac these lines never get executed.
One way of doing this will be to create a for loop and give the code an OTU table with 2 samples each time, and then just construct the distance matrix ourselves in here. On one hand, this will kill the "Fast" part of Unifrac, since it will recompute those structures for each pair of samples. On the other hand, we will be able to aggressively parallelize this, e.g. creating a job for each pair of samples; maximizing the parallelization during those pre-computations.
Other options require more though on how to parallelize this, instead of using the naïve per-row parallelization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fast Unifrac does not require a dense representation. It does cache
information on the tree, but it doesn't use the zeros. However, if enabling
fast unifrac to scale within qiime requires modification of pycogent, then
I agree it is not worth pursuing.
Do you know why one_sample_unweighted_unifrac is being used instead of
dist_unweighted_unifrac? My impression was that the latter would work and
what I was expecting the metric being used to be. It is also fast unifrac
On Tue, Jun 9, 2015 at 3:39 PM, josenavas notifications@github.com wrote:
In qiime/beta_diversity.py
#2040 (comment):@@ -208,17 +208,23 @@ def single_file_beta(input_path, metrics, tree_path, output_dir,
sample_ids = otu_table.ids()
observation_ids = otu_table.ids(axis='observation')
for i in range(len(sample_ids)):
samp_a = otu_table.data(sample_ids[rowidx])
@wasade https://github.com/wasade after looking at the code with
@ElDeveloper https://github.com/ElDeveloper, it turns out these lines
inside the AttributeError will never be executed for the unifrac
distances, as line 204 do not raise any error.IIRC, the current implementation of Unifrac in cogent is FastUnifrac.
FastUnifrac is able to compute unifrac fast because it pre-computes a lot
of data structures. that is why for Unifrac these lines never get executed.One way of doing this will be to create a for loop and give the code an
OTU table with 2 samples each time, and then just construct the distance
matrix ourselves in here. On one hand, this will kill the "Fast" part of
Unifrac, since it will recompute those structures for each pair of samples.
On the other hand, we will be able to aggressively parallelize this, e.g.
creating a job for each pair of samples; maximizing the parallelization
during those pre-computations.Other options require more though on how to parallelize this, instead of
using the naïve per-row parallelization.—
Reply to this email directly or view it on GitHub
https://github.com/biocore/qiime/pull/2040/files#r32067547.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @wasade
No idea why that function is used...
This reverts commit 590c6ae.
tree, otu_table.ids(), rowid, | ||
make_subtree=(not full_tree)) | ||
row_dissims.append(dissims) | ||
# do element by element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ElDeveloper, wanna give this another spin? I verified with pdb
what branches were used and output by md5 sum is identical with and without this change on the test data.
@gregcaporaso, I'm really not sure if this change is safe for all the metrics, but for some reason, dist_unweighted_unifrac
and dist_weighted_unifrac
were not being treated like row metrics and instead operating on the full table because of the try/except. The comment here "do element by element" is actually misleading as best I can tell as the original code looked like it was doing it row by row. So I think this is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wasade, I think this might be it thanks for having a look, I started it
and its been running for a while in 8GB, while before I needed to use
64GB. 🍺 👍
On (Jul-23-15|11:42), Daniel McDonald wrote:
dissim = metric_f(
otumtx[[rowidx, i], :], observation_ids,
tree, [sample_ids[rowidx], sample_ids[i]],
make_subtree=(not full_tree))[0, 1]
else:
dissim = metric_f(otumtx[[rowidx, i], :])[0, 1]
dissims.append(dissim)
row_dissims.append(dissims)
else:
# do whole row at once
dissims = row_metric(otumtx,
otu_table.ids(axis='observation'),
tree, otu_table.ids(), rowid,
make_subtree=(not full_tree))
row_dissims.append(dissims)
# do element by element
@ElDeveloper, wanna give this another spin? I verified with
pdb
what branches were used and output by md5 sum is identical with and without this change on the test data.@gregcaporaso, I'm really not sure if this change is safe for all the metrics, but for some reason,
dist_unweighted_unifrac
anddist_weighted_unifrac
were not being treated like row metrics and instead operating on the full table because of the try/except. The comment here "do element by element" is actually misleading as best I can tell as the original code looked like it was doing it row by row. So I think this is correct?
Reply to this email directly or view it on GitHub:
https://github.com/biocore/qiime/pull/2040/files#r35356369
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💥
Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1665/ |
@wasade, it seems like this may have fixed the original problem, however it seems like the running time is much slower now. I'm basing this on the fact that in the previous execution that I did, the jobs finished within 48 hours. This last time it took longer than I had initially anticipated, it exceeded the walltime limit 😭 :
|
That's unexpected. Did any of the individual jobs complete? On Wed, Jul 29, 2015 at 11:18 AM, Yoshiki Vázquez Baeza <
|
No, sadly not a single one of them. On (Jul-29-15|11:01), Daniel McDonald wrote:
|
Is cputime reasonably similar to walltime? On Wed, Jul 29, 2015 at 12:04 PM, Yoshiki Vázquez Baeza <
|
My guess is that each call to unifrac is doing some expensive caching, whereas previously the caching was done once. This is is going to get even more dirty. |
Just a note, @ElDeveloper exchanged messages out-of-thread and cputime is ~= walltime, so the jobs were burning the full time. This approach is resulting in Continuing the investigation... |
In true n00b form, I now know where the Anyway, found a reasonable way forward. It increases setup time, but I'm only operating on a small test dataset so it is not clear to me if the setup time increase will substantially increase overall runtime. The benefit of course is that we can avoid a dense representation. ...what would be really nice though is if Commit coming momentarily |
Thanks for the info @wasade. I just started the processing again, we'll On (Jul-29-15|16:20), Daniel McDonald wrote:
|
Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1667/ |
For the EMP table and using 100 workers with 8GB of RAM all workers failed with this /home/yova1074/.bash_profile: line 57: /home/yova1074/.virtualenvs/qiime-dev/bin/postactivate: No such file or directory
Traceback (most recent call last):
File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 152, in <module>
main()
File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 145, in main
opts.output_dir, opts.rows, full_tree=opts.full_tree)
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/qiime/beta_diversity.py", line 230, in single_file_beta
make_subtree=(not full_tree))
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/qiime/beta_metrics.py", line 88, in result
tree, envs, weighted=weighted, metric=metric, **kwargs)
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/cogent/maths/unifrac/fast_unifrac.py", line 523, in fast_unifrac_one_sample
branch_lengths, nodes, t = _fast_unifrac_setup(t, envs, make_subtree)
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/cogent/maths/unifrac/fast_unifrac.py", line 189, in _fast_unifrac_setup
count_array, unique_envs, env_to_index, node_to_index = index_envs(envs, node_index)
File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/cogent/maths/unifrac/fast_tree.py", line 113, in index_envs
result = zeros((num_nodes, num_envs), array_constructor)
MemoryError |
@rob-knight, does fast unifrac require a dense representation of the OTU table? This exception looks like it is forcing that. If so, then I'm not sure how to proceed here without making changes in cogent. That, or its possible that |
No but it does require a dense representation of the two vectors representing a pair of environments. Trimming this vector to remove shared zeroes would require trimming the tree, which might be computationally expensive.
|
@rob-knight @ElDeveloper, then I think we're in a bind. If I understand it correctly, we either do it pairwise and eat the non-trivial compute overhead of Doing some empirical tests (Notebook here) of cogent's It is my understanding that this is blocking EMP analysis. There are at least two issues that I believe need to be resolved:
In order to resolve the first point, which I think is doable, we'll need to backport all of fast unifrac. I've been optimistic about each commit that goes in on this PR, but it seems like everyone just unravels a deeper problem. So hopefully we can resolve this with the next fix... |
Part of the problem is the large number of envs so one possibility is to subset the environmental samples and run the computations such that each is compared to each other one at least once, then aggregate the table at the end, which is much less computationally expensive, right? I realize this requires some custom aggregation code and doesn’t use the current parallelization model.
|
That's a good idea -- find a balancing point between 2 envs and 30k envs. The nice thing is this would not require backporting and can be done in the |
A back of the envelope calculation puts the amount of memory necessary to compute unifrac over the EMP table at 3TB (75k samples, 5 million OTUs, double precision float). The dense representation is not necessary for the pairwise parallelized beta diversity metrics. This change addresses that issue.
Row metrics and non-parallel runs will still use the dense matrix as I believe that will require lower level changes to make possible.
@ElDeveloper, able to review?