
ENH: avoid full dense matrix for parallel beta #2040

Open · wants to merge 5 commits into base: master

Conversation

@wasade (Member) commented Jun 4, 2015

A back-of-the-envelope calculation puts the amount of memory necessary to compute UniFrac over the EMP table at ~3TB (75k samples, 5 million OTUs, double-precision floats). The dense representation is not necessary for the pairwise parallelized beta diversity metrics; this change addresses that issue.
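
For reference, the arithmetic behind that estimate:

    samples = 75000
    otus = 5000000
    bytes_per_double = 8
    total_bytes = samples * otus * bytes_per_double
    print(total_bytes / 1e12)  # ~3.0 TB for the dense matrix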

Row metrics and non-parallel runs will still use the dense matrix, as I believe avoiding it there will require lower-level changes.

@ElDeveloper, able to review?

@ElDeveloper (Member)

The code looks fine; let's wait until we see the tests. Also, we would need a mention in the ChangeLog, and at least someone else to review this, since I'm not the maintainer of the library and I'm not familiar with the intricacies of this pipeline.

This will be super useful to have!!!

@@ -34,7 +34,7 @@
import warnings
warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found')

-from numpy import asarray
+from numpy import vstack
@ElDeveloper (Member):

I would probably import numpy as np, but this is also ok.

@wasade (Member, Author):

No strong feelings here; I can do that if you'd like, but the iteration time with Jenkins is long.

@ElDeveloper (Member):

True, no strong feelings either ... Just ignore that comment.


@ghost commented Jun 5, 2015

Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1658/

@wasade (Member, Author) commented Jun 5, 2015

@ElDeveloper, able to pull this down and see if it helps with the EMP table? Curious to see what happens in a real-world scenario.

@ElDeveloper (Member)

I'm planning on doing some testing with this branch later today or next week, I'll report back for sure! Thanks for putting this in place!


@wasade (Member, Author) commented Jun 5, 2015

Excellent, thanks!


@ElDeveloper (Member)

Thanks @wasade, I just tested this out and it came up with the error you see below:

Traceback (most recent call last):
  File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 152, in <module>
    main()
  File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 145, in main
    opts.output_dir, opts.rows, full_tree=opts.full_tree)
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/qiime/beta_diversity.py", line 227, in single_file_beta
    otumtx = otu_table.matrix_data.T.toarray()
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 940, in toarray
    return self.tocoo(copy=False).toarray(order=order, out=out)
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/scipy/sparse/coo.py", line 274, in toarray
    B = self._process_toarray_args(order, out)
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/scipy/sparse/base.py", line 793, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError

@wasade (Member, Author) commented Jun 6, 2015

Parallel, and UniFrac? If so, I do not understand how this code works.

@ElDeveloper (Member)

Yes, parallel and UniFrac. Using 8GB.


@wasade (Member, Author) commented Jun 6, 2015

I don't understand why that branch is being taken in that code, unfortunately... @gregcaporaso, do you know?

@wasade (Member, Author) commented Jun 6, 2015

@ElDeveloper, that branch implies that one_sample_unifrac is being used, not dist_unweighted_unifrac. Walking backwards, the metric used in your traceback is pulled from this function, and using this unifrac metric. I was expecting this method for pulling the metric, and that it would pull this method for unifrac. Parallelizing the other aspects of this is going to be more involved unfortunately, though it may be as simple as replicating this for loop.

@ElDeveloper (Member)

@wasade, thanks for checking!

@@ -208,17 +208,23 @@ def single_file_beta(input_path, metrics, tree_path, output_dir,
sample_ids = otu_table.ids()
observation_ids = otu_table.ids(axis='observation')
for i in range(len(sample_ids)):
samp_a = otu_table.data(sample_ids[rowidx])
@josenavas (Member):

@wasade after looking at the code with @ElDeveloper, it turns out these lines inside the AttributeError handler will never be executed for the unifrac distances, as line 204 does not raise any error.

IIRC, the current implementation of UniFrac in cogent is FastUnifrac. FastUnifrac is able to compute unifrac fast because it pre-computes a lot of data structures; that is why these lines never get executed for UniFrac.

One way of doing this would be to create a for loop that gives the code an OTU table with 2 samples each time, and then construct the distance matrix ourselves here (sketched below). On one hand, this will kill the "Fast" part of UniFrac, since it will recompute those structures for each pair of samples. On the other hand, we will be able to aggressively parallelize this, e.g. creating a job for each pair of samples, maximizing the parallelization during those pre-computations.

Other options require more thought on how to parallelize this, instead of using the naïve per-row parallelization.
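
For illustration, a minimal sketch of that per-pair loop (assuming a biom-like filter method; pairwise_beta and metric_f are placeholders, not the qiime API):

    from itertools import combinations

    def pairwise_beta(otu_table, metric_f, sample_ids):
        # Hand the metric a 2-sample table at a time; each pair could
        # instead be dispatched as its own cluster job.
        dists = {}
        for a, b in combinations(sample_ids, 2):
            pair_table = otu_table.filter([a, b], axis='sample', inplace=False)
            dists[(a, b)] = metric_f(pair_table)
        return dists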

@wasade (Member, Author):

Fast Unifrac does not require a dense representation. It does cache information on the tree, but it doesn't use the zeros. However, if enabling fast unifrac to scale within qiime requires modification of pycogent, then I agree it is not worth pursuing.

Do you know why one_sample_unweighted_unifrac is being used instead of dist_unweighted_unifrac? My impression was that the latter would work and is what I was expecting the metric in use to be. It is also fast unifrac.


@josenavas (Member):

Sorry @wasade

No idea why that function is used...

tree, otu_table.ids(), rowid,
make_subtree=(not full_tree))
row_dissims.append(dissims)
# do element by element
@wasade (Member, Author):

@ElDeveloper, wanna give this another spin? I verified with pdb which branches were taken, and the output (by md5sum) is identical with and without this change on the test data.

@gregcaporaso, I'm really not sure if this change is safe for all the metrics, but for some reason dist_unweighted_unifrac and dist_weighted_unifrac were not being treated like row metrics and instead operated on the full table because of the try/except. The comment here, "do element by element", is misleading as best I can tell, as the original code looked like it was doing it row by row. So I think this is correct?
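
To make the try/except under discussion concrete, a paraphrase of the dispatch (schematic, not the actual qiime source):

    def resolve_one_sample_metric(metrics_module, metric_name):
        # Optimistic lookup of a one_sample_* function (e.g.
        # one_sample_unweighted_unifrac). When it succeeds, single_file_beta
        # takes the whole-table path and densifies the matrix; only an
        # AttributeError drops it into the per-row loop where the sparse
        # representation can be kept. The unifrac metrics resolved
        # successfully, which is why the sparse branch never ran for them.
        try:
            return getattr(metrics_module, 'one_sample_' + metric_name)
        except AttributeError:
            return None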

@ElDeveloper (Member):

@wasade, I think this might be it, thanks for having a look. I started it and it's been running for a while in 8GB, while before I needed to use 64GB. 🍺 👍


Member:

💥

@ghost commented Jul 23, 2015

Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1665/

@ElDeveloper (Member)

@wasade, it seems like this may have fixed the original problem; however, the running time is much slower now. I'm basing this on the fact that in the previous execution that I did, the jobs finished within 48 hours. This time it took longer than I had initially anticipated and exceeded the walltime limit 😭:

/home/yova1074/.bash_profile: line 57: /home/yova1074/.virtualenvs/qiime-dev/bin/postactivate: No such file or directory
=>> PBS: job killed: walltime 432039 exceeded limit 432000

@wasade (Member, Author) commented Jul 29, 2015

That's unexpected. Did any of the individual jobs complete?


@ElDeveloper (Member)

No, sadly not a single one of them.


@wasade (Member, Author) commented Jul 29, 2015

Is cputime reasonably similar to walltime?


@wasade (Member, Author) commented Jul 29, 2015

My guess is that each call to unifrac is doing some expensive caching, whereas previously the caching was done once. This is going to get even more dirty.

@wasade (Member, Author) commented Jul 29, 2015

Just a note: @ElDeveloper and I exchanged messages out-of-thread, and cputime is ~= walltime, so the jobs were burning the full time.

This approach results in 2 * N**2 calls to _fast_unifrac_setup, in which the bulk of the time is spent in cogent's TreeNode.copy. (Note, I do not understand the source of the coefficient at this time, but that's minor right now.) The existing trajectory through this code resulted in 2 * N calls to _fast_unifrac_setup. In the small test case from qiime_test_data, the runtime difference is 0.9s vs 8.7s.
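
Making the call-count difference concrete (illustration only; the coefficient of 2 turns out to come from running both weighted and unweighted, per the next comment):

    n = 1000                      # number of samples
    pairwise_setups = 2 * n ** 2  # this branch: setup per pair, per metric
    row_setups = 2 * n            # previous path: setup per row, per metric
    print(pairwise_setups / row_setups)  # 1000x more _fast_unifrac_setup calls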

Continuing the investigation...

@wasade (Member, Author) commented Jul 29, 2015

In true n00b form, I now know where the 2 coefficient comes from: running both weighted and unweighted during testing.

Anyway, found a reasonable way forward. It increases setup time, but I'm only operating on a small test dataset so it is not clear to me if the setup time increase will substantially increase overall runtime. The benefit of course is that we can avoid a dense representation.

...what would be really nice though is if fast_unifrac could just consume a scipy.sparse.csr_matrix for envs, as the 2d dict currently being used is very heavyweight when the number of nonzero entries is large.

Commit coming momentarily
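
For context, fast_unifrac's envs input is (to my understanding) a 2d dict keyed by OTU and then by sample; a rough sketch of the conversion a csr_matrix could replace (csr_to_envs is a hypothetical helper, not existing cogent/qiime code):

    def csr_to_envs(matrix, otu_ids, sample_ids):
        # matrix: scipy.sparse.csr_matrix of OTUs (rows) by samples (columns).
        # This builds one inner dict per OTU and one entry per nonzero value;
        # at EMP scale that is millions of Python dicts versus the three flat
        # arrays backing the csr_matrix itself.
        envs = {}
        for row in range(matrix.shape[0]):
            start, end = matrix.indptr[row], matrix.indptr[row + 1]
            cols = matrix.indices[start:end]
            vals = matrix.data[start:end]
            if len(cols):
                envs[otu_ids[row]] = dict(
                    (sample_ids[c], v) for c, v in zip(cols, vals))
        return envs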

@ElDeveloper (Member)

Thanks for the info @wasade. I just started the processing again, we'll see what happens! Agreed on your proposal about fast_unifrac; not sure where that fits in for the future of skbio.


@ghost commented Jul 30, 2015

Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1667/

@ElDeveloper (Member)

For the EMP table, using 100 workers with 8GB of RAM, all workers failed with this MemoryError:

/home/yova1074/.bash_profile: line 57: /home/yova1074/.virtualenvs/qiime-dev/bin/postactivate: No such file or directory
Traceback (most recent call last):
  File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 152, in <module>
    main()
  File "/home/yova1074/.virtualenvs/qiime-dev/bin/beta_diversity.py", line 145, in main
    opts.output_dir, opts.rows, full_tree=opts.full_tree)
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/qiime/beta_diversity.py", line 230, in single_file_beta
    make_subtree=(not full_tree))
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/qiime/beta_metrics.py", line 88, in result
    tree, envs, weighted=weighted, metric=metric, **kwargs)
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/cogent/maths/unifrac/fast_unifrac.py", line 523, in fast_unifrac_one_sample
    branch_lengths, nodes, t = _fast_unifrac_setup(t, envs, make_subtree)
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/cogent/maths/unifrac/fast_unifrac.py", line 189, in _fast_unifrac_setup
    count_array, unique_envs, env_to_index, node_to_index = index_envs(envs, node_index)
  File "/home/yova1074/.virtualenvs/qiime-dev/lib/python2.7/site-packages/cogent/maths/unifrac/fast_tree.py", line 113, in index_envs
    result = zeros((num_nodes, num_envs), array_constructor)
MemoryError

@wasade (Member, Author) commented Jul 30, 2015

@rob-knight, does fast unifrac require a dense representation of the OTU table? This exception looks like it is forcing that. If so, then I'm not sure how to proceed here without making changes in cogent. That, or it's possible that parallel_beta_diversity.py isn't driving fast unifrac correctly?

@rob-knight

No, but it does require a dense representation of the two vectors representing a pair of environments. Trimming this vector to remove shared zeroes would require trimming the tree, which might be computationally expensive.


@wasade (Member, Author) commented Jul 31, 2015

@rob-knight @ElDeveloper, then I think we're in a bind. If I understand it correctly, we either do it pairwise and eat the non-trivial compute overhead of _fast_unifrac_setup, or we pass in all samples as the envs, forcing a dense matrix to be produced in index_envs (i.e., ~3TB of memory). The compute overhead on top of fast unifrac itself for doing pairwise appears to be at least O(K^2 * (N * log N)), where N is the number of tips in the tree (7.5 million; this is open ref, with the expense driven by TreeNode.copy prior to pruning) and K is the number of samples.

Doing some empirical tests (notebook here: https://gist.github.com/wasade/22b96005a18636ef158a) of cogent's TreeNode.copy with the Greengenes OTU trees, we can fit the growth pretty well (r^2 = 0.99). Extrapolating to 7.5 million tips in the EMP tree, and assuming 30k samples, we're looking at on the order of 56 million CPU hours of overhead.

It is my understanding that this is blocking EMP analysis. There are at least two issues that I believe need to be resolved:

  • introduce a fast TreeNode.copy, or modify index_tree to only index a subtree (thereby avoiding the need for a copy)
  • modify fast_unifrac to allow operating pairwise on the envs instead of on all of them in bulk, to reduce the overhead associated with spinning up its machinery

In order to resolve the first point, which I think is doable, we'll need to backport all of fast unifrac.

I've been optimistic about each commit that goes in on this PR, but it seems like every one just unravels a deeper problem. So hopefully we can resolve this with the next fix...
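
For a sense of scale, the arithmetic behind that estimate (the per-pair setup cost below is an assumed placeholder chosen to be consistent with the ~56 million CPU hour figure above; the real number came from the fitted TreeNode.copy timings):

    k = 30000                            # samples
    pairs = k * (k - 1) // 2             # ~4.5e8 sample pairs
    setup_seconds_per_pair = 450.0       # ASSUMED placeholder per-pair cost
    cpu_hours = pairs * setup_seconds_per_pair / 3600
    print(pairs, cpu_hours)              # ~4.5e8 pairs, ~5.6e7 CPU hours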

@rob-knight

Part of the problem is the large number of envs, so one possibility is to subset the environmental samples and run the computations such that each is compared to each other one at least once, then aggregate the table at the end, which is much less computationally expensive, right? I realize this requires some custom aggregation code and doesn't use the current parallelization model.
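
A minimal sketch of that blocking scheme (assumed scheduling code, not part of qiime): split the samples into blocks and run one job per block pair, so every pair of samples co-occurs in at least one job and the setup cost is amortized over many pairs; the per-job sub-matrices are then stitched into the full distance matrix.

    from itertools import combinations

    def block_jobs(sample_ids, block_size):
        # Yield groups of sample ids such that every pair of samples appears
        # together in at least one group; each group becomes one worker job.
        blocks = [sample_ids[i:i + block_size]
                  for i in range(0, len(sample_ids), block_size)]
        for block in blocks:                  # covers the single-block case
            yield block
        for a, b in combinations(blocks, 2):  # across-block (and within) pairs
            yield a + b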


@wasade (Member, Author) commented Jul 31, 2015

That's a good idea -- find a balancing point between 2 envs and 30k envs. The nice thing is this would not require backporting and can be done in the parallel_beta_diversity code such that it can leverage the parallelization already in place. On the other hand, that code is dead, so I wonder if the better bet is to port, bench, and optimize first (particularly as I think there are some workarounds for the copy) to position us better for a subsequent port to skbio.
