-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix issue 88 plotAbundance rank #113
Conversation
Thanks! Usually, there are some NAs in microbial datasets even in the highest ranks, so this might not be the most robust option. Moreover, this still agglomerates the data which was the initial issue. Usually TreeSE is constructed so that the lowest rank (ASV/OTU/strain) is in "row level". The higher ranks are described in rowData (Species, Genus...). --> You cannot find the real lowest rank from rowData since it does not include that kind of information. Check for example GlobalPatterns data. Agglomerate it to lowest available level in rowData (Species) --> the number of rows differ because there are more OTU level bacteria than Species level. So the most transparent option would be to skip agglomeration by default. In function Line 276 in 0fa499a
So modifications to that function would be a. Add line that calls agglomerateByRank Other thing is that then user cannot use plotExpression anymore from the plotAbundance function (plotExpression is called when rank is NULL). Also, I noticed that plotExpression does not work currently because My suggestion:
|
Reason for using scater::plotExpression has been that it is easier to use existing methods. Other option would be to reuse code from scater in miaViz. However, the scater pkg is under GPL-3 so we cannot use the code unless we change miaViz license (which we are by default unwilling to do). Thus we need to either stick to scater::plotExpression, or rewrite the necessary parts. Both are feasible options in principle. But calling scater would be less work. If scater does not allow NULL ranks as discussed above, how about augmenting rowData internally and creating a new rowData field that equals to the assay rows. Then the agglomeration would not change the data but scater::plotExpression could still work? The function .find_lowest_taxonomy_level might be otherwise useful somewhere. |
Yes, but plotAbundance and plotExpression creates totally different kinds of plots.
Currently, we cannot create plotAbundance-type plots for ASV level. So instead of using plotExpression, we could create plotAbundance-type plots also when rank==NULL |
I agree about separating plotAbundance and plotExpression. I suggest that @TuomasBorman you check this with @Insaynoah when your respective schedules allow. |
@Insaynoah Can you modify plotAbundance so that it does not utilize plotExpression when rank = NULL? |
One thing i don't really get is that if rank is set to null, since the plot's colors come from the rank, it will just create gray bars like this: Thus, i don't really see how to make a usuable graph like this one. |
Hmmm, true... Well, if rank is NULL, then:
Does that work? The problem might be that there are lots of rows to plot, but this is still useful since user might have already subsetted the data |
That is actually a very good solution. Now when rank is set to null, the plot will color by each individual rowname, creating something like this However if you agglomerate the data beforehand such as by phylum it will create something like this: Also when rank is set to null, I added a condition where order_sample_by should also be null because this parameter is dependant on the rank. Let me know if I should make any changes. |
It seems as though, there's an issue with plotabundance in the Rmd file. But now that plotAbundance is updated to not use plotExpression when rank is null, the following line throws an error.
Should the example be updated to not expect a plotExpression plot ? |
I will comment on this later today in more detail, but you can use plotExpression() directly. That is handy function to visualize assay. (There will be support for boxplots in couple of days) Also check OMA examples on this |
Also OMA chapter 8 on this would require updating once this is done: |
Hi @Insaynoah can we aim to close this soon? |
I think this one should be done no ? Am i missing something ? |
Ok to me (with no detailed testing). |
Ah, also confirm that you have updated vignettes/ folder if that contains examples. If @TuomasBorman approves and merges this, please check that OMA examples are subsequently updated as well if this is used somewhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can you still update documentation (description of rank and examples). Especially, you should modify example of rank = NULL. Abundance plot is not possible to do with too many features. You should first agglomerate the data and then plot.
- Checks are failing
- Can you update .get_abundance_data to use agglomerateByRank and meltSE
@Insaynoah any chance to fix this one..? |
This PR:
The check fails in Mac and Win are caused by old version of mia. For some reason, they are not updated even though GHA should fetch the devel version of mia. The check fail in linux is caused by permission issues since only devel branch can push to gh-pages branch. That issue is fixed in devel branch automatically. There seems not to be any other issues, so this PR can be merged. |
The original idea was for the default value to be equal to null instead of defaulting by 'Kingdom'. However, the plotExpression which is used for if the rank is not specified does not allow rank to be null because of issues with the scater library (merging error). Instead I created a function .find_lowest_taxonomy_level which looks for the lowest taxonomy level of the tse that does not contain NA's then reverts to that rank if the user didn't specify one.