fix issue 88 plotAbundance rank #113

Merged: 25 commits, Jul 9, 2024


Insaynoah
Contributor

The original idea was for the default value of rank to be NULL instead of defaulting to 'Kingdom'. However, plotExpression, which is used when the rank is not specified, does not allow rank to be NULL because of issues with the scater library (a merging error). Instead, I created a function, .find_lowest_taxonomy_level, which looks for the lowest taxonomy level of the tse that does not contain NAs and falls back to that rank if the user didn't specify one.

@TuomasBorman
Contributor

Thanks!

Usually, there are some NAs in microbial datasets even in the highest ranks, so this might not be the most robust option.

Moreover, this still agglomerates the data, which was the initial issue. Usually, a TreeSE is constructed so that the lowest rank (ASV/OTU/strain) is at the "row level", and the higher ranks are described in rowData (Species, Genus...). Therefore, you cannot find the real lowest rank from rowData, since it does not include that kind of information.

Check, for example, the GlobalPatterns data. If you agglomerate it to the lowest available level in rowData (Species), the number of rows differs because there are more OTU-level bacteria than Species-level ones.
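
The row-count difference can be illustrated with a toy example in base R. This is only a sketch: the count matrix, the species labels, and the use of rowsum() as a stand-in for agglomeration are assumptions made for illustration, not the GlobalPatterns data or the mia implementation.

```r
# Toy count matrix: 4 OTU-level rows, 3 samples (made-up values).
counts <- matrix(
    c(10, 0, 5,
       3, 7, 1,
       0, 2, 8,
       4, 4, 4),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("OTU", 1:4), paste0("sample", 1:3))
)

# Hypothetical Species annotation: two OTUs map to each species.
species <- c("sp_a", "sp_a", "sp_b", "sp_b")

# Summing rows per species mimics agglomeration to the Species level.
agglomerated <- rowsum(counts, group = species)

nrow(counts)        # number of OTU-level rows
nrow(agglomerated)  # number of Species-level rows
```

The agglomerated matrix has fewer rows than the original, so the OTU-level detail cannot be recovered from the Species annotation alone.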

So the most transparent option would be to skip agglomeration by default.

In the function .get_abundance_data, we could modify it so that it first calls agglomerateByRank, but only if rank is not NULL. (Currently, the agglomeration is done inside the function without using these general mia functions, for some reason.)

mutate(rank = factor(rowData(x)[,rank], unique(rowData(x)[,rank]))) %>%

So modifications to that function would be

a. Add line that calls agglomerateByRank
b. Remove line 276

Another consequence is that the user can then no longer reach plotExpression through the plotAbundance function (plotExpression is called when rank is NULL).

Also, I noticed that plotExpression does not work currently because features cannot be NULL.

My suggestion:

  1. rank = NULL by default
  2. Use agglomerateByRank function in .get_abundance_data function (Skip the function when rank is NULL)
  3. Remove plotExpression from plotAbundance function (User can use it from scater.)
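
A minimal sketch of suggestion 2, written in base R so that it runs without Bioconductor. The helper name get_abundance_sketch, the toy inputs, and the use of rowsum() are assumptions for illustration; in miaViz the agglomeration step would instead be a call to agglomerateByRank inside .get_abundance_data.

```r
# Hypothetical helper: agglomerate only when a rank is given.
get_abundance_sketch <- function(counts, row_data, rank = NULL) {
    # rank = NULL: skip agglomeration and return the data unchanged.
    if (is.null(rank)) {
        return(counts)
    }
    # Otherwise sum the rows that share a value at the requested rank.
    rowsum(counts, group = row_data[[rank]])
}

# Toy inputs (made-up values): 3 ASV-level rows, 2 samples.
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(paste0("ASV", 1:3), c("s1", "s2")))
row_data <- data.frame(Genus = c("g1", "g1", "g2"),
                       row.names = rownames(counts))

unagglomerated <- get_abundance_sketch(counts, row_data)
by_genus <- get_abundance_sketch(counts, row_data, rank = "Genus")
```

With rank = NULL the rows are passed through untouched, which is the transparent default proposed above.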

@TuomasBorman
Contributor

@antagomir

@antagomir
Member

The reason for using scater::plotExpression has been that it is easier to use existing methods. The other option would be to reuse code from scater in miaViz. However, the scater package is under GPL-3, so we cannot use the code unless we change the miaViz license (which we are by default unwilling to do). Thus we need to either stick to scater::plotExpression or rewrite the necessary parts. Both are feasible options in principle, but calling scater would be less work.

If scater does not allow NULL ranks as discussed above, how about augmenting rowData internally and creating a new rowData field that equals the assay rows? Then the agglomeration would not change the data, but scater::plotExpression could still work.

The function .find_lowest_taxonomy_level might otherwise be useful somewhere.

@TuomasBorman
Contributor

Yes, but plotAbundance and plotExpression create totally different kinds of plots.

library(miaViz)

data("GlobalPatterns")
tse <- GlobalPatterns

# This is how plotAbundance is "usually" used, --> it is agglomerated to some
# level
# It creates this common plot which is often used for microbiome summary
plotAbundance(tse, "Phylum")

# We have another layout for plotAbundance
plotAbundance(tse, "Phylum", layout = "point")

# If we want to visualize certain ASV level bacteria that belong to certain group for instance
tse_sub <- tse[1:10, ]

# We have to use plotExpression
plotAbundance(tse_sub, features = rownames(tse_sub), rank = NULL)

Currently, we cannot create plotAbundance-type plots at the ASV level. So instead of using plotExpression, we could create plotAbundance-type plots also when rank is NULL.

@antagomir
Member

I agree about separating plotAbundance and plotExpression.

I suggest that @TuomasBorman you check this with @Insaynoah when your respective schedules allow.

@TuomasBorman
Contributor

@Insaynoah Can you modify plotAbundance so that it does not utilize plotExpression when rank = NULL?

@Insaynoah
Contributor Author

One thing I don't really get is that if rank is set to NULL, the plot has nothing to color by, since the colors come from the rank. It will just create gray bars like this:

image

Thus, I don't really see how to make a usable graph this way.

@TuomasBorman
Contributor

Hmmm, true...

Well, if rank is NULL, then:

  1. add rownames to rowData with name "rownames"
  2. rank <- "rownames"

Does that work?

The problem might be that there are lots of rows to plot, but this is still useful since the user might have already subsetted the data.
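
The two steps above could look roughly like this. It is a base R sketch under assumptions: resolve_rank is a hypothetical helper name, and a plain data.frame stands in for rowData(x); it is not the code that was merged.

```r
# Hypothetical helper: fall back to row names when no rank is given.
resolve_rank <- function(row_data, rank = NULL) {
    if (is.null(rank)) {
        # Step 1: add the row names as a column called "rownames".
        row_data[["rownames"]] <- rownames(row_data)
        # Step 2: use that column as the rank.
        rank <- "rownames"
    }
    list(row_data = row_data, rank = rank)
}

# Toy stand-in for rowData (made-up taxa).
row_data <- data.frame(
    Phylum = c("Firmicutes", "Firmicutes", "Bacteroidetes"),
    row.names = c("OTU1", "OTU2", "OTU3")
)

res_null <- resolve_rank(row_data)              # rank falls back to "rownames"
res_phylum <- resolve_rank(row_data, "Phylum")  # rank is kept as given
```

Because every row then has its own unique "rank" value, coloring by rank gives one color per row, which is why subsetting beforehand matters.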

@Insaynoah
Contributor Author

That is actually a very good solution. Now, when rank is set to NULL, the plot colors by each individual row name, creating something like this:

image

However, if you agglomerate the data beforehand, for example by Phylum, it will create something like this:

image

Also, when rank is set to NULL, I added a condition that order_sample_by should also be NULL, because this parameter is dependent on the rank.

Let me know if I should make any changes.

@Insaynoah
Contributor Author

It seems there's an issue with plotAbundance in the Rmd file.
It calls the plotAbundance function with rank equal to NULL, expecting a plotExpression plot to be returned:

image

But now that plotAbundance has been updated to not use plotExpression when rank is NULL, the following line throws an error.

features <- match.arg(features, colnames(colData(x)))

Should the example be updated to not expect a plotExpression plot?

@TuomasBorman
Contributor

I will comment on this in more detail later today, but you can use plotExpression() directly. It is a handy function for visualizing an assay. (There will be support for boxplots in a couple of days.)

Also check the OMA examples on this.

@antagomir
Member

Also, OMA chapter 8 on this will require updating once this is done:
https://microbiome.github.io/OMA/docs/devel/pages/21_microbiome_community.html

@antagomir
Member

Hi @Insaynoah can we aim to close this soon?

@Insaynoah
Contributor Author

I think this one should be done, no? Am I missing something?

@antagomir
Member

Ok to me (with no detailed testing).

@antagomir
Member

Ah, also confirm that you have updated the vignettes/ folder if it contains examples.

If @TuomasBorman approves and merges this, please check that the OMA examples are subsequently updated as well, if this is used somewhere.

Contributor

@TuomasBorman TuomasBorman left a comment


  1. Can you still update the documentation (the description of rank and the examples)? In particular, you should modify the example with rank = NULL. An abundance plot is not feasible with too many features. You should first agglomerate the data and then plot.
  2. Checks are failing.
  3. Can you update .get_abundance_data to use agglomerateByRank and meltSE?

@antagomir
Member

@Insaynoah any chance to fix this one..?

@TuomasBorman TuomasBorman linked an issue Jul 9, 2024 that may be closed by this pull request
@TuomasBorman
Contributor

This PR:

  1. Fixes the issue with rank = NULL in plotAbundance. It is now possible to plot abundances without agglomeration.
  2. plotAbundance and the prevalence plotting functions were using their own implementations even though mia already includes implementations for agglomeration, melting, etc. I modified the code so that mia is used whenever possible.
  3. I deprecated plotFeaturePrevalence and created a new function, plotRowPrevalence (we have agreed to use row/col in function names).
  4. The code lacked comments and explanations. I commented and simplified the code for easier maintenance.

The check failures on Mac and Windows are caused by an old version of mia. For some reason, it is not updated even though GHA should fetch the devel version of mia.

The check failure on Linux is caused by permission issues, since only the devel branch can push to the gh-pages branch. That issue is fixed in the devel branch automatically. There do not seem to be any other issues, so this PR can be merged.

@TuomasBorman TuomasBorman merged commit dde3253 into microbiome:devel Jul 9, 2024
0 of 3 checks passed

Successfully merging this pull request may close these issues.

Default ranks
3 participants