Color nodes according to correlation between relative abundance and external variables #340
Comments
Thanks! Yeah, it should be, but there is no function to do that for you yet. You would have to fit a model for each taxon's abundance vs. that external variable and add the correlation and p-value as columns in a table with per-taxon data in the
Oops, did not mean to close.
Apparently
Thanks for your fast response!
Here is a prototype of the technique that could be used to make a function in the future.

Load libraries

    library(metacoder)
    library(tidyr)
    library(readr)
    library(readxl)
    library(dplyr)
    library(purrr)

I will use the TARA expedition dataset since that is the first that

Parsing taxonomic data

The data set at the URL below was downloaded and uncompressed:

http://taraoceans.sb-roscoff.fr/EukDiv/data/Database_W5_OTU_occurences.tsv.zip

    raw_data <- readr::read_tsv("data/Database_W5_OTU_occurences.tsv")
    obj <- parse_tax_data(raw_data, class_cols = "lineage", class_sep = "\\|", sep_is_regex = TRUE)

Getting sample data

The sample data was downloaded from the URL below:

http://taraoceans.sb-roscoff.fr/EukDiv/data/Database_W1_Sample_parameters.xls

    sample_data <- read_excel("data/Database_W1_Sample_parameters.xls")

Calculate read abundance per taxon

The input data included read abundance for each sample-OTU combination,

    obj$data$otu_prop <- calc_obs_props(obj, data = "tax_data",
                                        cols = sample_data[["PANGAEA ACCESSION NUMBER"]])
    obj$data$tax_abund <- calc_taxon_abund(obj, data = "otu_prop",
                                           cols = sample_data[["PANGAEA ACCESSION NUMBER"]])
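
For anyone unsure what the two calls above compute, here is a rough base R sketch on a made-up count table. It only approximates the idea (per-sample proportions, then sums over all OTUs classified within each taxon) and is not metacoder's implementation. The table `counts`, the lineages in `otu_taxa`, and the sample names are all invented for illustration.

```r
# Toy OTU count table: 3 OTUs (rows) x 2 samples (columns). All values are made up.
counts <- matrix(c(10, 30, 60,
                    5,  5, 90),
                 nrow = 3,
                 dimnames = list(c("otu_1", "otu_2", "otu_3"),
                                 c("sample_a", "sample_b")))

# Hypothetical lineage of each OTU: every taxon the OTU belongs to, root first
otu_taxa <- list(
  otu_1 = c("Bacteria", "Proteobacteria"),
  otu_2 = c("Bacteria", "Proteobacteria"),
  otu_3 = c("Bacteria", "Firmicutes")
)

# Step 1: convert counts to per-sample proportions (each column sums to 1)
otu_prop <- sweep(counts, 2, colSums(counts), "/")

# Step 2: for each taxon, add up the proportions of all OTUs classified within it
taxa <- unique(unlist(otu_taxa))
tax_abund <- t(sapply(taxa, function(taxon) {
  in_taxon <- vapply(otu_taxa, function(lineage) taxon %in% lineage, logical(1))
  colSums(otu_prop[in_taxon, , drop = FALSE])
}))

print(round(tax_abund, 2))
```

Because each taxon's value includes all of its subtaxa, the root taxon ("Bacteria" here) sums to 1 in every sample.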

Looking for correlations between latitude and taxon abundance

I will be using simple linear regression to demonstrate how this might

The first step is to get a table for each taxon in a format that typical

    run_one_test <- function(tax_prop_row) {
      # Sample IDs are also the names of the abundance columns
      sample_ids <- sample_data[["PANGAEA ACCESSION NUMBER"]]
      # Proportions of this taxon in each sample, as a named numeric vector
      props <- unlist(tax_prop_row[1, sample_ids])
      # One row per sample, joined to the sample metadata
      test_data <- tibble(sample_id = names(props), prop = props) %>%
        left_join(sample_data, by = c("sample_id" = "PANGAEA ACCESSION NUMBER"))
      lm_result <- summary(lm(prop ~ `LATITUDE (Decimal Degrees)`, data = test_data))
      # Row 2 of the coefficient table is the latitude term: estimate and p-value
      output <- tibble(
        taxon_id = tax_prop_row$taxon_id,
        coeff = lm_result$coefficients[2, 1],
        pvalue = lm_result$coefficients[2, 4]
      )
      return(output)
    }

And here is how to run that function for each row and format the results

    obj$data$tax_lm <- obj$data$tax_abund %>%
      group_by(taxon_id) %>%
      group_split() %>%
      map_dfr(run_one_test)
    obj
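
To try the per-taxon regression step without downloading the TARA data, here is a minimal base R sketch of the same idea. The latitudes, the proportions, and the helper name `run_toy_test` are all made up for illustration; only `lm()` and `summary()` are used as in the prototype.

```r
# Made-up sample metadata: one latitude per sample
latitude <- c(-40, -20, 0, 10, 30, 50)

# Made-up per-taxon proportions (rows = taxa, columns = samples)
tax_abund <- rbind(
  taxon_a = c(0.10, 0.14, 0.21, 0.22, 0.29, 0.33),  # increases with latitude
  taxon_b = c(0.30, 0.10, 0.25, 0.12, 0.28, 0.15)   # no clear trend
)

# Fit prop ~ latitude for one taxon and return the slope and its p-value
run_toy_test <- function(taxon_id) {
  test_data <- data.frame(prop = tax_abund[taxon_id, ], latitude = latitude)
  fit <- summary(lm(prop ~ latitude, data = test_data))
  data.frame(taxon_id = taxon_id,
             coeff = fit$coefficients["latitude", "Estimate"],
             pvalue = fit$coefficients["latitude", "Pr(>|t|)"])
}

# One row of results per taxon, analogous in shape to obj$data$tax_lm
tax_lm <- do.call(rbind, lapply(rownames(tax_abund), run_toy_test))
print(tax_lm)
```

The result has one row per taxon with `taxon_id`, `coeff`, and `pvalue` columns, which is the shape the plotting step relies on.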

It would also be useful to have the per-taxon mean proportion for

    # One group ID per sample column, so all samples are averaged into a single "mean_prop" column
    obj$data$tax_mean_prop <- calc_group_mean(obj, data = "tax_abund",
                                              cols = sample_data[["PANGAEA ACCESSION NUMBER"]],
                                              groups = rep("mean_prop", nrow(sample_data)))

Now we have the results of the per-taxon regression in a format that can

    obj %>%
      filter_taxa(taxon_names == "Bacteria", subtaxa = TRUE, reassign_obs = FALSE) %>%
      filter_taxa(n_supertaxa < 3, taxon_names != "NA", reassign_obs = FALSE) %>%
      # filter_taxa(mean_prop > 0.00001, reassign_obs = FALSE) %>%
      filter_taxa(pvalue < 0.05, supertaxa = TRUE, reassign_obs = FALSE) %>%
      heat_tree(node_label = taxon_names,
                node_size = mean_prop,
                # Non-significant taxa get 0, the neutral middle of the color range
                node_color = ifelse(pvalue < 0.05, coeff, 0),
                node_color_interval = c(-0.00001, 0.00001),
                node_color_range = c("cyan", "gray", "tan"),
                node_size_range = c(0.01, 0.04),
                node_size_axis_label = "Mean taxon read proportion",
                node_color_axis_label = "Regression coefficient",
                layout = "davidson-harel",
                initial_layout = "reingold-tilford")

It's clearly not the best dataset/variable as far as interesting results go, but it still demonstrates the idea.
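
The original question asked about a correlation rather than a regression slope. As a hedged variation, the per-taxon test could use base R's `cor.test()` in place of `lm()`. The sketch below uses made-up salinity and proportion values for a single taxon, and Spearman's rank correlation is my choice for illustration, not something the prototype above uses. The last line masks non-significant values to 0 in the same way as the `node_color` argument.

```r
# Made-up values for one taxon across 8 samples
salinity <- c(30.1, 31.5, 33.0, 34.2, 35.8, 36.4, 37.0, 38.3)
prop     <- c(0.02, 0.03, 0.05, 0.04, 0.07, 0.08, 0.10, 0.12)

# Rank-based correlation between the taxon's proportion and the external variable
result <- cor.test(prop, salinity, method = "spearman")
corr   <- unname(result$estimate)  # correlation coefficient (rho)
pvalue <- result$p.value

# Keep the correlation only when it is significant; otherwise use the neutral value 0
node_color_value <- ifelse(pvalue < 0.05, corr, 0)
print(c(corr = corr, pvalue = pvalue, node_color_value = node_color_value))
```

With a correlation as the node color, `node_color_interval = c(-1, 1)` would be a natural choice so that the neutral color always sits at zero.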
Hi there,
First of all, thanks for taxa and metacoder. These are extremely useful packages.
I was wondering if it would be possible to color the nodes and edges of a taxonomic tree according to the correlation between the relative abundance of each taxon and an external variable (e.g. any environmental variable such as pH or salinity). Thanks!