-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Getting incorrectly formated results with zero_low_counts #356
Comments
Hello, I ran your code with your example data and did not get that error. Does it only happen with the full dataset?
|
Hello Zachary,
So it turns out that in the I attached the wrong version of the meta-data file.
The correct file is now attached
This file is a dataframe with two variables and 30 observations
The first variable is “sample_id” and the second is “group”
The original data file had 10 variables and did not contain a “sample_id” column
With the correct metadata file I get the error regardless of whether I use the small sample otu dataset or the full otu dataset.
From: Zachary Foster ***@***.***>
Sent: Wednesday, October 25, 2023 2:59 PM
To: grunwaldlab/metacoder ***@***.***>
Cc: Hadsell, Darryl L. ***@***.***>; Author ***@***.***>
Subject: Re: [grunwaldlab/metacoder] Getting incorrectly formated results with zero_low_counts (Issue #356)
***CAUTION:*** This email is not from a BCM Source. Only click links or open attachments you know are safe.
…________________________________
Hello,
I ran your code with your example data and did not get that error. Does it only happen with the full dataset?
library(metacoder)
water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE)
water_samples <- read.delim("Caddo_metadata_DH_102423.txt", header = T, sep = '\t')
obj <- parse_tax_data(water_otu,
class_cols = "lineage", # the column that contains taxonomic information
class_sep = ";", # The character used to separate taxa in the classification
class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
class_key = c(tax_rank = "info", # A key describing each regex capture group
tax_name = "taxon_name"))
obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)
test <- obj$data$tax_data[ , water_samples$sample_id] == 0
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0
—
Reply to this email directly, view it on GitHub<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_grunwaldlab_metacoder_issues_356-23issuecomment-2D1779963587&d=DwMFaQ&c=ZQs-KZ8oxEw0p81sqgiaRA&r=XTCyyeX5o3ZkNVzfALlNF9OFiwkE3rMWvWV85Km9iXs&m=ncNEDOJRZpCTjz6UowYa5AxcxtUOY7O1qUwc753ep1-jHHD5aRp35TCqm0JidMI8&s=N9Z3-ULOFlzAdXMufkV4vmq6etXMdflVsHB1-lbxqD8&e=>, or unsubscribe<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AK2FDPQ37D52YZVXXTRHFDTYBFVR3AVCNFSM6AAAAAA6OKQODSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZZHE3DGNJYG4&d=DwMFaQ&c=ZQs-KZ8oxEw0p81sqgiaRA&r=XTCyyeX5o3ZkNVzfALlNF9OFiwkE3rMWvWV85Km9iXs&m=ncNEDOJRZpCTjz6UowYa5AxcxtUOY7O1qUwc753ep1-jHHD5aRp35TCqm0JidMI8&s=c0Qs2dCZzvFkHPqSrB8nu432s2xYzbVgrb3UkXN6RhY&e=>.
You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>>
sample_id group
1 Deep_Open
10 Deep_Matted
11 Shallow_M
12 Deep_Matted
13 Shallow_M
14 Deep_Open
15 Deep_Mattted
16 Deep_Mattted
17 Shallow_M
18 Shallow_Open
19 Shallow_Matted
2 Deep_Matted
20 Deep_Matted
21 Deep_Open
22 Deep_Matted
23 Deep_Open
24 Deep_Open
25 Deep_Matted
26 Deep_Open
27 Shallow_Matted
28 Deep_Open
29 Deep_Matted
3 Deep_Open
30 Deep_Open
4 Shallow_Matted
5 Deep_Matted
6 Shallow_Open
7 Shallow_Open
8 Deep_Open
9 Shallow_Light
|
I noticed that the dataframes in your example are tibbles. I also just tried converting my two dataframes to tibbles but that does not seem to be the answer. here is the correct metadata file |
Thanks for the new file! I can reproduce the error now. The issue is that your sample IDs are being read in as integers instead of strings since they are all numeric. This is one reason ID that start with a non-number character are generally better. At this line:
Here is the updated script that circumvents this error: library(metacoder)
#> This is metacoder version 0.3.6 (stable)
water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE)
#> Warning in file(file, "rt"): cannot open file
#> 'sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples <- read.delim("Caddo_meta_102423.txt", header = T, sep = '\t')
#> Warning in file(file, "rt"): cannot open file 'Caddo_meta_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples$sample_id <- as.character(water_samples$sample_id)
#> Error in eval(expr, envir, enclos): object 'water_samples' not found
obj <- parse_tax_data(water_otu,
class_cols = "lineage", # the column that contains taxonomic information
class_sep = ";", # The character used to separate taxa in the classification
class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
class_key = c(tax_rank = "info", # A key describing each regex capture group
tax_name = "taxon_name"))
#> Error in eval(expr, envir, enclos): object 'water_otu' not found
obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)
#> Error in eval(expr, envir, enclos): object 'obj' not found
test <- obj$data$tax_data[ , water_samples$sample_id] == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads
#> Error in eval(expr, envir, enclos): object 'no_reads' not found Created on 2023-11-02 with reprex v2.0.2 |
Hello,
I am trying to run Metacoder on some 16s data from water samples that was obtained through RTL genomics. I import 2 text files. The first contains the otu with 30 samples x 6,672 otu. the second is the meta data file with sample ids in the first column and trt groups in the second. I have attached samples of both files along with the script that I used. It seems that when I first use the "parse_tax_data" comand the output appears correctly formated. But when I attempt to run the object through the "zero_low_counts" command the resulting tax_data loses the otu_id and lineage columns and the remaining sample columns are converted from to . It turns out that the obj$data$tax_data[ , water_samples$sample_id] == 0 on the full data set it generates a large matrix whereas in your example analysis with human data the object is a "logi" object. If I take only the first 100 lines of the otu data then a logi object is generated but the below error still happens
when I try to run the rowSums comand on obj$data$tax_data[ , water_samples$sample_id]) == 0 i get the following error "Error in rowSums(obj$data$tax_data[, water_samples$sample_id]) :
'x' must be numeric". I have tried numerous ways to correct this issue but to no avail. Can you tell me what the problem might be?
any help please?
Hadsell_script_for_running_Metacoder_w_RTL_genomics_16s.txt
Caddo_metadata_DH_102423.txt
sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt
The text was updated successfully, but these errors were encountered: