
Explaining discrepancies in microbiome composition caused by changes in sequence files and denoising methods. #2075

Open
emankhalaf opened this issue Jan 13, 2025 · 0 comments


emankhalaf commented Jan 13, 2025

Hi @benjjneb

I am working with 16S sequences generated on the PacBio Sequel II platform. The dataset consists of approximately 700 sequence files representing 16 genotypes, with each genotype sampled from three different tissues. Given the large number of files, the pseudo-pooling option appeared to be the most practical way to process the full dataset.

Separately, I processed the sequence files for each genotype on its own using the full pooling option. For these runs I also included sequence files from one additional tissue, so each genotype comprised data from four tissues. My primary goal is to identify taxa with exact sequence matches across three specific tissues within each genotype.

When I compared the pseudo-pooling results (all ~700 files processed together) with the pooling results (60–80 files per genotype, including the additional tissue), the number of taxa with exact sequence matches across the three targeted tissues was lower with the pooling method. However, the overall taxa count for each tissue within each genotype remained relatively stable.

I also observed that some taxa were absent from certain tissues with the pseudo-pooling option but were retained with the pooling method. This discrepancy appears to depend on the number of sequence files and on which tissues were included in each processing run.
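To make the two comparisons above concrete, here is a minimal Python sketch of how they could be computed. It is only an illustration under stated assumptions, not the actual pipeline: the tissue labels, the short placeholder sequences, and the helper names `shared_asvs` and `lost_between` are all hypothetical, and real input would be the per-sample sequence tables produced by each denoising run.

```python
from typing import Dict, Iterable, Set

# Hypothetical structure: tissue label -> set of exact sequences detected
# in that tissue for one genotype.
AsvByTissue = Dict[str, Set[str]]


def shared_asvs(asvs: AsvByTissue, tissues: Iterable[str]) -> Set[str]:
    """Return sequences that are present in every one of the listed tissues."""
    sets = [asvs[t] for t in tissues]
    return set.intersection(*sets) if sets else set()


def lost_between(kept: AsvByTissue, other: AsvByTissue) -> Dict[str, Set[str]]:
    """Per tissue, sequences found in `kept` but missing from `other`.

    Only tissues present in both runs are compared.
    """
    common = sorted(kept.keys() & other.keys())
    return {t: kept[t] - other[t] for t in common}


# Toy data (made up for illustration only).
pseudo = {
    "tissue_A": {"ACGT", "GGCA"},
    "tissue_B": {"ACGT", "GGCA"},
    "tissue_C": {"ACGT", "GGCA", "TTAG"},
}
pooled = {
    "tissue_A": {"ACGT", "GGCA", "CCAT"},
    "tissue_B": {"ACGT"},
    "tissue_C": {"ACGT", "TTAG"},
    "tissue_D": {"ACGT"},  # the additional tissue, present only in this run
}
targets = ["tissue_A", "tissue_B", "tissue_C"]

n_shared_pseudo = len(shared_asvs(pseudo, targets))  # 2
n_shared_pooled = len(shared_asvs(pooled, targets))  # 1
only_in_pooled = lost_between(pooled, pseudo)

print(n_shared_pseudo, n_shared_pooled)
print(only_in_pooled)
```

In this toy example the pooled run has fewer sequences shared across the three target tissues, while `only_in_pooled` lists, per tissue, the sequences that the pooled run retained and the pseudo-pooled run did not.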

My questions are: Is this discrepancy between the two methods expected? What could explain the large difference in the number of taxa with 100% sequence matches across tissues between the two approaches? And why do some taxa disappear when the denoising method is changed?

Your support is much appreciated!

Eman
