Bowtie2 (bad greedy) and read multimapping for metagenomes #9

Open
TealFurnholm opened this issue Nov 12, 2020 · 1 comment

@TealFurnholm

DeepMAsED is designed for metagenomic NGS data sets, but Bowtie2 is not (the author says so in his manual):

  • BT2 is a greedy matcher: matches with very low %ID are still reported. It was designed for aligning reads to a single eukaryotic genome, with splicing and SNPs, and is optimized to find the first best hit.
  • The BT2 manual is incomprehensible when you try to tune its settings to something resembling a %ID cutoff.
  • 75% of all bacterial genes are orthologs (I curated all 529 million genes in NCBI+JGI, so I know), and metagenomes are replete with multiple strains of the same species, so you have to multimap the reads.
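One way to get something like a %ID cutoff out of any mapper's SAM output is to filter after the fact. The sketch below is only an illustration: it assumes each record carries the standard `NM:i:` edit-distance tag (Bowtie2 writes it) and defines identity as matching columns over alignment columns, which is one of several possible definitions. The function names are hypothetical, not part of Bowtie2 or DeepMAsED.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, nm: int) -> float:
    """Approximate %ID as (alignment columns - edit distance) / alignment columns.

    Alignment columns are M/=/X (aligned bases), I (insertions) and
    D (deletions). Clipping, padding and skipped regions are ignored.
    """
    aligned_cols = sum(
        int(length) for length, op in CIGAR_OP.findall(cigar) if op in "M=XID"
    )
    if aligned_cols == 0:
        return 0.0
    return 100.0 * (aligned_cols - nm) / aligned_cols


def passes_identity(sam_line: str, min_pct: float = 95.0) -> bool:
    """Return True if a SAM alignment record meets the %ID threshold."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # read is unmapped
        return False
    nm = None
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    if nm is None:  # no edit distance available, so identity is unknown
        return False
    return percent_identity(fields[5], nm) >= min_pct
```

With this definition, a 100 bp read aligned as `100M` with `NM:i:5` scores 95% and passes a 95% cutoff, while the same read with `NM:i:6` does not.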

Instead of Bowtie2, I ran BBMap at 95% ID, both with and without multimapping, using MEC
(since I still haven't gotten DeepMAsED to work; see the other reported issue):
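For anyone who wants to repeat the comparison, the two BBMap runs might be set up roughly as follows. This is a sketch under assumptions: the exact flags used for the runs above are not given in this issue, so `minid=0.95` as the identity setting and `ambiguous=random` / `ambiguous=all` as the two multimapping modes are guesses at an equivalent configuration, and the file names are placeholders. The helper `bbmap_command` is hypothetical and only builds the argument list.

```python
def bbmap_command(ref, reads, out, multimap, min_identity=0.95):
    """Build a bbmap.sh argument list for one mapping run.

    multimap=False -> ambiguous=random (one best hit chosen at random)
    multimap=True  -> ambiguous=all    (all best hits are reported)
    The flag choice is an assumption, not the settings used in this issue.
    """
    ambiguous = "all" if multimap else "random"
    return [
        "bbmap.sh",
        f"ref={ref}",              # contigs to map against
        f"in={reads}",             # input reads
        f"out={out}",              # SAM output
        f"minid={min_identity}",   # approximate minimum alignment identity
        f"ambiguous={ambiguous}",  # how reads with several best hits are handled
        "nodisk",                  # build the index in memory only
    ]


if __name__ == "__main__":
    # Placeholder file names; print both commands instead of running them.
    for multimap in (False, True):
        suffix = "multi" if multimap else "single"
        cmd = bbmap_command("contigs.fa", "reads.fq", f"mapped_{suffix}.sam", multimap)
        print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once BBMap is on the `PATH`.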

  • no multimapping (each read randomly assigned to one of its best hits): #split_num 741
  • with read multimapping: #split_num 5322

You can see there is quite a difference, and I think you'll find the same with DeepMAsED.
Orthology/multimapping is a major issue. You may find quite a bit more than 1% chimeras!
Please trust me and check it out.

I plan to check the results with MetaQUAST to see which is correct, once I get DeepMAsED working.

The REAL question is: what will your software do if I feed it a BAM file with multimapped reads?
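Before testing that, it may help to check how many reads in a given file are actually multimapped. The sketch below assumes the BAM has been converted to SAM text (for example with `samtools view -h`) and counts a read as multimapped when it has more than one non-supplementary alignment record. The function name `multimap_summary` is hypothetical, and other definitions (such as the `NH` tag) are possible.

```python
from collections import Counter


def multimap_summary(sam_lines):
    """Return (mapped_reads, multimapped_reads) for an iterable of SAM lines.

    A read (or mate, for paired data) is multimapped here if it has more
    than one alignment record, not counting supplementary records.
    """
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:  # unmapped
            continue
        if flag & 0x800:  # supplementary piece of a split alignment
            continue
        mate = 2 if flag & 0x80 else 1  # keep the two mates of a pair apart
        hits[(fields[0], mate)] += 1
    mapped = len(hits)
    multimapped = sum(1 for n in hits.values() if n > 1)
    return mapped, multimapped
```

Usage could look like `multimap_summary(open("mapped.sam"))`, where `mapped.sam` is a placeholder file name.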

Best,
Teal

@nick-youngblut
Collaborator

We have not assessed the influence of the read mapper on the accuracy of DeepMAsED, either for training or for using an existing model trained with Bowtie2 mappings of reads to contigs. It would be interesting to see how the read mapper affects misassembly identification. The challenge is deciding what ground truth to use for "true" mappings, and how mapping accuracy affects the identification of "true" misassemblies.
