-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Illumina: Set Merge Options #53
Comments
What are your read lengths? So the default method is to merge reads with usearch, if a pair does not merge then the forward read is rescued. I have typically been able to merge ~90-95% of reads - we have used 2 x 300 using a double indexed strategy and default illumina sequencing primers. Per the code, the merging command for usearch/vsearch is here, you'll note that I have some somewhat customized calls to usearch, this is actually to increase the number of reads that are merged. As of right now there isn't the ability to customize the merging command in AMPtk. I guess my thought would be to try on a single sample several different parameters to usearch/vsearch and see if you can increase the merging. My guess might be that you have low quality reverse reads and they are getting clipped off and therefore not merged. If the quality score is below 5 it will get truncated before the alignment step. So you can also check your quality with some like fastqc or there are many other ways, but basically look at the quality scores near the 3' end of the reverse reads -- this is typically where some MiSeq runs fail (especially with 2 x 300). If you have low quality reverse reads, you might want to just use the forward reads, or allow the merging as above, but set your trim length to less than your read length, i.e. if you have 250 bp reads, then I'd set it to something like 230-240 so you keep as much data as you can. |
Thanks for the quick reply |
Hello
I'm testing amptk on some ITS data.
The "amptk illumina" log shows some values like:
#########################################
Merging reads 100%
107030 Pairs
8000 Merged (7.5%)
99030 Not merged (92.5%)
Pairs that failed merging due to various reasons:
1008 too few kmers found on same diagonal
389 multiple potential alignments
27692 too many differences
69941 alignment score too low, or score drop to high
############################################
Along all the reads, I got around 60-95 % Not merged.
1st question is:
Is this right? Because after concatenating, the number of valid output reads looks like if most of the pairs were merged (i.e. there is ~25 milion merged reads from 50 milion raw paired-end reads); is this the "--rescue_forward" step?
2nd question is:
Maybe is there any control option for the merging (overlap/diff) that I could make? The vsearch manual state that default parameters are 10 min bases overlap/10 max bases diff (If I understood it right).
Thanks!
The text was updated successfully, but these errors were encountered: