Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illumina: Set Merge Options #53

Open
Aciole-David opened this issue May 17, 2019 · 2 comments
Open

Illumina: Set Merge Options #53

Aciole-David opened this issue May 17, 2019 · 2 comments

Comments

@Aciole-David
Copy link

Hello
I'm testing amptk on some ITS data.
The "amptk illumina" log shows some values like:
#########################################
Merging reads 100%
107030 Pairs
8000 Merged (7.5%)
99030 Not merged (92.5%)

Pairs that failed merging due to various reasons:
1008 too few kmers found on same diagonal
389 multiple potential alignments
27692 too many differences
69941 alignment score too low, or score drop to high
############################################

Along all the reads, I got around 60-95 % Not merged.
1st question is:
Is this right? Because after concatenating, the number of valid output reads looks like if most of the pairs were merged (i.e. there is ~25 milion merged reads from 50 milion raw paired-end reads); is this the "--rescue_forward" step?

2nd question is:
Maybe is there any control option for the merging (overlap/diff) that I could make? The vsearch manual state that default parameters are 10 min bases overlap/10 max bases diff (If I understood it right).

Thanks!

@nextgenusfs
Copy link
Owner

What are your read lengths?

So the default method is to merge reads with usearch, if a pair does not merge then the forward read is rescued. I have typically been able to merge ~90-95% of reads - we have used 2 x 300 using a double indexed strategy and default illumina sequencing primers.

Per the code, the merging command for usearch/vsearch is here, you'll note that I have some somewhat customized calls to usearch, this is actually to increase the number of reads that are merged.
https://github.com/nextgenusfs/amptk/blob/master/amptk/amptklib.py#L1180-L1183

As of right now there isn't the ability to customize the merging command in AMPtk. I guess my thought would be to try on a single sample several different parameters to usearch/vsearch and see if you can increase the merging. My guess might be that you have low quality reverse reads and they are getting clipped off and therefore not merged. If the quality score is below 5 it will get truncated before the alignment step. So you can also check your quality with some like fastqc or there are many other ways, but basically look at the quality scores near the 3' end of the reverse reads -- this is typically where some MiSeq runs fail (especially with 2 x 300). If you have low quality reverse reads, you might want to just use the forward reads, or allow the merging as above, but set your trim length to less than your read length, i.e. if you have 250 bp reads, then I'd set it to something like 230-240 so you keep as much data as you can.

@Aciole-David
Copy link
Author

Thanks for the quick reply
Our reads are also 2 x 300/double index/MiSeq
We do have low Q around the R2 3'. Maybe adapters read-through too.
I hope using forwards only do the trick
I'll test your suggestion on trim and show the counts
Thanks again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants