In today's class we'll cover the next two steps in the bioinformatics pipeline:
- Adaptor Trimming & Read Merging
- Clustering raw eDNA reads into Amplicon Sequence Variants (ASVs)
Fill out Minute cards: https://forms.gle/fK2FGG1uUSoaZTSo6
ALEJANDRO INSERT CONTENT + CODE HERE
Graphics taken from Ben Callahan's lectures at the STAMPS 2024 course at MBL: https://github.com/mblstamps/stamps2024/wiki#17
There are several names for eDNA "clusters", all of which are essentially a molecular proxy for a biological species (see also the slide further down):
- OTU = Operational Taxonomic Unit
- ASV = Amplicon Sequence Variant
- ESV = Exact Sequence Variant
- zOTU = Zero-radius OTU (a denoised sequence)
Previously, we clustered eDNA reads into Operational Taxonomic Units (OTUs) under some % identity cutoff (e.g. 97% OTUs), in one of three ways:
- comparing reads against a reference database and discarding reads that didn't match it (closed-reference OTUs)
- comparing reads against a reference database first, then de novo clustering the reads that didn't match it (open-reference OTUs)
- clustering all sequences by comparing eDNA reads against each other, with no database (de novo OTUs)
The nucleotide sequence chosen to represent all reads in an OTU is known as a "representative sequence"
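The three OTU strategies above can be sketched in a few lines of Python. This is a toy illustration only, not how any particular tool works: the function names (`identity`, `assign`, `closed_reference`, `de_novo`, `open_reference`) are hypothetical, and identity is computed position by position on equal-length sequences, which is an assumption made to keep the example short (real pipelines compute identity from pairwise alignments). The de novo step here is a simple greedy scheme in which the most abundant unique sequence seeds each OTU and becomes its representative sequence.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions (toy stand-in for alignment-based identity)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign(seq, representatives, threshold):
    """Return the first representative within the identity threshold, or None."""
    for rep in representatives:
        if identity(seq, rep) >= threshold:
            return rep
    return None


def closed_reference(reads, reference, threshold=0.97):
    """Count reads per matching reference sequence; also return the discarded reads."""
    otus, unmatched = Counter(), []
    for seq in reads:
        rep = assign(seq, reference, threshold)
        if rep is None:
            unmatched.append(seq)  # no database match: dropped in closed-reference
        else:
            otus[rep] += 1
    return otus, unmatched


def de_novo(reads, threshold=0.97):
    """Greedy clustering of reads against each other, most abundant sequences first."""
    otus = Counter()  # representative sequence -> number of reads in the OTU
    for seq, n in Counter(reads).most_common():
        rep = assign(seq, otus, threshold)
        # A sequence that matches no existing OTU becomes a new representative.
        otus[seq if rep is None else rep] += n
    return otus


def open_reference(reads, reference, threshold=0.97):
    """Closed-reference first, then de novo clustering of the non-matching reads."""
    otus, unmatched = closed_reference(reads, reference, threshold)
    otus.update(de_novo(unmatched, threshold))  # Counter.update adds counts
    return otus
```

In this sketch, two reads that differ by a single nucleotide over 100 bp (99% identity) end up in the same 97% OTU, so the difference between them is no longer visible in the output table.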
McLaren & Callahan, mBio 2018: https://journals.asm.org/doi/10.1128/mbio.02149-17
Callahan, McMurdie & Holmes, ISME J 2017: https://www.nature.com/articles/ismej2017119
Ben Callahan's recommendations:
What issues or biological uncertainties arise when using eDNA reads and ASVs instead of "traditional approaches" (microscopy, cell counts, net collections, etc.)? Take 5 minutes to jot down some thoughts here: https://docs.google.com/document/d/1vUWVESKF8idcMElPR1QVl3EByDOU-6UmsD55VNdKN_s/edit?usp=sharing
ALEJANDRO INSERT CONTENT + CODE HERE