Week 5a Clustering eDNA metabarcoding data

In today's class we'll cover the next two steps in the bioinformatics pipeline:

  1. Adaptor Trimming & Read Merging
  2. Clustering raw eDNA reads into Amplicon Sequence Variants (ASVs)

11:10 Minute Cards

Fill out Minute cards: https://forms.gle/fK2FGG1uUSoaZTSo6

11:10-11:25: Adaptor Trimming & Read Merging

ALEJANDRO INSERT CONTENT + CODE HERE

11:25-11:45 Clustering eDNA reads into Amplicon Sequence Variants

Graphics taken from Ben Callahan's lectures at the STAMPS 2024 course at MBL: https://github.com/mblstamps/stamps2024/wiki#17

eDNA "clusters" go by several names, all of which are essentially molecular proxies for a biological species (see also the slide further down):

  • OTU = Operational Taxonomic Unit
  • ASV = Amplicon Sequence Variant
  • ESV = Exact Sequence Variant
  • zOTU = Zero-radius OTU (a denoised sequence)
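To make the "exact" in ESV/ASV concrete, here is a minimal Python sketch that tallies identical reads (dereplication). The reads are made-up toy sequences, not course data. Note that this only counts exact matches: a real denoiser such as DADA2 additionally uses an error model to decide which rare variants are sequencing errors, and that step is not shown here.

```python
from collections import Counter

# Toy reads (hypothetical): two abundant variants differing by one base,
# plus a singleton that could be a real variant or a sequencing error.
reads = [
    "ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC",
    "ACGTACGTAT", "ACGTACGTAT",
    "ACGAACGTAC",
]

# Dereplication: each distinct sequence is kept separate, with its abundance.
unique_counts = Counter(reads)

for seq, n in unique_counts.most_common():
    print(seq, n)
```

Under a 97% OTU cutoff on full-length amplicons, variants that differ by a single base would usually be merged; an exact-variant approach keeps them apart.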

Screenshot 2025-02-02 at 2 12 53 PM
Screenshot 2025-02-02 at 2 12 19 PM
Screenshot 2025-02-02 at 2 13 45 PM

Previously, eDNA reads were clustered at a fixed percent-identity cutoff (e.g. 97% for Operational Taxonomic Units), using one of three approaches:

  1. Comparing reads against a reference database and discarding reads that did not match the database (closed-reference OTUs)
  2. Comparing reads against a reference database first, then de novo clustering the reads that did not match (open-reference OTUs)
  3. De novo clustering of all sequences by comparing eDNA reads against each other (de novo OTUs)

The nucleotide sequence chosen to represent all reads in an OTU is known as a "representative sequence".
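The three OTU strategies can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: all reads have equal length, identity is a simple position-by-position match (real tools use sequence alignment), and the function names, the `taxon_A` reference, and the test sequences are all hypothetical. In the de novo function, the most abundant unique sequence seeds each OTU and serves as its representative sequence, which is one common greedy heuristic rather than the only way to do it.

```python
from collections import Counter

def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy measure, equal-length reads only)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def closed_reference(reads, reference, threshold=0.97):
    """Assign each read to its best reference match; set aside non-matches."""
    otus, unmatched = Counter(), []
    for read in reads:
        best_name, best_id = None, 0.0
        for name, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_id:
                best_name, best_id = name, score
        if best_name is not None and best_id >= threshold:
            otus[best_name] += 1
        else:
            unmatched.append(read)
    return otus, unmatched

def de_novo(reads, threshold=0.97):
    """Greedy clustering: abundant sequences seed OTUs and act as representatives."""
    otus = {}  # representative sequence -> total read count
    for seq, n in Counter(reads).most_common():
        for rep in otus:
            if identity(seq, rep) >= threshold:
                otus[rep] += n
                break
        else:
            otus[seq] = n  # no OTU close enough, so start a new one
    return otus

def mutate(seq, positions):
    """Substitute a different base at each given position (toy helper)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

base = "ACGT" * 25                      # 100 nt toy amplicon
near = mutate(base, [10])               # 99% identical to base
far = mutate(base, range(0, 50, 5))     # 90% identical to base
reads = [base] * 5 + [near] * 2 + [far] * 3
reference = {"taxon_A": base}           # hypothetical one-entry database

closed_otus, unmatched = closed_reference(reads, reference)
open_otus = (closed_otus, de_novo(unmatched))   # open-reference = closed, then de novo
denovo_otus = de_novo(reads)
```

With this toy data, closed-reference clustering discards the three `far` reads because nothing in the database resembles them, open-reference clustering recovers them as a new OTU, and de novo clustering produces two OTUs without consulting a database at all.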

Screenshot 2025-02-02 at 2 15 09 PM
Screenshot 2025-02-02 at 2 21 57 PM

McLaren & Callahan, mBio 2018 - https://journals.asm.org/doi/10.1128/mbio.02149-17


Screenshot 2025-02-02 at 2 23 10 PM

Callahan, McMurdie & Holmes 2017: https://www.nature.com/articles/ismej2017119


Screenshot 2025-02-02 at 2 24 15 PM
Screenshot 2025-02-02 at 2 25 04 PM
Screenshot 2025-02-02 at 2 27 37 PM
Screenshot 2025-02-02 at 2 27 56 PM
Screenshot 2025-02-02 at 2 28 14 PM
Screenshot 2025-02-02 at 2 29 17 PM

Ben Callahan's recommendations: Screenshot 2025-02-02 at 2 30 05 PM


11:45-11:50 Class Reflection

What issues or biological uncertainties arise when using eDNA reads and ASVs instead of "traditional approaches" (microscopy, cell counts, net collections, etc.)? Take 5 minutes to jot down some thoughts here: https://docs.google.com/document/d/1vUWVESKF8idcMElPR1QVl3EByDOU-6UmsD55VNdKN_s/edit?usp=sharing

11:50-12:25 Clustering eDNA data using DADA2

ALEJANDRO INSERT CONTENT + CODE HERE