In today's class we'll cover the next two steps in the bioinformatics pipeline.

ALEJANDRO INSERT CONTENT + CODE HERE

Graphics taken from Ben Callahan's lectures at the STAMPS 2024 course at MBL: https://github.com/mblstamps/stamps2024/wiki#17

eDNA "clusters" go by several different names, all of which are essentially molecular proxies for a biological species (see also the slide further down).

Previously, eDNA reads were clustered at a fixed percent-identity cutoff (e.g. 97% for Operational Taxonomic Units, or OTUs), using one of three approaches:

comparing reads against a reference database and discarding any reads that did not match it (closed-reference OTUs)
comparing reads against a reference database first, then clustering the non-matching reads de novo (open-reference OTUs)
clustering all reads de novo by comparing the eDNA reads against each other (de novo OTUs)
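To make the difference between the three approaches concrete, here is a minimal Python sketch. It is a toy illustration, not how production tools work: the `identity` function assumes equal-length, pre-aligned reads (real tools compute alignments), the de novo step uses a simple greedy, abundance-sorted strategy chosen for this example, and the reference names `sp_A` and `sp_B` and all sequences are made up.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions. Assumes equal-length, pre-aligned reads."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, reference, cutoff=0.97):
    """Count reads per best-matching reference; also return the unmatched reads."""
    otus, unmatched = Counter(), []
    for read in reads:
        best_id, best_name = max(
            (identity(read, seq), name) for name, seq in reference.items()
        )
        if best_id >= cutoff:
            otus[best_name] += 1
        else:
            unmatched.append(read)  # a closed-reference workflow discards these
    return otus, unmatched


def de_novo(reads, cutoff=0.97):
    """Greedy clustering: the most abundant unique reads seed new OTUs."""
    centroids, otus = [], Counter()
    for seq, count in Counter(reads).most_common():
        for centroid in centroids:
            if identity(seq, centroid) >= cutoff:
                otus[centroid] += count
                break
        else:
            # No existing centroid is similar enough, so start a new OTU
            centroids.append(seq)
            otus[seq] += count
    return otus


def open_reference(reads, reference, cutoff=0.97):
    """Closed-reference first, then de novo clustering of the leftovers."""
    otus, unmatched = closed_reference(reads, reference, cutoff)
    otus.update(de_novo(unmatched, cutoff))
    return otus


# Hypothetical toy data: two reference species and one group absent from the database
ref_a = "ACGT" * 25
ref_b = "TTGCA" * 20
near_a = "T" + ref_a[1:]      # 1 mismatch in 100 nt = 99% identity to ref_a
novel = "G" * 100             # matches nothing in the reference
novel_var = "G" * 99 + "A"    # 99% identical to `novel`

reference = {"sp_A": ref_a, "sp_B": ref_b}
reads = [ref_a, near_a, near_a, ref_b, novel, novel, novel_var]

closed_otus, discarded = closed_reference(reads, reference)
open_otus = open_reference(reads, reference)
denovo_otus = de_novo(reads)
```

In this toy data set, the closed-reference approach loses the three reads from the group missing from the database, while the open-reference and de novo approaches both keep them as an extra OTU.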

The nucleotide sequence chosen to represent all reads in an OTU is known as its "representative sequence".
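As a rough illustration, one possible convention is to pick the most abundant unique read in the OTU as its representative. This is an assumption made for the sketch, not something specified in the lecture; other conventions (for example, the cluster centroid or the longest read) are also used. The function name and the reads below are hypothetical.

```python
from collections import Counter


def representative_sequence(otu_reads):
    """Return the most abundant unique read in an OTU.

    Ties are broken alphabetically so the result is reproducible.
    """
    counts = Counter(otu_reads)
    return min(counts, key=lambda seq: (-counts[seq], seq))


# Hypothetical reads already assigned to a single OTU
otu_reads = ["ACGTAC", "ACGTAC", "ACGTAT", "ACGAAC"]
rep = representative_sequence(otu_reads)
```

Here `rep` is `"ACGTAC"`, since it is the only read that appears twice. Downstream steps such as taxonomic assignment would then use this one sequence in place of every read in the OTU.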

Ben Callahan's recommendations:

What issues or biological uncertainties arise when using eDNA reads and ASVs instead of "traditional approaches" (microscopy, cell counts, net collections, etc.)? Take 5 minutes to jot down some thoughts here: https://docs.google.com/document/d/1vUWVESKF8idcMElPR1QVl3EByDOU-6UmsD55VNdKN_s/edit?usp=sharing

ALEJANDRO INSERT CONTENT + CODE HERE
