Week3b-Interpreting-Data.md

Week 3b Bioinformatics Decision Points

No amount of high-end bioinformatics can compensate for poorly prepared samples. Careful attention to sample preparation and library generation is therefore imperative, especially in workflows that involve multiple PCR steps.

11:10 - 11:30am Bioinformatics Workflows in -Omics studies

Graphics taken from Ben Callahan's lectures at the STAMPS 2024 course at MBL: https://github.com/mblstamps/stamps2024/wiki#17

[Images: 14 screenshots of lecture slides, captured 2025-01-22]

11:30 - 11:40am Decision points in typical -Omics workflows:

  • Initial data QA/QC - interpretation of Phred scores, choosing a quality cutoff, merging criteria (for paired-end Illumina reads)
  • Clustering or assembly of raw reads - what algorithm/pipeline to use? Custom or default software parameters?
  • QA/QC of clusters/contigs - Discard singletons? Discard clusters/contigs not meeting certain criteria (e.g. exclude all Amplicon Sequence Variants with <50 reads)? Assess contamination and/or filter out potential contaminant sequences?
  • Assigning taxonomy or gene function - What database to use? How to match sequences with taxonomy/functional information (sequence-based, ontology/hierarchy-based, k-mer or other method, etc.)?
  • How to filter, categorize, and interpret taxonomy or functional information - Data mining for a specific species/gene/pathway? Broad exploration of data? Grouping into biologically meaningful categories? Employ statistical analyses, e.g. differential abundance across categories/genes/taxa?
  • How to visualize patterns in your data - SO MANY DECISIONS, this is very hard. Use existing software packages with standard visualizations? More freeform exploration of data using base R or your own scripts?
  • How to create a scientific narrative through your choices + figures - this is the culmination of all the above decisions. Are you looking for a specific story (and taking a narrow path of analysis), or do you not know in advance what you should be looking for in your data (and so are analyzing/visualizing patterns as broadly as possible)? In reality, probably a combination of both - looking for one story, but keeping an eye out for other patterns.
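
Two of the decision points above (interpreting Phred scores and applying a minimum read-count cutoff) can be made concrete with a small sketch. This is an illustration, not part of any particular pipeline: the function names are hypothetical, the quality string is assumed to use Phred+33 encoding (standard for current Illumina FASTQ files), and the 50-read threshold is just the example cutoff mentioned in the list.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q to a base-call error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


def filter_low_abundance(counts: dict[str, int], min_reads: int = 50) -> dict[str, int]:
    """Keep only ASVs with at least min_reads total reads (i.e. drop those with < min_reads)."""
    return {asv: n for asv, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Hypothetical quality string and ASV read counts, for illustration only.
    scores = decode_phred33("II5+")
    for q in scores:
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")

    asv_counts = {"ASV1": 1200, "ASV2": 49, "ASV3": 50}
    print(filter_low_abundance(asv_counts))
```

A Q20 base has a 1 in 100 chance of being wrong and a Q30 base 1 in 1,000, which is why the choice of quality cutoff changes how many reads survive QC. Likewise, moving `min_reads` up or down changes which rare variants stay in the dataset, so it is worth recording whichever value you choose.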

11:45 - 11:50am Group Brainstorming

Take 5 minutes and silently brainstorm your most pressing questions on "Bioinformatics Decision Points" - what are you struggling with in your own analyses, or what is one area where you need to learn more to succeed in your own research?

GDoc Link: https://docs.google.com/document/d/1Y1qFFzBRD6J7SEyZBxLT9_X9HR1bsxNmzCEiHLzBxLU/edit?usp=sharing

11:50 - 12:00pm Compiling your study metadata!

Get into pairs, and take 10 minutes to brainstorm a giant list of study metadata that will be relevant to your analysis - this should include things you already have in hand, and things you may need to get from other people (or results you are still waiting on from a lab analysis). This can be anything related to the environment, sample site, time / date / location of sample collection, contextual information, etc.

GDoc Link: https://docs.google.com/document/d/1Y1qFFzBRD6J7SEyZBxLT9_X9HR1bsxNmzCEiHLzBxLU/edit?usp=sharing

12:00 - 12:10pm Group reports & discussion

12:10 - 12:25pm Introducing our class metadata file

Take a peek into "metabarcoding-dataset" mapping and contextual info on GitHub

NOAA Omics data management guide: https://noaa-omics-dmg.readthedocs.io/en/latest/metadata-guidelines.html

Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets. Rideout JR, Chase JH, Bolyen E, Ackermann G, González A, Knight R, Caporaso JG. GigaScience. 2016;5:27. http://dx.doi.org/10.1186/s13742-016-0133-6

Keemei Website: https://keemei.qiime2.org/

Keemei Demo metadata file: https://docs.google.com/spreadsheets/d/1_gE_jQcoYGld9aW_dTyE86zdmg1CkNIPHvVJ6CkYvKY/edit?gid=1402180572#gid=1402180572
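
To give a feel for what validating a metadata table means in practice, here is a minimal sketch of two common checks: unique, non-empty sample IDs and no blank cells. It is not Keemei's implementation and does not reproduce its rules. The function name, the `sample-id` column name, and the example columns are assumptions made for illustration, and the input is assumed to be tab-separated text with no blank lines.

```python
import csv
import io


def check_metadata(tsv_text: str, id_column: str = "sample-id") -> list[str]:
    """Return a list of human-readable problems found in a tab-separated metadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        return [f"missing required column: {id_column}"]

    problems = []
    seen_ids = set()
    # Row 1 is the header, so data rows start at 2 (assumes no blank lines).
    for row_number, row in enumerate(reader, start=2):
        sample_id = (row[id_column] or "").strip()
        if not sample_id:
            problems.append(f"row {row_number}: empty sample ID")
        elif sample_id in seen_ids:
            problems.append(f"row {row_number}: duplicate sample ID '{sample_id}'")
        else:
            seen_ids.add(sample_id)

        for column, value in row.items():
            if column is None:
                # DictReader stores cells beyond the header width under the key None.
                problems.append(f"row {row_number}: more cells than header columns")
            elif column != id_column and (value is None or not value.strip()):
                problems.append(f"row {row_number}: empty value in column '{column}'")
    return problems


if __name__ == "__main__":
    # Hypothetical table: the second data row repeats an ID and leaves 'site' blank.
    table = "sample-id\tsite\nS1\tA\nS1\t\n"
    for problem in check_metadata(table):
        print(problem)
```

Running a check like this before an analysis catches the mistakes that are easiest to make when several people contribute to one spreadsheet, such as a sample ID pasted twice or a value still pending from the lab.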