\begin{abstract}
\thispagestyle{empty}
\begin{center}{\textbf{A NOVEL STATISTICAL FRAMEWORK FOR ASSESSMENT OF INTRASPECIFIC HAPLOTYPE SAMPLING COMPLETENESS}}\end{center}
\vspace{0.3in}
\ssp
\noindent Jarrett Daniel Phillips \hfill Co-Advisors:\\ \noindent University of Guelph, 2021 \hfill Dr. Daniel Gillis and Dr. Robert Hanner
\dsp
Determining adequate sample sizes for studies of biodiversity conservation and management is a challenging problem that has received some attention in recent years. One area where assessing sampling completeness is of utmost priority is DNA barcoding. Species show remarkable genomic marker variation within and among taxa, along with differing evolutionary and life histories. Thus, knowing how many specimens of a given animal species likely need to be collected to observe the majority of its standing COI haplotype diversity is a complex question to answer. Estimates of specimen sample sizes for DNA barcoding range from a single individual to hundreds of individuals per species (but are typically around 5--10 individuals). However, due to obstacles surrounding project funding and species rarity, often just one or two specimens per species can reasonably be collected. In addition, numerous other factors, especially sequence quality and integrity, hinder the accurate and reliable estimation of specimen sample sizes from existing species-level sequence data found in large DNA repositories.
Here, a deep examination of the genetic specimen sample size problem (GSSSP) is undertaken. Specifically, a novel nonparametric stochastic local search optimization algorithm based on trends in species haplotype accumulation curves, herein called {\tt HACSim} (\textbf{H}aplotype \textbf{A}ccumulation \textbf{C}urve \textbf{Sim}ulator), is introduced. The method, available as an R package, is tested on a variety of both hypothetical and real animal species mined from the Barcode of Life Data Systems (BOLD). Through a detailed statistical simulation study, the approach is demonstrated to work well across all examined scenarios. As {\tt HACSim} makes numerous simplifying assumptions that are unlikely to hold well in practice, such as panmixia (random mating), future work incorporating elements of population structure is imperative.
In addition, it is argued that DNA barcoding currently lacks the statistical rigor needed to robustly estimate the DNA barcode gap, an important quantity expressing the difference between intraspecific and interspecific genetic variation. A number of accessible statistical solutions revolving around the sample sizes needed for gap assessment, as well as visualization and inference, are offered in this regard.
\end{abstract}
\normalsize