The P(e):P(o) package

This is the alpha release of pepo, a minimal package to calculate the probabilities of hemiplasy (trait evolution incongruent with species tree due to incomplete lineage sorting) and homoplasy (incongruent traits due to convergent mutations). Currently, the package includes functions to estimate the ratio of these two probabilities, which we call the Hemiplasy Risk Factor (HRF), for all branches on a phylogeny. Provided a tree, branch lengths and population-wide mutation rate in coalescent units (2N), the HRF will give an intuition for the relative importance of hemiplasy in the evolution of a particular clade.

Installation

This package depends on ape, dplyr, and purrr. The last two are part of the tidyverse.

Install pepo from github with: devtools::install_github("guerreror/pepo").

library(pepo)

Quick example

In this example we start with a preloaded phylogeny of Solanum sect Lycopersicon from Pease et al (2016). The tree is already of class phylo (from the 'ape' package).

data("tomato")
class(tomato)
#> [1] "phylo"

The two functions you'll need from pepo are: prep_branch_lengths() and tree_hrf(). The former returns a tibble (a tidy data frame) with variables that will be needed by the latter.

tomato_branches <- prep_branch_lengths(tomato) 
tomato_branches
#> # A tibble: 36 x 7
#>    code   from    to this_branch descendants ancestor sibling
#>    <chr> <int> <int>       <dbl> <list>         <dbl>   <dbl>
#>  1 20-21    20    21       1.40  <dbl [2]>     NA       4.59 
#>  2 21-22    21    22       1.38  <dbl [2]>      1.40    0.596
#>  3 22-23    22    23       1.56  <dbl [2]>      1.38    0.376
#>  4 23-24    23    24       4.42  <dbl [2]>      1.56    2.76 
#>  5 24-25    24    25       0.525 <dbl [2]>      4.42    0.355
#>  6 25-26    25    26       0.616 <dbl [2]>      0.525   1.63 
#>  7 26-27    26    27       1.07  <dbl [2]>      0.616   1.00 
#>  8 27-1     27     1       0.612 <dbl [0]>      1.07    1.00 
#>  9 27-2     27     2       1.00  <dbl [0]>      1.07    0.612
#> 10 26-3     26     3       1.00  <dbl [0]>      0.616   1.07 
#> # ... with 26 more rows

Then we can call tree_hrf() on that tibble. The function will return the original data frame plus a new variable, hrf. This function assumes branch lengths are in coalescent units (e.g., calculated in MP-EST). The call below will assume the default population-wide mutation rate (0.01).

tomato_hrf <- tree_hrf(tomato_branches)
tomato_hrf
#> # A tibble: 36 x 8
#>    code   from    to this_branch descendants ancestor sibling      hrf
#>    <chr> <int> <int>       <dbl> <list>         <dbl>   <dbl>    <dbl>
#>  1 20-21    20    21       1.40  <dbl [2]>     NA       4.59  NaN     
#>  2 21-22    21    22       1.38  <dbl [2]>      1.40    0.596   0.473 
#>  3 22-23    22    23       1.56  <dbl [2]>      1.38    0.376   0.241 
#>  4 23-24    23    24       4.42  <dbl [2]>      1.56    2.76    0.0341
#>  5 24-25    24    25       0.525 <dbl [2]>      4.42    0.355   0.685 
#>  6 25-26    25    26       0.616 <dbl [2]>      0.525   1.63    0.624 
#>  7 26-27    26    27       1.07  <dbl [2]>      0.616   1.00    0.561 
#>  8 27-1     27     1       0.612 <dbl [0]>      1.07    1.00   NA     
#>  9 27-2     27     2       1.00  <dbl [0]>      1.07    0.612  NA     
#> 10 26-3     26     3       1.00  <dbl [0]>      0.616   1.07   NA     
#> # ... with 26 more rows

Some NA values in the hrf column are normal: the function does not calculate HRF for tips or ancestral branches. This is because the HRF is a property of a branch that has: 1) two descendant lineages, 2) a sister lineage, and 3) an ancestral branch with known length.

That's it. Now we can explore/plot the HRF of all branches in the phylogeny. For example, we can use the ggtree package to plot the tree. The to_treedata() function converts our HRF tibble and phylo tree into a ggtree-compatible object (which allows for easy plotting).

library(ggtree)
solgg <- to_treedata(tomato, tomato_hrf%>%
  mutate(cathrf = cut(hrf, breaks=c(0, 0.2, 0.5, 0.8, 1))))

ggtree(solgg, aes(color=hrf), size=2) + 
  geom_tiplab(color='black') +
  scale_color_gradient2(limits=c(0,1), low='#008080', mid='#f6edbd', high='#ca562c', midpoint=0.5, na.value = 'grey90')+
  theme(legend.position = c(.05, .85))

Hemiplasy Risk in the Great Apes

This is a quick calculation of the HRF on the phylogeny of Human-Chimpanzee-Gorilla.

apetree <- ape::read.tree(text = "((((human,chimpanzee),gorilla),orangutan),out);")
#nodevec <- c("human","chimpanzee", "gorilla", "orangutan", "hcg", "hc")
#two_n_vec <- c(125, 45, 65, 10, 20, 21, 19, 50)*1000*2 #2Ne for each node
two_n_vec <- c(125, 45, 65, 15, 15, 18, 19, 50)*1000*2
pop_std_vec <- two_n_vec/two_n_vec[3] #relative to human 2Ne
time_vec <- c(5, 6.5, 1.5, 4.1, 4.1, 5.5, 12, 15)*1000000 # times in MY
length_vec <- time_vec/(20 * two_n_vec) # coalescent branch lengths: MY/(gen * 2Ne)
apetree$edge.length <- length_vec
plot(apetree); ape::edgelabels(prettyNum(length_vec, digits=2))

ape_hrf <- prep_branch_lengths(apetree)%>% tree_hrf(mutation =(2*10000*10^-8))
ape_hrf
#> # A tibble: 8 x 8
#>   code   from    to this_branch descendants ancestor sibling     hrf
#>   <chr> <int> <int>       <dbl> <list>         <dbl>   <dbl>   <dbl>
#> 1 6-7       6     7       1.00  <dbl [2]>     NA       7.50  NaN    
#> 2 7-8       7     8       3.61  <dbl [2]>      1.00   15.8     0.287
#> 3 8-9       8     9       0.577 <dbl [2]>      3.61    7.64    0.907
#> 4 9-1       9     1       6.83  <dbl [0]>      0.577   6.83   NA    
#> 5 9-2       9     2       6.83  <dbl [0]>      0.577   6.83   NA    
#> 6 8-3       8     3       7.64  <dbl [0]>      3.61    0.577  NA    
#> 7 7-4       7     4      15.8   <dbl [0]>      1.00    3.61   NA    
#> 8 6-5       6     5       7.50  <dbl [0]>     NA       1.00   NA

hc_hrf <- ape_hrf %>% filter(to==9)

# the data here are from Kong et al 2012
# numerators are sums of CpG and nonCpG mutations, from Table 2
# the denominator 2*78*2.583e9 is the number of sites observed (effective bases times transmission events -- two times number of trios)

cpg <- tibble(mu =c(1.2e-8, (2489 + 1516)/(2*78*2.583e9), (855+ 73)/(2*78*48.8e6)))%>%
  mutate(nmu = 2*15000*mu)%>%
  mutate(hc =  map_dbl(nmu, function(x) tree_hrf(hc_hrf, mutation = x, mode = 'strict', pepo=T)$hrf))
cpg
#> # A tibble: 3 x 3
#>              mu      nmu    hc
#>           <dbl>    <dbl> <dbl>
#> 1 0.0000000120  0.000360 5.42 
#> 2 0.00000000994 0.000298 6.55 
#> 3 0.000000122   0.00366  0.521

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
R		R
data		data
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README-unnamed-chunk-6-1.png		README-unnamed-chunk-6-1.png
README-unnamed-chunk-7-1.png		README-unnamed-chunk-7-1.png
README.Rmd		README.Rmd
README.html		README.html
README.md		README.md
pepo.Rproj		pepo.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The P(e):P(o) package

Installation

Quick example

Hemiplasy Risk in the Great Apes

About

Releases

Packages

Languages

Angel-Popa/pepo

Folders and files

Latest commit

History

Repository files navigation

The P(e):P(o) package

Installation

Quick example

Hemiplasy Risk in the Great Apes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages