4. Workflows

Justin C. Bagley edited this page Dec 27, 2020 · 17 revisions

Workflows

Get the help text and a list of functions available in PIrANHA like so:

piranha -h
piranha -f <TAB>
piranha -f list
piranha -s|--shortlist

Convert between DNA alignment formats like so:

# Convert FASTA to PHYLIP:

    piranha -f FASTA2PHYLIP -f 1 -i <inputFASTA> -k 1 -v 1     # Single PHYLIP file
    piranha -f FASTA2PHYLIP -f 2 -k 1 -v 1                     # Multiple PHYLIP files

# Convert FASTA to VCF:

    piranha -f FASTA2VCF -i <inputFASTA> -o <output>

# Convert Mega to PHYLIP:

    piranha -f Mega2PHYLIP -i <inputMega> -k 1                  # Single Mega file
    piranha -f Mega2PHYLIP -m 1 -k 1                            # Multiple Mega files

# Convert NEXUS to PHYLIP:

    piranha -f NEXUS2PHYLIP -i <inputNEXUS> -v 1                # Single NEXUS file

# Convert PHYLIP to FASTA:

    piranha -f PHYLIP2FASTA -i <inputPHYLIP> -k 1              # Single FASTA file
    piranha -f PHYLIP2FASTA -m 1 -k 1 -v 1                     # Multiple FASTA files 

# Convert PHYLIP to Mega:

    piranha -f PHYLIP2Mega -i <inputPHYLIP> -k 1               # Single PHYLIP file
    piranha -f PHYLIP2Mega -m 1 -k 1                           # Multiple PHYLIP files

# Convert PHYLIP to NEXUS:

    piranha -f PHYLIP2NEXUS -i <inputPHYLIP> -p <partitionsFile> -f NEX     # Single PHYLIP file 
    piranha -f PHYLIP2NEXUS -m 1                                            # Multiple PHYLIP files

Concatenate DNA sequence alignments (e.g. genes) like so:

# Create <taxonNamesSpaces> file with getTaxonNames function (creates file '<numTips>_taxon_names_spaces.txt'):

    piranha -f getTaxonNames -n <numTips>

# Concatenate PHYLIP alignments (e.g. 1 per gene):

    piranha -f concatenateSeqs -t <numTips>_taxon_names_spaces.txt

# Complete (fill in missing individuals) and concatenate PHYLIP alignments: 

    piranha -f completeConcatSeqs -t <numTips>_taxon_names_spaces.txt
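
The two concatenation steps above can be chained in a small wrapper. The following is a minimal sketch, not part of PIrANHA: `DRY_RUN`, `run_step`, and `concat_alignments` are hypothetical names introduced here for illustration, and the example tip count (10) is arbitrary. By default it only prints the commands so that you can check them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: chain getTaxonNames and concatenateSeqs for one directory of PHYLIP files.
# ASSUMPTION: DRY_RUN, run_step, and concat_alignments are illustrative helpers,
# not PIrANHA functions. Only the piranha flags shown on this page are used.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = actually execute them

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

concat_alignments() {
    local num_tips="$1"
    # File name pattern taken from the getTaxonNames description above.
    local taxon_file="${num_tips}_taxon_names_spaces.txt"
    run_step piranha -f getTaxonNames -n "$num_tips" &&
    run_step piranha -f concatenateSeqs -t "$taxon_file"
}

concat_alignments 10
```

Set `DRY_RUN=0` once the printed commands look right. Swapping `concatenateSeqs` for `completeConcatSeqs` gives the variant that fills in missing individuals first.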

Trim DNA sequence alignments like so:

# Use trimSeqs to trim single PHYLIP alignment with default settings and PHYLIP output:

     piranha -f trimSeqs -i <inputPHYLIP> -o phylip
     piranha -f trimSeqs --input <inputPHYLIP> --output phylip

# Use trimSeqs to trim multiple PHYLIP alignments with default settings and PHYLIP output:

     piranha -f trimSeqs -m 1 -o phylip
     piranha -f trimSeqs --multi 1 --output phylip

# Use trimSeqs to trim PHYLIP alignments with custom gap handling and sequence conservation settings for trimAl:

     piranha -f trimSeqs --multi 1 --output phylip --cons 60 --gt 0.1

# Use trimSeqs to trim PHYLIP alignments stringently, removing all sites with gaps:

     piranha -f trimSeqs --multi 1 --output phylip --nogaps 1

# NOTE: You may also switch the output format to FASTA (--output fasta) or NEXUS (--output nexus).
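
Before launching a custom trimming run, it can help to sanity-check the two numeric settings. The sketch below assumes that `--cons` and `--gt` are passed through to trimAl unchanged, so that trimAl's documented ranges apply (0 to 100 for its conservation percentage, 0 to 1 for its gap threshold); that pass-through is an assumption, not something this page states. `trim_cmd` is a hypothetical helper that prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: validate --cons and --gt values, then print the trimSeqs command.
# ASSUMPTION: trim_cmd is an illustrative helper, and the ranges checked here
# are trimAl's own (-cons: 0-100, -gt: 0-1), assumed to apply to trimSeqs too.

trim_cmd() {
    local cons="$1" gt="$2"
    if ! awk -v c="$cons" -v g="$gt" 'BEGIN {
            num = "^[0-9]*[.]?[0-9]+$"          # plain non-negative decimal
            ok = (c ~ num && g ~ num &&
                  c + 0 >= 0 && c + 0 <= 100 &&
                  g + 0 >= 0 && g + 0 <= 1)
            exit !ok
        }'; then
        echo "error: expected cons in [0,100] and gt in [0,1]" >&2
        return 1
    fi
    echo "piranha -f trimSeqs --multi 1 --output phylip --cons $cons --gt $gt"
}

trim_cmd 60 0.1
```

Calling `trim_cmd 60 1.5` prints an error and returns a non-zero status instead of a command.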

Phase consensus sequences from HTS (e.g. targeted sequence capture) using a reference:

# Phase alleles with default settings (creates intermediate files and final, unaligned phased FASTAs):

    piranha -f phaseAlleles -i <input> -o <output> -r <reference>

# Phase alleles while masking reference indels (insertions/deletions) in final, unaligned phased FASTAs:

    piranha -f phaseAlleles -i <input> -o <output> -r <reference> -m 1

Run standard evolutionary analysis programs (run with -h for help text first):

# Run BEAST:

    piranha -f BEASTRunner

# Run ∂a∂i:

    piranha -f dadiRunner
    piranha -f dadiUncertainty

# Run RAxML:

    piranha -f MAGNET
    piranha -f RAxMLRunner

# Run RogueNaRok: 

    piranha -f RogueNaRokRunner

# Run SNAPP:

    piranha -f SNAPPRunner

Conduct post-processing of results from standard evolutionary analysis programs (run with -h for help text first):

# Process output from BEAST:

    piranha -f MLEResultsProc
    piranha -f BEASTPostProc

# Process output from ExaBayes:

    piranha -f ExaBayesPostProc

# Process output from MrBayes:

    piranha -f MrBayesPostProc

🚧 The remainder of this wiki page (here below) is under construction. In the future, it will illustrate and describe PIrANHA workflows likely to be of most interest to biologists using PIrANHA.

Phylogenetic Partitioning Scheme/Model Selection

pyRAD2PartitionFinder

This function allows the user to go directly from the PHYLIP alignment (.phy) and partitions (.partitions) files output by pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016; for de novo assembly of reduced-representation sequence data from an NGS experiment) to inference of the optimal partitioning scheme and models of DNA sequence evolution for pyRAD-defined SNP loci in PartitionFinder (Lanfear et al. 2012, 2016). See the current release of the pyRAD2PartitionFinder scripts for more information (e.g. detailed comments in the help text and within the code itself; a README is hopefully coming soon).


Estimating Gene Trees for Species Tree Inference

MAGNET (MAny GeNE Trees)

This function is an interactive shell pipeline for inferring maximum-likelihood gene trees in RAxML (Stamatakis 2014) for multilocus DNA sequence alignments (e.g. RAD loci from ddRAD-seq experiments, candidate genes, genomic contigs) to aid downstream summary-statistics species tree inference. Please see the README for the MAGNET Package, which is available as its own stand-alone repository so that it can be tracked and continually given its own updated DOI and citation by Zenodo. Three starting input file formats are currently supported: a single NEXUS file (.nex), a single G-PhoCS file (.gphocs; formatted for G-PhoCS software, Gronau et al. 2011), and multiple PHYLIP files.
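
As a rough illustration of the three supported starting formats, the sketch below classifies candidate input files by extension before handing them to MAGNET. `magnet_input_type` is a hypothetical helper and the file names are made up; the `.phy` extension for PHYLIP files is assumed from its use elsewhere on this page.

```shell
#!/usr/bin/env bash
# Sketch: classify a file as one of MAGNET's supported starting formats.
# ASSUMPTION: magnet_input_type is an illustrative helper, not part of MAGNET,
# and extension matching is only a heuristic (file contents are not checked).

magnet_input_type() {
    case "$1" in
        *.nex)    echo "nexus" ;;
        *.gphocs) echo "gphocs" ;;
        *.phy)    echo "phylip" ;;
        *)        echo "unsupported"; return 1 ;;
    esac
}

# Hypothetical file names, for demonstration only.
for f in loci.nex loci.gphocs locus1.phy reads.fastq; do
    printf '%s\t%s\n' "$f" "$(magnet_input_type "$f")"
done
```

A check like this is cheap to run on a whole directory and catches stray files before a long RAxML run starts.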


Phylogenomics

Targeted sequence capture

In the present release (v0.4a4), I have worked to further flesh out contributions of PIrANHA to phylogenomics workflows for analyzing targeted sequence capture data (e.g. from Hyb-Seq) by adding the new function assembleReads, a script that automates de novo assembly of cleaned sequence reads (short reads in FASTQ format) from targeted capture HTS experiments using the ABySS assembler. This is a companion script designed to be run before phaseAlleles and alignAlleles. The overall workflow now assembles HTS read data, and phases and aligns consensus sequences based on reads (re)mapped to a reference assembly FASTA file (i.e. following reference-based assembly). This combination of programs was designed to be run 1) in a custom target capture workflow (“Workflow 1” below) or 2) after first conducting cleaning, assembly, locus selection, and reference-based assembly in the SECAPR sequence capture pipeline (Andermann et al. 2018; “Workflow 2” below, tested using output from SECAPR as input for PIrANHA).

There are two recommended workflows:

Workflow 1 (Recommended, most stable):

  1. Read cleaning using fastp (or similar software).
  2. Read assembly using assembleReads, followed by sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
  3. Post-processing and phylogenetic inference.

Workflow 2:

  1. Read cleaning, assembly, locus selection, and reference-based assembly in SECAPR (Andermann et al. 2018).
  2. Sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
  3. Post-processing and phylogenetic inference.
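
The order of steps in Workflow 1 can be sketched as a printed plan. This is only an outline under stated assumptions: `workflow1_plan` and the sample file names are hypothetical, the `phaseAlleles` line keeps the `<input>` and `<output>` placeholders used earlier on this page, and because the flags for `assembleReads` and `alignAlleles` are not documented here, the plan simply points to their help text. The `fastp` flags shown (`-i`, `-I`, `-o`, `-O`) are fastp's standard paired-end input and output options.

```shell
#!/usr/bin/env bash
# Sketch: print the Workflow 1 steps in order, without executing anything.
# ASSUMPTION: workflow1_plan is an illustrative helper. Read files are given
# as bare file names (no directory part) so the "clean_" prefix stays valid.

workflow1_plan() {
    local r1="$1" r2="$2" ref="$3"
    # Step 1: clean paired-end reads with fastp.
    echo "fastp -i $r1 -I $r2 -o clean_$r1 -O clean_$r2"
    # Step 2a: de novo assembly (check the help text for the actual flags).
    echo "piranha -f assembleReads -h"
    # Step 2b: phase alleles against the reference (placeholders left as-is).
    echo "piranha -f phaseAlleles -i <input> -o <output> -r $ref"
    # Step 2c: align allelic sequences (check the help text for the actual flags).
    echo "piranha -f alignAlleles -h"
}

workflow1_plan sample_R1.fastq sample_R2.fastq reference.fasta
```

Workflow 2 would start from the third line of this plan, using SECAPR output in place of the first two steps.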

Automating Bayesian evolutionary analyses in BEAST

BEASTRunner

BEASTRunner automates running multiple BEAST v1 or v2 (Drummond et al. 2012; Bouckaert et al. 2014) XML input files on a remote supercomputing cluster that uses either SLURM resource management with PBS wrappers or a TORQUE/PBS resource management system. See the BEASTRunner help text (-h flag) for more information.
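
If you are unsure which resource manager a cluster uses, a quick check on its login node can tell you before you configure BEASTRunner. The sketch below is not part of BEASTRunner: `detect_scheduler` is a hypothetical helper that looks for the standard submission commands (`sbatch` for SLURM, `qsub` for TORQUE/PBS). SLURM is tested first because a SLURM cluster with PBS wrappers may provide `qsub` as well.

```shell
#!/usr/bin/env bash
# Sketch: guess the cluster's resource manager from the commands on PATH.
# ASSUMPTION: detect_scheduler is an illustrative helper; run it on the
# cluster login node, not on your local machine.

detect_scheduler() {
    if command -v sbatch >/dev/null 2>&1; then
        echo "slurm"        # SLURM (may also expose PBS wrapper commands)
    elif command -v qsub >/dev/null 2>&1; then
        echo "torque_pbs"   # TORQUE/PBS
    else
        echo "none"         # no known scheduler command found
    fi
}

echo "Detected scheduler: $(detect_scheduler)"
```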

BEAST_PathSampling

The BEAST_PathSampling directory is a new area of development within PIrANHA in which I am actively coding scripts to (1) edit BEAST v2++ XML files for path sampling analyses (Xie et al. 2011; Baele et al. 2012) and (2) automate moving/running the new path sampling XML files on a supercomputing cluster. Even as of August 2017, this is very new, experimental code that may not yet be working, so stay tuned for more updates soon.

References

  • Andermann et al. 2018 SECAPR paper
  • Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M.A., Alekseyenko, A.V. 2012. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Molecular Biology and Evolution 29, 2157-2167.
  • Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T.G., Wu, C.H., Xie, D., Suchard, M.A., Rambaut, A., Drummond, A.J. 2014. BEAST2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology 10, e1003537.
  • Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29, 1969-1973.
  • Eaton, D.A.R. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics 30, 1844-1849.
  • Eaton, D.A.R., Overcast, I. 2016. ipyrad: interactive assembly and analysis of RADseq data sets. Available at: http://ipyrad.readthedocs.io/.
  • Gronau et al. 2011
  • Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29, 1695-1701.
  • Lanfear, R., Frandsen, P.B., Wright, A.M., Senfeld, T., Calcott, B. 2016. PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular Biology and Evolution.
  • Xie et al. 2011

December 26, 2020 - Justin C. Bagley, Jacksonville, AL, USA
