4. Workflows
Get the help text and a list of functions available in PIrANHA like so:
piranha -h
piranha -f <TAB>
piranha -f list
piranha -s # or: piranha --shortlist
Convert between DNA alignment formats like so:
# Convert FASTA to PHYLIP:
piranha -f FASTA2PHYLIP -f 1 -i <inputFASTA> -k 1 -v 1 # Single PHYLIP file
piranha -f FASTA2PHYLIP -f 2 -k 1 -v 1 # Multiple PHYLIP files
# Convert FASTA to VCF:
piranha -f FASTA2VCF -i <inputFASTA> -o <output>
# Convert Mega to PHYLIP:
piranha -f Mega2PHYLIP -i <inputMega> -k 1 # Single Mega file
piranha -f Mega2PHYLIP -m 1 -k 1 # Multiple Mega files
# Convert NEXUS to PHYLIP:
piranha -f NEXUS2PHYLIP -i <inputNEXUS> -v 1 # Single NEXUS file
# Convert PHYLIP to FASTA:
piranha -f PHYLIP2FASTA -i <inputPHYLIP> -k 1 # Single FASTA file
piranha -f PHYLIP2FASTA -m 1 -k 1 -v 1 # Multiple FASTA files
# Convert PHYLIP to Mega:
piranha -f PHYLIP2Mega -i <inputPHYLIP> -k 1 # Single PHYLIP file
piranha -f PHYLIP2Mega -m 1 -k 1 # Multiple PHYLIP files
# Convert PHYLIP to NEXUS:
piranha -f PHYLIP2NEXUS -i <inputPHYLIP> -p <partitionsFile> -f NEX # Single PHYLIP file
piranha -f PHYLIP2NEXUS -m 1 # Multiple PHYLIP files
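After a conversion, a quick sanity check on a PHYLIP file can catch truncated or misaligned output. The bash sketch below compares the ntax/nchar header with the records that follow it. It assumes sequential (non-interleaved) PHYLIP with one taxon name and one sequence per line, which may not match every PIrANHA output; check_phylip is a hypothetical helper, not a PIrANHA function.

```shell
# check_phylip (hypothetical helper): compare the "ntax nchar" header of a
# PHYLIP file with the sequences that follow it.
# ASSUMPTION: sequential PHYLIP, one "name sequence" record per line.
check_phylip() {
  awk '
    NR == 1 { ntax = $1; nchar = $2; next }   # header line: ntax nchar
    NF >= 2 {
      n++                                     # count one taxon per record
      if (length($2) != nchar) {
        printf "%s: %d sites (header says %d)\n", $1, length($2), nchar
        bad = 1
      }
    }
    END {
      if (n != ntax) { printf "found %d taxa (header says %d)\n", n, ntax; bad = 1 }
      if (!bad) print "OK"
      exit bad                                # non-zero exit status on any mismatch
    }' "$1"
}
```

Usage: check_phylip <outputPHYLIP> prints OK and returns 0 when the header matches the data, and otherwise lists each mismatch and returns 1.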
Concatenate DNA sequence alignments (e.g. genes) like so:
# Create <taxonNamesSpaces> file with getTaxonNames function (creates file '<numTips>_taxon_names_spaces.txt'):
piranha -f getTaxonNames -n <numTips>
# Concatenate PHYLIP alignments (e.g. 1 per gene):
piranha -f concatenateSeqs -t <numTips>_taxon_names_spaces.txt
# Complete (fill in missing individuals) and concatenate PHYLIP alignments:
piranha -f completeConcatSeqs -t <numTips>_taxon_names_spaces.txt
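Because concatenateSeqs and completeConcatSeqs both read the taxon names file, it can help to confirm that the file exists and lists the expected number of tips before concatenating. This bash sketch uses the '<numTips>_taxon_names_spaces.txt' naming given above and assumes one taxon name per line; check_taxon_names is a hypothetical helper.

```shell
# check_taxon_names (hypothetical helper): confirm that the file written by
# getTaxonNames exists and has one non-blank line per tip.
# ASSUMPTION: one taxon name per line (trailing padding spaces are fine).
check_taxon_names() {
  local num_tips="$1"
  local names_file="${num_tips}_taxon_names_spaces.txt"   # naming used by getTaxonNames
  if [ ! -f "$names_file" ]; then
    echo "missing: $names_file" >&2
    return 1
  fi
  local n
  n=$(grep -c '[^[:space:]]' "$names_file")   # count non-blank lines
  if [ "$n" -ne "$num_tips" ]; then
    echo "$names_file lists $n taxa, expected $num_tips" >&2
    return 1
  fi
  echo "OK: $n taxa in $names_file"
}
```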
Trim DNA sequence alignments like so:
# Use trimSeqs to trim single PHYLIP alignment with default settings and PHYLIP output:
piranha -f trimSeqs -i <inputPHYLIP> -o phylip
piranha -f trimSeqs --input <inputPHYLIP> --output phylip
# Use trimSeqs to trim multiple PHYLIP alignments with default settings and PHYLIP output:
piranha -f trimSeqs -m 1 -o phylip
piranha -f trimSeqs --multi 1 --output phylip
# Use trimSeqs to trim PHYLIP alignments with custom gap handling and sequence conservation settings for trimAl:
piranha -f trimSeqs --multi 1 --output phylip --cons 60 --gt 0.1
# Use trimSeqs to trim PHYLIP alignments stringently, removing all sites with gaps:
piranha -f trimSeqs --multi 1 --output phylip --nogaps 1
# NOTE: You may also switch output formats to FASTA (--output fasta) or NEXUS (--output nexus) formats.
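If you switch output formats often, a small bash wrapper can reject unsupported values before calling trimSeqs. The sketch below only uses the trimSeqs options shown above (--multi 1, --output phylip|fasta|nexus). The trim_all name and the DRY_RUN variable are hypothetical conveniences, not part of PIrANHA.

```shell
# trim_all (hypothetical wrapper): trim all PHYLIP alignments in the current
# directory with trimSeqs, writing the requested output format.
# Set DRY_RUN=1 to print the command instead of running it.
trim_all() {
  local fmt="${1:-phylip}"
  case "$fmt" in
    phylip|fasta|nexus) ;;                      # formats listed in the NOTE above
    *) echo "unsupported output format: $fmt" >&2; return 1 ;;
  esac
  local cmd=(piranha -f trimSeqs --multi 1 --output "$fmt")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 trim_all nexus prints the trimSeqs command it would run without touching any files.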
Phase consensus sequences from high-throughput sequencing (HTS) data (e.g. targeted sequence capture) against a reference like so:
# Phase alleles with default settings (creates intermediate files and final, unaligned phased FASTAs):
piranha -f phaseAlleles -i <input> -o <output> -r <reference>
# Phase alleles while masking reference indels (insertions/deletions) in final, unaligned phased FASTAs:
piranha -f phaseAlleles -i <input> -o <output> -r <reference> -m 1
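phaseAlleles writes final, unaligned phased FASTA files, and a quick way to review them is to count the sequences in each file. The bash sketch below counts '>' header lines per file. The location and extension of the phaseAlleles output are not specified on this page, so any glob you pass (e.g. a hypothetical <output>/*.fasta) is an assumption; count_fasta_seqs is a hypothetical helper.

```shell
# count_fasta_seqs (hypothetical helper): print "<file><TAB><n>" for each
# FASTA file given, where n is the number of ">" header lines.
count_fasta_seqs() {
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
  done
}
```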
Run standard evolutionary analysis programs (run with -h for help text first):
# Run BEAST:
piranha -f BEASTRunner
# Run ∂a∂i:
piranha -f dadiRunner
piranha -f dadiUncertainty
# Run RAxML:
piranha -f MAGNET
piranha -f RAxMLRunner
# Run RogueNaRok:
piranha -f RogueNaRokRunner
# Run SNAPP:
piranha -f SNAPPRunner
Conduct post-processing of results from standard evolutionary analysis programs (run with -h for help text first):
# Process output from BEAST:
piranha -f MLEResultsProc
piranha -f BEASTPostProc
# Process output from ExaBayes:
piranha -f ExaBayesPostProc
# Process output from MrBayes:
piranha -f MrBayesPostProc
🚧 The remainder of this wiki page (below) is under construction. In the future, it will illustrate and describe the PIrANHA workflows likely to be of most interest to biologists using PIrANHA.
The pyRAD2PartitionFinder function allows the user to go directly from the PHYLIP alignment (.phy) and partitions (.partitions) files output by pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016), both used for de novo assembly of reduced-representation sequence data from NGS experiments, to inference of the optimal partitioning scheme and models of DNA sequence evolution for pyRAD-defined SNP loci in PartitionFinder (Lanfear et al. 2012, 2016). See the current release of the pyRAD2PartitionFinder scripts for more information (e.g. the detailed comments in the help text and within the code itself; a README is hopefully coming soon).
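Before handing pyRAD/ipyrad output to PartitionFinder, it can be useful to see how many loci and sites the partitions file describes. The bash sketch below assumes RAxML-style partition lines such as "DNA, p1=1-64" with one range per line; that layout is an assumption, and summarize_partitions is a hypothetical helper.

```shell
# summarize_partitions (hypothetical helper): count loci and total sites in a
# partitions file. ASSUMPTION: lines look like "DNA, p1=1-64".
summarize_partitions() {
  awk -F'=' '
    NF == 2 {
      split($2, r, "-")              # "1-64" -> r[1] = 1, r[2] = 64
      loci++
      sites += r[2] - r[1] + 1       # inclusive range length
    }
    END { printf "%d loci, %d sites\n", loci, sites }' "$1"
}
```

If the assumption holds, the total site count should equal nchar in the header of the matching .phy file.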
The MAGNET function comprises an interactive shell pipeline for inferring maximum-likelihood gene trees in RAxML (Stamatakis 2014) for multilocus DNA sequence alignments (e.g. RAD loci from ddRAD-seq experiments, candidate genes, genomic contigs), to aid downstream summary-statistics species tree inference. Please see the README for the MAGNET package, which is available as its own stand-alone repository so that it can be tracked and continually given its own updated DOI and citation by Zenodo. Three starting input file formats are currently supported: single NEXUS (.nex), single G-PhoCS (.gphocs; formatted for the G-PhoCS software, Gronau et al. 2011), and multiple PHYLIP files.
In the present release (v0.4a4), I have worked to further flesh out PIrANHA's contributions to phylogenomic workflows for analyzing targeted sequence capture data (e.g. from Hyb-Seq) by adding the new function assembleReads, a script that automates de novo assembly of cleaned sequence reads (short reads in FASTQ format) from targeted capture HTS experiments using the ABySS assembler. This companion script is designed to be run before phaseAlleles and alignAlleles. The overall workflow now assembles HTS read data, then phases and aligns consensus sequences based on reads (re)mapped to a reference assembly FASTA file (i.e. following reference-based assembly). This combination of programs was designed to be run either 1) in a custom target capture workflow (“Workflow 1” below) or 2) after first conducting cleaning, assembly, locus selection, and reference-based assembly in the SECAPR sequence capture pipeline (Andermann et al. 2018; “Workflow 2” below, tested using output from SECAPR as input for PIrANHA).
There are two recommended workflows:
Workflow 1 (Recommended, most stable):
- Read cleaning using fastp (or similar software).
- Read assembly using assembleReads, followed by sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
- Post-processing and phylogenetic inference.
Workflow 2:
- Read cleaning, assembly, locus selection, and reference-based assembly, performed specifically with SECAPR (Andermann et al. 2018).
- Sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
- Post-processing and phylogenetic inference.
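The read-cleaning step of Workflow 1 can be batched over paired-end samples. The bash sketch below builds one fastp command per sample using fastp's -i/-I (input R1/R2), -o/-O (output R1/R2), and -h/-j (HTML/JSON report) options, giving each sample its own report so they are not overwritten. The <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming, the output naming, the clean_reads name, and DRY_RUN are all hypothetical; check what input layout assembleReads expects (piranha -f assembleReads -h) before relying on these names.

```shell
# clean_reads (hypothetical helper): run fastp once per paired-end sample.
# ASSUMPTION: reads are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
# Usage: clean_reads <outdir> <read files...>; DRY_RUN=1 prints commands only.
clean_reads() {
  local outdir="$1"; shift
  local r1 r2 sample
  local -a cmd
  for r1 in "$@"; do
    case "$r1" in
      *_R1.fastq.gz) ;;
      *) continue ;;                           # skip R2 files and anything else
    esac
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"        # matching reverse-read file
    sample="$(basename "${r1%_R1.fastq.gz}")"
    cmd=(fastp -i "$r1" -I "$r2"
         -o "$outdir/${sample}_R1.clean.fastq.gz"
         -O "$outdir/${sample}_R2.clean.fastq.gz"
         -h "$outdir/${sample}.fastp.html"
         -j "$outdir/${sample}.fastp.json")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

For example, DRY_RUN=1 clean_reads clean raw/*.fastq.gz previews the commands; drop DRY_RUN to run fastp (the output directory must already exist).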
BEASTRunner automates multiple runs of BEAST v1 or v2 (Drummond et al. 2012; Bouckaert et al. 2014) XML input files on a remote supercomputing cluster that uses either SLURM resource management with PBS wrappers or a TORQUE/PBS resource management system. See the BEASTRunner help text (-h flag) for more information.
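Because BEASTRunner targets either SLURM (with PBS wrappers) or TORQUE/PBS, it can help to confirm which submission command your cluster provides before running it. The bash sketch below only inspects your PATH; it is a hypothetical helper and does not reproduce how BEASTRunner itself selects a scheduler. It checks for sbatch first because SLURM clusters with PBS wrappers also provide qsub.

```shell
# detect_scheduler (hypothetical helper): report which resource manager's
# submission command is available on PATH.
detect_scheduler() {
  if command -v sbatch >/dev/null 2>&1; then
    echo "slurm"          # SLURM (possibly with PBS wrappers such as qsub)
  elif command -v qsub >/dev/null 2>&1; then
    echo "torque/pbs"
  else
    echo "none"
  fi
}
```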
The BEAST_PathSampling directory is a new area of development within PIrANHA in which I am actively coding scripts to (1) edit BEAST v2++ XML files for path sampling analyses (Xie et al. 2011; Baele et al. 2012) and (2) automate moving and running the new path sampling XML files on a supercomputing cluster. Even as of August 2017, this is very new, experimental code that may still not be working, so stay tuned for more updates soon.
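- Andermann, T., Cano, Á., Zizka, A., Bacon, C., Antonelli, A. 2018. SECAPR-a bioinformatics pipeline for the rapid and user-friendly processing of targeted enriched Illumina sequences, from raw reads to alignments. PeerJ 6, e5175.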
- Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M.A., Alekseyenko, A.V. 2012. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Molecular Biology and Evolution 29, 2157-2167.
- Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T.G., Wu, C.H., Xie, D., Suchard, M.A., Rambaut, A., Drummond, A.J. 2014. BEAST2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology 10, e1003537.
- Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29, 1969-1973.
- Eaton, D.A. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics 30, 1844-1849.
- Eaton, D.A.R., Overcast, I. 2016. ipyrad: interactive assembly and analysis of RADseq data sets. Available at: http://ipyrad.readthedocs.io/.
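- Gronau, I., Hubisz, M.J., Gulko, B., Danko, C.G., Siepel, A. 2011. Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics 43, 1031-1034.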
- Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29, 1695-1701.
- Lanfear, R., Frandsen, P.B., Wright, A.M., Senfeld, T., Calcott, B. 2016. PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular Biology and Evolution.
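- Stamatakis, A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313.
- Xie, W., Lewis, P.O., Fan, Y., Kuo, L., Chen, M.-H. 2011. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology 60, 150-160.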
December 26, 2020 - Justin C. Bagley, Jacksonville, AL, USA