Code and access to data associated with the Enyalius comparative phylogeography project.
First, clone this repository.
To install the Python packages needed for the analysis, first install Miniconda if you do not already have Anaconda or Miniconda. Then create a conda environment with the command:
conda env create -f environment.yml
To activate the environment, run:
conda activate enyalius
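Before running anything else, it can help to confirm that the right environment is active. The sketch below is one way to do that, assuming conda's usual behaviour of exporting the active environment name in `CONDA_DEFAULT_ENV`. The function name `check_env` is made up for illustration and is not part of this repository.

```shell
# Hypothetical helper (not part of this repo): report whether the expected
# conda environment is active, based on the CONDA_DEFAULT_ENV variable that
# `conda activate` sets.
check_env() {
  expected="${1:-enyalius}"   # environment name used in this README
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    echo "ok: $expected is active"
  else
    echo "run: conda activate $expected"
    return 1
  fi
}
```

Calling `check_env` with no arguments checks for the `enyalius` environment. Pass another name as the first argument to check a different one.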
- In the terminal, make sure you are in the main `enyalius/` directory
- Activate the conda environment with `conda activate enyalius`
- Run `snakemake`. You may want to perform a dry run first with `snakemake -n`. See the snakemake documentation for more on using snakemake
- Navigate to the `assembly/` directory
- Run `snakemake`. CAUTION: this is very compute intensive. Make sure you have appropriate resources. You may want to run `snakemake` in parallel with the `-j` flag
- Navigate to the `analysis/` folder
- Follow the steps in `species_distribution_modeling.md` to run the SDM analysis (you may want to view the md file on GitHub)
- Run `snakemake`
- Follow the steps in `empirical_sumstats.md` to calculate empirical summary statistics
- Run each file in the `scripts/` folder, following the instructions in the file
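The dry-run-then-run pattern in the steps above can be wrapped in a small shell function. This is only a sketch: `run_workflow` and the `SNAKEMAKE` override variable are hypothetical names, not part of this repository, and the function assumes the target directory contains a `Snakefile`. It uses only the `-n` (dry run) and `-j` (parallel jobs) flags mentioned above.

```shell
# Hypothetical helper (not part of this repo): dry-run a Snakemake workflow
# in a directory, then run it with a given number of parallel jobs.
# SNAKEMAKE can be overridden to point at a different executable.
SNAKEMAKE="${SNAKEMAKE:-snakemake}"

run_workflow() {
  wf_dir="$1"          # directory expected to contain a Snakefile
  wf_cores="${2:-1}"   # number of parallel jobs passed to -j
  if [ ! -f "$wf_dir/Snakefile" ]; then
    echo "no Snakefile in $wf_dir" >&2
    return 1
  fi
  (
    # Subshell so the caller's working directory is left unchanged
    cd "$wf_dir" || exit 1
    "$SNAKEMAKE" -n || exit 1       # dry run: list jobs without executing them
    "$SNAKEMAKE" -j "$wf_cores"     # real run with up to $wf_cores parallel jobs
  )
}
```

For example, `run_workflow assembly 4` would dry-run and then run the assembly workflow with four jobs. The value 4 is only illustrative, so choose a number that matches the resources you have available.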
- `analysis/` - where I house all analysis-related files. It's organized into subfolders:
  - `data/` - data files used for analysis
    - `data/atlantic_forest/atlantic_forest.geojson` - GeoJSON file of the Atlantic Forest boundaries
    - `data/current_climate_chelsa/` - CHELSA bioclimate data for the present day
    - `data/vcfs/` - filtered VCFs (output from assembly) for missing data exploration and final analysis
    - `enyalius_locs.csv` - table of Enyalius localities and metadata
    - `*_inds.txt` - lists of individuals per species
  - `empirical_sumstats.*` - notebook outlining the calculation of genetic summary statistics for the empirical samples
  - `scripts/` - Python and R scripts used to conduct the analysis
  - `single_pop_sumstats.*` - notebook exploring per-species genetic summary statistics calculated for different levels of missing data
  - `species_distribution_modeling.*` - notebook to run SDMs
  - `Snakefile` - Snakemake workflow file for running all analyses
  - `renv/` - R package management
  - `renv.lock` - R package management
- `assembly/` - files related to Enyalius RADseq assembly
  - `assembly-qc*` - report outlining assembly quality control
  - `config.yaml` - configuration file for the Snakemake run
  - `fastq/` - fastq files (summarized in `multiqc.html`)
  - `processed_localities/` - localities used for assembly filtering
  - `scripts/` - scripts used for assembly
  - `*inds.txt`
- `R/` - R scripts used for analysis
  - `make_maps_fns.R` - functions to convert a raster to a SLiM map. Taken from Peter Ralph's `nebria` repository
- `renv/` - folder to house R and Python package info. Don't touch anything here.
- `Snakefile` - Snakemake workflow to move files around for analyses and assembly
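As a quick sanity check that you are in the main `enyalius/` directory, the top-level layout listed above can be verified from the shell. This is a minimal sketch and `check_layout` is a hypothetical name, not a script shipped with this repository. It only tests that the top-level entries named in this README exist.

```shell
# Hypothetical helper (not part of this repo): check that the top-level
# entries described in this README exist under a given root directory.
check_layout() {
  root="${1:-.}"   # defaults to the current directory
  missing=0
  for path in analysis assembly R renv Snakefile; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_layout` from the repository root should print nothing and return success. Any `missing:` line suggests you are in the wrong directory or the clone is incomplete.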