-
Notifications
You must be signed in to change notification settings - Fork 0
mattheatley/extract_4fds
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
READ ME This software finds 4-fold degenerate sites (4fds) within an annotated fasta sequence & extracts them from a vcf file on a SLURM cluster. Requirements - python 3.7.3; os, sys, subprocess, shutil, re, argparse, math - samtools 1.9 - gatk4 4.1.4.0 N.B. If software is installed within a conda environment the environment name must be supplied at each stage (see below). SETUP Create the initial default/user-specified working & sub-directories. python pipe_4fds.py -setup [-p </path/to>] [-d <directory_name>] PREPARE FILES Move the fasta, gff & (if also extracting sites) vcf files to the relevant sub-directories. i.e. /ref.gen sequence.fasta /gff annotation.gff /vcf variants.vcf.gz INDEX Index reference genome. python pipe_4fds.py -index FIND SITES Find 4fds coordinates within the fasta sequence. python pipe_4fds.py -find N.B. Two 4fds categories are reported; (i) all 4fds & (ii) those 4fds within all transcripts of a given gene. 4fds within codons disrupted by introns (i.e. phase-1 & phase-2) are also identified. Any unassigned CDS/mRNA (i.e. parent not found) or ambiguous transcripts (i.e. indiscernible codons) are ignored. (see unassigned_CDS, unassigned_mRNA & ignore_dtranscripts respectively in the output directory) EXTRACT SITES Extract 4fds for individual scaffolds from a vcf file. python pipe_4fds.py -extract {all|consistent} MERGE EXTRACTED SITES Merge extracted per-scaffold 4fds vcfs into a genome-wide 4fds vcf. python pipe_4fds.py -merge {all|consistent} ADDITIONAL SETTINGS Flag Default Description -pa <partition> stage-specific specify SLURM partition -no <nodes> stage-specific specify SLURM nodes -nt <ntasks> stage-specific specify SLURM ntasks -me <memory[units]> stage-specific specify SLURM memory -wt <HH:MM:SS> stage-specific specify SLURM walltime -d <diretory_name> ngs_pipe specify working directory name -p </path/to> home path specify working directory path -u <user_name> current user specify SLURM user name -m <email_address> None specify email address to receive SLURM notifications -e <environment_name> ngs_env specify conda environment with software installed -l <limit> 100 specify concurrent task submission limit -pipe submit pipeline itself; submits new tasks up to the limit from the hpcc
About
A pipeline to find & extract 4-fold degenerate sites on a SLURM cluster
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published