Workflow:
Annotation pipeline for PDXFinder mut templates using VEP.
The pipeline is written in Python 2.7 and has not been tested on newer versions.
The pipeline requires Singularity to run the containerized VEP. After installing Singularity on the system, run the install.sh script, which will download the following (a sketch of these steps follows the list):
- pdxfinder/pdx-liftover:vep_release98.3, the containerized instance of VEP
- Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz, a VEP runtime dependency
- homo_sapiens_merged_vep_98_GRCh38.gz, another VEP runtime dependency
- homo_sapiens_refseq_vep_98_GRCh38.gz, another VEP runtime dependency
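For reference, a minimal sketch of what the install step amounts to, assuming the image is pulled from Docker Hub and the caches come from the Ensembl release 98 FTP site. The exact URLs, archive names, and any unpacking done by install.sh are assumptions here:

    # Pull the containerized VEP (image path taken from the list above)
    singularity pull docker://pdxfinder/pdx-liftover:vep_release98.3

    # Reference FASTA and VEP caches; these Ensembl FTP URLs are assumptions
    wget http://ftp.ensembl.org/pub/release-98/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
    wget http://ftp.ensembl.org/pub/release-98/variation/indexed_vep_cache/homo_sapiens_merged_vep_98_GRCh38.tar.gz
    wget http://ftp.ensembl.org/pub/release-98/variation/indexed_vep_cache/homo_sapiens_refseq_vep_98_GRCh38.tar.gz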
Once these are installed, the following variables in config.yaml must be set (an example appears after this list):
- fastaDir, the location of the Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa file
- alleleDB, the location of the homo_sapiens_merged_vep_98_GRCh38 folder
- vepSingularityImage, the location of the Singularity image
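A minimal config.yaml sketch covering just these three variables. The paths are hypothetical and the key/value layout is assumed from the variable names; the real file also carries the VEP arguments described below:

    # Hypothetical paths; point them at wherever install.sh placed the files
    fastaDir: /data/vep/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa  # the unpacked .fa file
    alleleDB: /data/vep/homo_sapiens_merged_vep_98_GRCh38               # the unpacked cache folder
    vepSingularityImage: /data/vep/pdx-liftover_vep_release98.3.sif     # the pulled image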
The rest of config.yaml contains the arguments passed to VEP. The important arguments are annotated with comments. Changing an option can completely alter the pipeline's behavior and lead to unwanted results that are difficult to debug, so only change an option if you know what you are doing. All the options are documented here:
https://grch37.ensembl.org/info/docs/tools/vep/script/vep_options.html#basic
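To illustrate what such annotated arguments can look like, here is a hedged sketch. The key names, layout, and choice of options are assumptions, not the actual contents of this repo's config.yaml; --offline and --merged are real VEP flags documented at the link above:

    # Illustrative VEP arguments; do not change without understanding the option
    vepArguments:
      offline: true   # run from the local caches, no database connection
      merged: true    # use the merged Ensembl/RefSeq cache downloaded above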
To run the annotater, run Annotater.py with the target mut.tsv as its argument. Several files will be created, distinguished by the following suffixes (a usage example follows the list):
- ANNO, the annotations
- log, the log of the process; check this after the run
- VCF and ENSEMBL, the intermediate formats the pipeline uses to convert the data
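A typical invocation, assuming Python 2.7 is on the PATH and the template lives at data/mut.tsv (both the path and the exact output naming pattern are illustrative):

    python Annotater.py data/mut.tsv

    # Expected outputs next to the input (suffix placement assumed):
    #   data/mut.tsv.ANNO     annotations
    #   data/mut.tsv.log      process log - check this after the run
    #   data/mut.tsv.VCF      intermediate conversion format
    #   data/mut.tsv.ENSEMBL  intermediate conversion format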
After these files are generated, run AnnotaterMerger.py with the same mut.tsv argument as before. This merges the annotations with the original data and outputs the final .hmz file.
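Continuing the example above (the data/mut.tsv path is illustrative):

    python AnnotaterMerger.py data/mut.tsv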
Check the line count of the original file against that of the .hmz file to see whether any data was lost. Data loss only happens when formatting errors occur; with well-formed input, 100% data conversion is achievable.
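A quick comparison, assuming the merger writes the output next to the input as data/mut.hmz (the exact output name is an assumption):

    # The two counts should match, modulo any header handling by the pipeline
    wc -l data/mut.tsv data/mut.hmz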
For sufficiently large data sets, the pipeline can take up to several hours to run.