
annotation pipeline for PDXFinder mut and cna templates using VEP


PDCMFinder/mutation-annotater


mutation-annotater

Workflow:

[Figure: Annotation pipeline workflow]

Annotation pipeline for PDXFinder mut templates using VEP (Ensembl Variant Effect Predictor).

The pipeline was written in Python 2.7 and has not been tested with later versions.

The pipeline requires Singularity to run the containerized VEP. After installing Singularity on the system, run the install.sh script. The install.sh script will install the following:

  • pdxfinder/pdx-liftover:vep_release98.3, the containerized instance of VEP
  • Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz, the GRCh38 reference genome FASTA required by VEP
  • homo_sapiens_merged_vep_98_GRCh38.gz, the merged (Ensembl and RefSeq) VEP 98 cache required by VEP
  • homo_sapiens_refseq_vep_98_GRCh38.gz, the RefSeq VEP 98 cache required by VEP

Once these are installed, set the following variables in config.yaml:

  • fastaDir, the location of the Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa file
  • alleleDB, the location of the homo_sapiens_merged_vep_98_GRCh38 folder
  • vepSingularityImage, the location of the Singularity image
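Before a long run, a quick sanity check of these three settings can catch a misconfigured path early. The sketch below assumes config.yaml has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and that the three keys sit at its top level; check_config_paths is a hypothetical helper, not part of this repository.

```python
import os

# The three settings described above. Assumption: they are top-level keys
# in the parsed config.yaml dict.
REQUIRED_KEYS = ("fastaDir", "alleleDB", "vepSingularityImage")


def check_config_paths(config):
    """Return a list of problems with the required config.yaml entries.

    `config` is the already-parsed config.yaml as a dict. An empty list
    means every required key is present and points at an existing path.
    """
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not value:
            # Key absent, or present with an empty/None value.
            problems.append("%s is missing or empty" % key)
        elif not os.path.exists(value):
            # fastaDir may name a file or a folder; exists() accepts both.
            problems.append("%s points to a non-existent path: %s" % (key, value))
    return problems
```

An empty return value means the three paths at least exist; it does not check that they hold the right VEP 98 / GRCh38 files.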

The rest of config.yaml contains the arguments passed to VEP. The important arguments are annotated with comments. Changing an option can completely change the pipeline's behavior and can produce unwanted results that are difficult to debug, so only change these options if you know what you are doing. All the options are documented here: https://grch37.ensembl.org/info/docs/tools/vep/script/vep_options.html#basic
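To show how options like these end up on a VEP command line, here is a sketch that builds a VEP command to run inside the Singularity image. It is not taken from Annotater.py: the function name build_vep_command, the assumption that options are stored as a flat name-to-value mapping, and the assumption that the image exposes a `vep` executable are all illustrative. The flags used in the example (--cache, --offline, --merged, --assembly, --input_file, --output_file) are standard VEP options listed at the link above.

```python
def build_vep_command(image, input_file, output_file, options):
    """Build an argv list that runs VEP inside a Singularity image.

    Hypothetical helper. `options` maps VEP option names (without the
    leading "--") to values:
      True          -> bare flag, e.g. --cache or --offline
      False or None -> option omitted
      anything else -> "--name value"
    Option names are sorted so the same config always yields the same command.
    """
    # Assumption: the container has `vep` on its PATH.
    cmd = ["singularity", "exec", image, "vep",
           "--input_file", input_file,
           "--output_file", output_file]
    for name in sorted(options):
        value = options[name]
        if value is True:
            cmd.append("--" + name)
        elif value is False or value is None:
            continue
        else:
            cmd.extend(["--" + name, str(value)])
    return cmd
```

The returned list can be passed directly to subprocess.check_call. Sorting the option names keeps the command reproducible, which makes two runs easier to compare when debugging a changed option.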

To run the annotater, run Annotater.py with the target mut.tsv file as its argument. This creates several files, distinguished by the following suffixes:

  • ANNO, the annotations
  • log, the log of the process. Check this after the run.
  • VCF and ENSEMBL, intermediate files in the formats the pipeline uses while converting the data.
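To illustrate the kind of conversion behind the VCF files, here is a minimal sketch that turns mut.tsv rows into VCF body lines. The column names (chromosome, seq_start_position, ref_allele, alt_allele) and the function mut_rows_to_vcf_lines are hypothetical; the real template columns and the pipeline's own conversion logic may differ. As written it only suits simple substitutions, because VCF represents insertions and deletions with an extra anchor base.

```python
import csv

# Hypothetical column names, assumed for illustration only; check them
# against the header of your own mut.tsv.
CHROM_COL = "chromosome"
POS_COL = "seq_start_position"
REF_COL = "ref_allele"
ALT_COL = "alt_allele"


def mut_rows_to_vcf_lines(lines):
    """Turn tab-separated mut rows (header first) into minimal VCF body lines.

    Output columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
    QUAL, FILTER and INFO are "." (VCF's missing-value marker); ID holds the
    1-based data-row number so VEP output could be joined back to the input.
    """
    vcf_lines = []
    reader = csv.DictReader(lines, delimiter="\t")
    for row_number, row in enumerate(reader, 1):
        fields = [
            row[CHROM_COL].strip(),
            row[POS_COL].strip(),
            "row%d" % row_number,
            row[REF_COL].strip().upper(),
            row[ALT_COL].strip().upper(),
            ".", ".", ".",
        ]
        vcf_lines.append("\t".join(fields))
    return vcf_lines
```

Any iterable of lines works as input, including an open file handle.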

After these files are generated, run AnnotaterMerger.py with the same mut.tsv argument as before. This merges the annotations with the original data and writes the final .hmz file.
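The two steps can be scripted together. This sketch assumes Annotater.py and AnnotaterMerger.py are in the working directory and that a python2.7 interpreter is on PATH; pipeline_commands and run_pipeline are hypothetical helper names, not part of the repository.

```python
import subprocess


def pipeline_commands(mut_tsv, python="python2.7"):
    """Return the annotate and merge commands, in the order they must run.

    Assumption: both scripts sit in the current working directory. The
    interpreter defaults to Python 2.7, the only version the pipeline
    has been tested with.
    """
    return [
        [python, "Annotater.py", mut_tsv],
        [python, "AnnotaterMerger.py", mut_tsv],
    ]


def run_pipeline(mut_tsv, python="python2.7"):
    """Run the annotation step, then the merge step."""
    for cmd in pipeline_commands(mut_tsv, python):
        # check_call raises CalledProcessError on a non-zero exit status,
        # so the merge step is skipped if annotation exits with an error.
        subprocess.check_call(cmd)
```

For example, run_pipeline("sample_mut.tsv") runs both steps in sequence. The log file should still be checked afterwards, since a zero exit status alone does not show that every row was annotated.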

Compare the line count of the original file with that of the .hmz file to check whether any data was lost. Data loss only happens when formatting errors occur; a complete (100%) conversion is achievable.
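A minimal sketch of this check, assuming each file has exactly one header line and the .hmz file holds one row per input row. The helper names count_data_lines and compare_line_counts are hypothetical. If the merger can emit several rows per variant, a negative difference is expected and the comparison needs adjusting.

```python
import io


def count_data_lines(path, header_lines=1):
    """Count non-blank lines in `path`, minus the assumed header line(s)."""
    with io.open(path, encoding="utf-8") as handle:
        total = sum(1 for line in handle if line.strip())
    return max(total - header_lines, 0)


def compare_line_counts(mut_tsv, hmz, header_lines=1):
    """Return (input_rows, output_rows, missing) for the two files.

    A positive `missing` value means rows were dropped during conversion.
    """
    n_in = count_data_lines(mut_tsv, header_lines)
    n_out = count_data_lines(hmz, header_lines)
    return n_in, n_out, n_in - n_out
```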

For sufficiently large data sets, the pipeline can take several hours to run.
