tlmsa

Detection of SUMOylation sites that emerge through mutations in cancer. The pipeline uses SUMOnet to predict possible SUMOylation sites.

Running the Pipeline

The pipeline consists of three main parts:

  • Retrieving mutation data from the GDC database and filtering it, per patient, to the genes that carry a mutation resulting in a lysine (K). All other mutations in those genes are also kept for that patient, since a mutation near the mutated K may affect SUMOylation (R code).
  • Getting the wild-type sequence and mapping the mutations onto it to obtain the mutated sequence of each protein for each patient.
  • Constructing a 21-residue subsequence, with the mutated K in the middle, as the input to SUMOnet.
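
Parts 2 and 3 can be illustrated with a minimal Python sketch. It is not the tlmsa implementation: the function names `apply_substitutions` and `window_around`, the 1-based `(position, ref, alt)` tuple format, the toy sequence, and the use of `X` to pad windows near the protein ends are all assumptions made for illustration.

```python
def apply_substitutions(wild_type, substitutions):
    """Apply single-residue substitutions to a wild-type protein sequence.

    Each substitution is a hypothetical (position, ref, alt) tuple with a
    1-based position, e.g. (15, "R", "K") for R15K.
    """
    seq = list(wild_type)
    for pos, ref, alt in substitutions:
        # Guard against mapping a mutation onto the wrong isoform/sequence.
        if seq[pos - 1] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
        seq[pos - 1] = alt
    return "".join(seq)


def window_around(seq, pos, flank=10, pad="X"):
    """Return the (2 * flank + 1)-residue window centered on 1-based `pos`.

    Windows that run past either end of the sequence are padded with `pad`
    (an assumption; the real pipeline may handle sequence ends differently).
    """
    idx = pos - 1
    left = seq[max(0, idx - flank):idx]
    right = seq[idx + 1:idx + 1 + flank]
    return pad * (flank - len(left)) + left + seq[idx] + right + pad * (flank - len(right))


# Toy 40-residue sequence (not a real protein) with an R15K mutation.
wild_type = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWY"
mutated = apply_substitutions(wild_type, [(15, "R", "K")])
window = window_around(mutated, 15)
print(window)
```

With the default `flank=10` the window is 21 residues long and the mutated K sits at index 10, which matches the input shape described above.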

The whole pipeline can be run with a bash script that takes a path and a project name as input.

./bash_script/tlmsa.sh

Part 1 is performed by retrieveData.R; set the TCGA project name in the script before running it. Once the data is retrieved, parts 2 and 3 are carried out by the tlmsa Python package.

Parts 2 and 3 can also be run on data from sources other than TCGA. Detailed instructions can be found in tutorial.py.

Pipeline workflow

[Figure: pipeline workflow diagram]