tlmsa

Detection of SUMOylation sites that emerge through mutations in cancer. The pipeline uses SUMOnet to predict possible SUMOylation sites.

Running the Pipeline

The pipeline consists of three main parts:

1. Retrieving mutation data from the GDC database, selecting each patient's genes that carry a mutation resulting in a lysine (K), and collecting all mutations in those genes for that patient, since mutations near the mutated K may also affect SUMOylation (R code).
2. Retrieving the wild-type sequence of each protein and mapping the patient's mutations onto it to obtain the mutated sequence of each of the patient's proteins.
3. Constructing a 21-residue subsequence with the mutated K in the middle as the input to SUMOnet.
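Parts 2 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tlmsa package's API: the function names `apply_mutations` and `lysine_windows`, the `(position, ref, alt)` tuple format, and the `X` padding used for lysines within 10 residues of a terminus are all hypothetical. Only single amino-acid substitutions are handled; insertions, deletions, and frameshifts are not.

```python
from typing import Dict, List, Tuple

WINDOW = 21           # subsequence length stated in this README
FLANK = WINDOW // 2   # 10 residues on each side of the central K
PAD = "X"             # hypothetical padding character for sites near the termini


def apply_mutations(wild_type: str, mutations: List[Tuple[int, str, str]]) -> str:
    """Apply substitutions given as (1-based position, ref AA, alt AA) to a wild-type sequence."""
    seq = list(wild_type)
    for pos, ref, alt in mutations:
        # Guard against mutations annotated on a different isoform than the wild-type sequence.
        if seq[pos - 1] != ref:
            raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos - 1]}")
        seq[pos - 1] = alt
    return "".join(seq)


def lysine_windows(mutated: str, k_positions: List[int]) -> Dict[int, str]:
    """Return a 21-residue window centred on each (1-based) mutated lysine position."""
    padded = PAD * FLANK + mutated + PAD * FLANK
    windows: Dict[int, str] = {}
    for pos in k_positions:
        if mutated[pos - 1] != "K":
            raise ValueError(f"position {pos} is not a lysine in the mutated sequence")
        # Residue `pos` sits at index pos - 1 + FLANK in the padded string,
        # so the window starts at pos - 1 and spans WINDOW characters.
        windows[pos] = padded[pos - 1 : pos - 1 + WINDOW]
    return windows
```

All of a patient's mutations in the gene are applied before the window is cut, so a nearby substitution (for example a second mutation two residues upstream of the new K) appears inside the SUMOnet input. Padding keeps every window at exactly 21 residues with the lysine at index 10; whether the actual pipeline pads short windows, and with which character, is an assumption here.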

The pipeline can be run with a bash script that takes a path and a project name as input:

./bash_script/tlmsa.sh

Part 1 can be run with retrieveData.R after setting the TCGA project name in the code. Once the data is retrieved, Parts 2 and 3 are available as part of the tlmsa Python package.
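The filtering in Part 1 is implemented in R, but its logic can be sketched in Python for illustration. The `Mutation` record, its fields, and `select_k_gain_mutations` are hypothetical stand-ins for the GDC mutation columns and the R code; a lysine-creating mutation is assumed to be any substitution whose mutant residue is K and whose reference residue is not.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Mutation(NamedTuple):
    patient: str   # hypothetical field: patient/case identifier
    gene: str      # hypothetical field: gene symbol
    position: int  # 1-based protein position
    ref_aa: str    # wild-type amino acid (one-letter code)
    alt_aa: str    # mutant amino acid (one-letter code)


def select_k_gain_mutations(mutations: List[Mutation]) -> Dict[Tuple[str, str], List[Mutation]]:
    """Group mutations by (patient, gene), keeping only pairs with at least one lysine-creating mutation."""
    # (patient, gene) pairs where some mutation turns a non-K residue into K.
    k_gain: Set[Tuple[str, str]] = {
        (m.patient, m.gene) for m in mutations if m.alt_aa == "K" and m.ref_aa != "K"
    }
    grouped: Dict[Tuple[str, str], List[Mutation]] = defaultdict(list)
    for m in mutations:
        key = (m.patient, m.gene)
        if key in k_gain:
            # Keep every mutation of that gene in that patient, not only the K-creating one,
            # because nearby mutations may also affect SUMOylation.
            grouped[key].append(m)
    return dict(grouped)
```

Grouping per patient rather than per gene matters: the same gene mutated in another patient without a lysine gain is dropped, because that patient's protein never carries the new K.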

Parts 2 and 3 can also be run on data other than TCGA. Detailed instructions can be found in tutorial.py.
