This repository is for our project Comparative Modeling Pipeline of Protein Kinase Inhibitors.
Figure: Overview of Comparative Modeling Pipeline
- Python 3.7 (Pandas)
- Rosetta Software Suite
- PyRosetta Software Suite
- OpenEye Software Suite
- EMBOSS Software Suite
2W1C_A_L0C
├── 2W1C_A.fasta
├── 2W1C_A_L0C.pdb
└── L0C.smi
The initial folder should contain three files that follow this naming convention:
- 2W1C_A.fasta is the FASTA protein sequence file for the kinase 2W1C.
- 2W1C_A_L0C.pdb is the PDB structure of 2W1C in complex with the L0C ligand, or an empty file with the .pdb extension.
- L0C.smi is the SMILES string for the L0C ligand.
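The naming convention can be checked before launching the pipeline. The following is a minimal sketch, not part of the repository: it assumes the file names shown in the directory listing above (2W1C_A.fasta, 2W1C_A_L0C.pdb, L0C.smi) and a folder named PDBID_CHAIN_LIGAND. The function name `check_input_folder` is hypothetical.

```python
from pathlib import Path


def check_input_folder(folder):
    """Return the expected file names that are missing from the input folder.

    Assumes the folder is named <PDB ID>_<chain>_<ligand>, e.g. 2W1C_A_L0C.
    """
    folder = Path(folder)
    # Split the folder name into its three parts.
    pdb_id, chain, ligand = folder.name.split("_")
    expected = [
        f"{pdb_id}_{chain}.fasta",         # protein sequence
        f"{pdb_id}_{chain}_{ligand}.pdb",  # complex structure or empty placeholder
        f"{ligand}.smi",                   # ligand SMILES string
    ]
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list means all three files are present; for example, `check_input_folder("2W1C_A_L0C/")` would list whichever of the three files are absent.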
- This script generates conformers, aligns conformers with a template, and selects the top 100 conformers for Rosetta minimization.
An example command is below:
/work/07424/gabeong/stampede2/anaconda3/bin/python /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/modeling_script.py -f /work/07424/gabeong/stampede2/2W1C_A_L0C/ -omega /work/07424/gabeong/stampede2/openeye/bin/omega2 -rocs /work/07424/gabeong/stampede2/openeye/bin/rocs -temp_lig /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/template_ligand_library -mol2params /work/07424/gabeong/stampede2/Rosetta/main/source/scripts/python/public/generic_potential/mol2genparams.py -convert /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/convert.py
where
- /work/07424/gabeong/stampede2/anaconda3/bin/python is the filepath to the local Python installation.
- /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/modeling_script.py is the filepath to the modeling script.
- To -f, add the filepath to the folder described above.
- To -omega, add the filepath to the local installation of OpenEye Omega.
- To -rocs, add the filepath to the local installation of ROCS.
- To -temp_lig, add the filepath to the template ligand library.
- To -mol2params, add the filepath to the mol2genparams.py script in Rosetta.
- To -convert, add the filepath to the convert.py script in this repository.
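Because the command is long, it can help to assemble it from named paths. The sketch below is only an illustration: the six flags are the ones listed above, while the helper name `build_ligand_command` and the relative placeholder paths are hypothetical and must be replaced with local installations.

```python
import shlex


def build_ligand_command(python_bin, script, options):
    """Assemble the ligand-modeling command as an argument list."""
    cmd = [python_bin, script]
    # Each flag documented above takes exactly one filepath.
    for flag in ("-f", "-omega", "-rocs", "-temp_lig", "-mol2params", "-convert"):
        cmd += [flag, options[flag]]
    return cmd


# Placeholder paths (hypothetical); substitute your own.
paths = {
    "-f": "2W1C_A_L0C/",
    "-omega": "openeye/bin/omega2",
    "-rocs": "openeye/bin/rocs",
    "-temp_lig": "Rosetta_Kinase_CM/template_ligand_library",
    "-mol2params": "mol2genparams.py",
    "-convert": "Rosetta_Kinase_CM/convert.py",
}
command = build_ligand_command("python", "modeling_script.py", paths)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting problems in long paths.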
2W1C_A_L0C
├── 2W1C_A.fasta
├── 2W1C_A_L0C.pdb
├── L0C.smi
├── OMEGA
│ ├── L0C_omega.log
│ ├── L0C_omega.parm
│ ├── L0C_omega.rpt
│ ├── L0C_omega.sdf
│ └── L0C_omega_status.txt
├── ROCS [7787 entries exceeds filelimit, not opening dir]
├── mol2params [200 entries exceeds filelimit, not opening dir]
└── top_100_conf [200 entries exceeds filelimit, not opening dir]
- This script conducts target-template sequence alignment, selects top templates, and minimizes the top 10 predicted models with PyRosetta.
- Using the best model, we concatenate the 100 conformers from the ligand alignment step, resulting in 100 unrefined protein-ligand complex models.
/work/07424/gabeong/stampede2/anaconda3/bin/python /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/new_protein_modeling.py -f /work/07424/gabeong/stampede2/2W1C_A_L0C/ -emboss /work/07424/gabeong/stampede2/emboss/bin/needle -temp_seq /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/template_fasta_seq_training_set -apo_pdb /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/apo_pdbs_for_template_seq_extraction
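To illustrate the template-selection idea, the sketch below ranks templates by the sequence identity reported in the header of an EMBOSS needle alignment (the `# Identity:` line). This README does not state which criterion new_protein_modeling.py actually uses, so ranking by identity is an assumption, and the function names are hypothetical.

```python
import re

# Matches the header line of a needle report, e.g. "# Identity:     150/300 (50.0%)".
IDENTITY_RE = re.compile(r"^# Identity:\s+(\d+)/(\d+)", re.MULTILINE)


def identity_fraction(needle_report):
    """Return identical positions divided by alignment length."""
    match = IDENTITY_RE.search(needle_report)
    if match is None:
        raise ValueError("no '# Identity:' line found in needle report")
    identical, length = map(int, match.groups())
    return identical / length


def rank_templates(reports, top_n):
    """reports maps template name to needle report text; best identity first."""
    scored = {name: identity_fraction(text) for name, text in reports.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]
```

Each report would come from aligning the target FASTA against one template sequence in the template library.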
2W1C_A_L0C
├── 2W1C_A.fasta
├── 2W1C_A_L0C.pdb
├── L0C.smi
├── OMEGA
│ ├── L0C_omega.log
│ ├── L0C_omega.parm
│ ├── L0C_omega.rpt
│ ├── L0C_omega.sdf
│ └── L0C_omega_status.txt
├── ROCS [7787 entries exceeds filelimit, not opening dir]
├── mol2params [200 entries exceeds filelimit, not opening dir]
├── protein_comp_modeling [13 entries exceeds filelimit, not opening dir]
├── protein_ligand_complex_top_1_comp_model [100 entries exceeds filelimit, not opening dir]
└── top_100_conf [200 entries exceeds filelimit, not opening dir]
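The concatenation step that fills protein_ligand_complex_top_1_comp_model can be pictured as joining the coordinate records of the best protein model with those of one ligand conformer. The following is a simplified sketch under that assumption; the repository script may handle headers, chain IDs, or numbering differently, and `concatenate_complex` is a hypothetical name.

```python
def concatenate_complex(protein_pdb_text, ligand_pdb_text):
    """Join protein and ligand coordinate records into one PDB-format string."""
    coordinate_tags = ("ATOM", "HETATM")
    # Keep only coordinate records from the protein model.
    lines = [l for l in protein_pdb_text.splitlines() if l.startswith(coordinate_tags)]
    lines.append("TER")  # mark the end of the protein chain
    # Append the coordinate records of the ligand conformer.
    lines += [l for l in ligand_pdb_text.splitlines() if l.startswith(coordinate_tags)]
    lines.append("END")
    return "\n".join(lines) + "\n"
```

Calling this once per conformer with the same protein model would yield the 100 unrefined complexes.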
- This script generates commands for parallelizing the two-step minimization.
- Minimization is performed against KLIFS residues, an 85-residue representation of the ATP binding site.
/work/07424/gabeong/stampede2/anaconda3/bin/python /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/minimization_script.py -f 2W1C_A_L0C -ma /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/minimize_ppi_res_of_interest.py -rf /work/07424/gabeong/stampede2/rosetta_cm/Rosetta_Kinase_CM/klifs_only_uniprot_resnum
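Generating commands for parallel execution usually means writing one command per model into a job file that a launcher then distributes. The sketch below shows that pattern only. The arguments of minimize_ppi_res_of_interest.py are not documented in this README, so the command template is left to the caller, and `write_job_list` is a hypothetical helper.

```python
from pathlib import Path


def write_job_list(model_paths, command_template, job_file):
    """Write one shell command per model and return the command lines.

    command_template must contain a {model} placeholder.
    """
    lines = [command_template.format(model=path) for path in model_paths]
    Path(job_file).write_text("\n".join(lines) + "\n")
    return lines
```

For example, `write_job_list(models, "python minimize_one.py {model}", "jobs.txt")` would write one line per complex; `minimize_one.py` is a placeholder, not a script in this repository.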
- This step generates a table of the 100 minimized structures, listing the name and energy attributes of each model. The top 10 of the 100 models are then selected by Rosetta energy.
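The selection described above amounts to sorting the table by energy and keeping the lowest values, since lower Rosetta energies are more favorable. A minimal sketch follows; the CSV layout and the column names `name` and `total_score` are assumptions, not the format the analysis script writes.

```python
import csv
import io


def top_models(score_table_csv, n=10, energy_column="total_score"):
    """Return the n rows with the lowest energy from a CSV score table."""
    rows = list(csv.DictReader(io.StringIO(score_table_csv)))
    # Lower Rosetta energy is better, so sort in ascending order.
    rows.sort(key=lambda row: float(row[energy_column]))
    return rows[:n]
```

With a table of 100 minimized models, `top_models(table_text)` returns the 10 rows that move on to the next step.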
5. Complex modeling of remaining protein models (top_comp_prtn_lig_modeling.py) (from step 2, third point)
- Here, the PARAMS files of the top 10 ligands are taken from step 1, fifth point.
- Concatenation of the protein-ligand complex (this again results in 100 complex models).
- The minimization process is the same as in step 3.
- The analysis process is the same as in step 4.
- The top 1 model will be reported as the best prediction.
Reach me at [email protected]
This project uses the following license: MIT License