DeepTrio: a ternary prediction system for protein–protein interaction using mask multiple parallel convolutional neural networks
2023-06-08: v1.0.3 - v1.0.4: fix BUG in main.py
, which clarifies the probability type of model output.
2022-11-09: v1.0.2 - v1.0.3: fix some BUGs in build_model_for_hyperparameter_search.py
, and change the command-line parameters for inputting data. model.py
is renamed as build_model.py
2022-05-11: v1.0.1 - v1.0.2: fix some BUGs in model.py
, and change the command-line parameters for inputting data.
2021-11-09: Note for article: the title of Section 2.2.1 should be 'Protein sequence representation', and the reference in the footnote of Table 4 should be Chen et al. (2019).
2021-09-03: v1.0.0 - v1.0.1: adding an alternative function for applying max-pooling on the outer-product of two protein feature maps.
It is recommended to install dependencies in conda virtual environment so that only few installation commands are required for running DeepTrio. You can prepare all the dependencies just by the following commands.
-
Install Miniconda
Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages, including pip, zlib and a few others
- Download Miniconda installer for linux : https://docs.conda.io/en/latest/miniconda.html#linux-installers
- Check the hashes for the Miniconda from : https://docs.conda.io/en/latest/miniconda_hashes.html
- Go to the installation directory and run command :
bash Miniconda3-latest-Linux-x86_64.sh
-
Creating the environment
If there is no environment in your Miniconda environment, it is recommeneded to create a new environment to run DeepTrio.
- Run
conda create -n [your env name] python=3.7
- Run
conda activate [your env name]
- Run
pip install --upgrade pip
Run conda install tensorflow-gpu=2.1
WARNING: Using TensorFlow < 2.6.0 may have worse performance with the latest GPU like A100, and it is recommended to use pip to install the latest TensorFlow e.g. 2.12.0)
Runpip install --upgrade tensorflow
- Run
conda install seaborn
- Run
conda install -c conda-forge scikit-learn
- Run
conda install -c conda-forge gpyopt
- Run
conda install -c conda-forge dotmap
- Run
-
To run DeepTrio on your own training data you need to prepare the following two things:
-
Protein-protein Interaction File: A pure protein ID file, in which two protein IDs are separated by the Tab key, along with their label (1 for 'interacting', and 0 for 'non-interacting').
line1: protein_id_1 [Tab] protein_id_2 [Tab] label line2: protein_id_3 [Tab] protein_id_4 [Tab] label
-
Protein Sequence Database File: A file containing protein IDs and their sequences in fasta format, which are separated by the Tab key.
line1: protein_id_1 [Tab] protein_1_sequence line2: protein_id_3 [Tab] protein_2_sequence
-
-
Execute command with arguments in shell:
python build_model.py [-h] [--interaction_data INTERACTION_DATA] [--sequence_data SEQUENCE_DATA] [--fold_index FOLD_INDEX] [--epoch EPOCH] [--outer_product OUTER_PRODUCT] [--cuda]
for example:
python build_model.py --interaction_data data/benchmarks/yeast\ core\ dataset\ from\ DeepFE-PPI/action_pair.tsv --sequence_data data/benchmarks/yeast\ core\ dataset\ from\ DeepFE-PPI/action_dictionary.tsv
Arguments:
Argument Required Default Description --interaction_data Yes The customized name of your Protein-protein Interaction File with its path --sequence_data Yes The customized name of your Protein Sequence Database File with its path --fold_index No 0 The fold index in 5-fold cross-validation --outer_product No False Whether apply max-pooling on outer-product of two proteins --epoch No 50 The maximum number of epochs --cuda No False Allow GPU to perform training process --help No Help message
WARNING: If you only want to train and predict with DeepTrio, it is not recommended to run the hyper-parameter searching program.
-
To run DeepTrio on your own training data and search hyper-parameters, you need to prepare the following two things:
-
Protein-protein Interaction File: A pure protein ID file, in which two protein IDs are separated by the Tab key, alonge with their label (1 for 'interacting', 0 for 'non-interacting' and 2 for 'single protein').
This file must be named as [(your customized name).pair.tsv]. For example:line1: protein_id_1 [Tab] protein_id_2 [Tab] label line2: protein_id_3 [Tab] protein_id_4 [Tab] label
-
Protein Sequence Database File: A file containing protein IDs and their sequences in fasta format, which are separated by the Tab key.
This file must be named as [(your customized name).seq.tsv]. For example:line1: protein_id_1 [Tab] protein_1_sequence line2: protein_id_3 [Tab] protein_2_sequence
-
-
Execute command with arguments in shell:
python build_model_for_hyperparameter_search.py [-h] [--interaction_data INTERACTION_DATA] [--sequence_data SEQUENCE_DATA] [--epoch EPOCH] [--outer_product OUTER_PRODUCT] [--cuda]
for example:
python build_model_for_hyperparameter_search.py --interaction_data data/benchmarks/yeast\ core\ dataset\ from\ DeepFE-PPI/action_pair.tsv --sequence_data data/benchmarks/yeast\ core\ dataset\ from\ DeepFE-PPI/action_dictionary.tsv --cuda
Arguments:
Argument Required Default Description --interaction_data Yes The customized name of your Protein-protein Interaction File with its path --sequence_data Yes The customized name of your Protein Sequence Database File with its path --epoch No 100 The maximum number of epochs --cuda No False Allow GPU to perform training process --help No Help message -
Select the best model according to GpyOpt log file:
DeepTrio_search_1.h5 DeepTrio_search_2.h5 DeepTrio_search_3.h5 DeepTrio_search_4.h5 ... search_log.txt
- The
search_log.txt
shows the details of all the candidate models' parameters and the best model parameters.
result: parameter em_dim: 15.0 parameter sp_drop: 0.005 parameter kernel_rate_1: 0.16 ... evaluation: 0.9795729
- The
-
To run DeepTrio for prediction on your own query protein pairs you need to prepare the following three things:
-
The first protein File: It can contain multiple proteins in fasta format. For example:
line1: >protein_id_1 line2: protein_1_sequence line3: >protein_id_2 line4: protein_2_sequence
-
The second protein File: It can contain multiple proteins in fasta format. For example:
line1: >protein_id_3 line2: protein_3_sequence
-
The model file name and its path.
-
The inputs of DeepTrio will be:
the first query protein pair: protein_1 and protein_3 the second query protein pair: protein_2 and protein_3
-
-
Execute command with arguments in shell:
python main.py [-h] -p1 PROTEIN1 -p2 PROTEIN2 -m MODEL [-o OUTPUT]
Arguments:
Abbreviation Argument Required Description -p1 --protein1 Yes The first protein group in fasta format with its path -p2 --protein2 Yes The second protein group in fasta format with its path -m --model Yes The DeepTrio model with its path -o --output No The output file name -h --help No Help message
-
To run DeepTrio for visualization on your own query protein pairs you need to prepare the following three things:
-
The first protein File: which must contain only one protein in fasta format. For example:
line1: >protein_id_1 line2: protein_1_sequence
-
The second protein File: which must contain only one protein, like
the first protein File
. -
The model file name and its path.
-
-
Execute command with arguments in shell:
python visual_DeepTrio.py [-h] -p1 PROTEIN1 -p2 PROTEIN2 -m MODEL
Arguments:
Abbreviation Argument Required Description -p1 --protein1 Yes The first protein group in fasta format with its path -p2 --protein2 Yes The second protein group in fasta format with its path -m --model Yes The DeepTrio model with its path -h --help No Help message
A) Yes, you need to install some addtional libraries, like GPU drivers, matplotlib, numpy, Gpy and so on, so we recommend to use conda to install dependencies.
A) Yes, you can configure conda virtual environment on your Windows PC.
A) Yes, you can visit our online website : http://bis.zju.edu.cn/deeptrio, where you can predicti PPIs and draw importance maps on the DeepTrio model without any configurations.
If you find DeepTrio useful, please consider citing our publication:
Hu, X., Feng, C., Zhou, Y., Harrison, A., & Chen, M. (2021). DeepTrio: a ternary prediction system for protein-protein interaction using mask multiple parallel convolutional neural networks. Bioinformatics, btab737.