ArkDTA: Attention Regularization guided by non-Covalent Interactions for Explainable Drug-Target Binding Affinity Prediction
Protein-ligand binding affinity prediction is a central task in drug design and development. Cross-modal attention mechanisms have recently become a core component of many deep learning models because of their potential to improve model explainability. Non-covalent interactions, one of the most critical pieces of domain knowledge in the binding affinity prediction task, should be incorporated into the protein-ligand attention mechanism to make deep DTI models more explainable. We propose ArkDTA, a novel deep neural architecture for explainable binding affinity prediction guided by non-covalent interactions. Experimental results show that ArkDTA achieves predictive performance comparable to current state-of-the-art models while significantly improving model explainability. Qualitative investigation of our novel attention mechanism reveals that ArkDTA can identify potential regions for non-covalent interactions between candidate drug compounds and target proteins, and that it guides the internal operations of the model in a more interpretable and domain-aware manner. (submitted to ISMB2023, under review)
- Python 3.7.9
- CUDA: 11.X
- Download and extract data.tar.gz (link, 45MB) into the current directory. These files are the preprocessed PDBbind (ver. 2020), Davis and Metz datasets.
- Download and extract saved.tar.gz (link, 170MB) into the directory ./saved. These files are the model checkpoints for each fold of the PDBbind dataset.
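The two extraction steps above can also be scripted. The following is a minimal sketch using only the Python standard library. It assumes both archives have already been downloaded into the current directory; `extract_archive` is a hypothetical helper name and is not part of the ArkDTA code. Whether the archives already contain a top-level folder is not stated here, so check the resulting layout after extraction.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # creates e.g. ./saved if missing
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Assumed locations: both archives sit in the current directory.
    for archive, target in [("data.tar.gz", "."), ("saved.tar.gz", "./saved")]:
        if Path(archive).exists():
            members = extract_archive(archive, target)
            print(f"{archive}: extracted {len(members)} entries into {target}")
        else:
            print(f"{archive} not found; download it first")
```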
conda env create -f arkdta.yaml
conda activate arkdta
Run the following code,
python run.py -pn {wandb_project_name} -sn arkdta -mg {multiple gpu indices}
To train ArkDTA on the IC50 subset, edit the following field in /sessions/arkdta.yaml:
ba_measure: IC50
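If you switch between settings often, the single-field edit can be scripted. The sketch below rewrites one `key: value` line in YAML text using a regular expression from the standard library. `set_yaml_value` is a hypothetical helper, not part of ArkDTA, and the sample text uses placeholder values rather than the real contents of the config file. It assumes the key sits on its own line, and any inline comment on that line is discarded.

```python
import re


def set_yaml_value(text, key, value):
    """Replace the value of the first `key: value` line, keeping indentation."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*:).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)} {value}", text, count=1)
    if count == 0:
        raise KeyError(f"{key!r} not found in config text")
    return new_text


if __name__ == "__main__":
    # Placeholder config text; the real arkdta.yaml contains more fields.
    sample = "ba_measure: PLACEHOLDER\ndataset_partition: PLACEHOLDER\n"
    print(set_yaml_value(sample, "ba_measure", "IC50"))
```

To apply it to the real file, read the file contents into a string, call the helper, and write the result back. Keeping a backup copy of the original config is advisable.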
Run the following code,
python run.py -pn {wandb_project_name} -sn arkdta -mg {multiple gpu indices} -tm
Edit the following fields in /sessions/arkdta.yaml:
dataset_subsets: davis
dataset_partition: randomsingle
Then run the following code,
python run.py -pn {wandb_project_name} -sn arkdta -mg {multiple gpu indices} -ft {davis or metz}
Run the following code,
python run.py -pn {wandb_project_name} -sn arkdta -mg {multiple gpu indices} -tm -cn {your/saved/path_davis or _metz}
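The `run.py` invocations above differ only in their trailing flags, so they can be assembled by a small wrapper. The sketch below uses only the flags that appear in this README (`-pn`, `-sn`, `-mg`, `-tm`, `-ft`, `-cn`). `build_run_command` is a hypothetical helper, the project name and checkpoint path are placeholders, and passing the GPU indices as separate tokens is an assumption. Check how `run.py` parses `-mg` before relying on it.

```python
import shlex


def build_run_command(project, gpus, test_mode=False, finetune=None, checkpoint=None):
    """Assemble a run.py argument list from the flags shown in this README."""
    cmd = ["python", "run.py", "-pn", project, "-sn", "arkdta", "-mg"]
    cmd += [str(g) for g in gpus]  # assumption: indices are separate tokens
    if test_mode:
        cmd.append("-tm")
    if finetune is not None:
        cmd += ["-ft", finetune]  # "davis" or "metz"
    if checkpoint is not None:
        cmd += ["-cn", checkpoint]
    return cmd


if __name__ == "__main__":
    # Placeholder project name and checkpoint path, for illustration only.
    runs = [
        build_run_command("my_project", [0, 1]),
        build_run_command("my_project", [0, 1], test_mode=True),
        build_run_command("my_project", [0, 1], finetune="davis"),
        build_run_command("my_project", [0, 1], test_mode=True,
                          checkpoint="your/saved/path_davis"),
    ]
    for cmd in runs:
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.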
Run the following script,
./arkdta.sh
You can change the input SMILES (ligands) or FASTA sequences (proteins) by editing the arkdta.sh file.
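Before pasting a new protein sequence into the script, a quick sanity check can catch stray header lines or unexpected characters. The sketch below is not part of ArkDTA; both function names are hypothetical. It flags any character outside the 20 standard amino-acid letters. Whether ArkDTA itself accepts non-standard residue codes is not stated here, so treat the output as a hint rather than a hard rule.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(fasta_text):
    """Drop FASTA header lines and whitespace; return an uppercase sequence."""
    body = [
        "".join(line.split())
        for line in fasta_text.splitlines()
        if not line.lstrip().startswith(">")
    ]
    return "".join(body).upper()


def nonstandard_residues(sequence):
    """Return the sorted characters outside the 20 standard amino-acid letters."""
    return sorted(set(sequence) - STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    # Toy input for illustration only, not a real target protein.
    toy = ">toy protein\nmktay\nIAKQR\n"
    seq = clean_protein_sequence(toy)
    print(seq, nonstandard_residues(seq))
```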
Name | Affiliation | Email |
---|---|---|
Mogan Gim | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
Junseok Choe | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
Seungheun Baek | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
Jueon Park | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
Chaeeun Lee | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
Minjae Ju† | LG CNS, AI Research Center, Seoul, South Korea | [email protected] |
Sumin Lee† | LG AI Research, Seoul, South Korea | [email protected] |
Jaewoo Kang* | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
- †: This work was done while the author was a graduate student at Korea University Computer Science Department.
- *: Corresponding Author