This is the code supporting our work "ACCELERATED HYDRATION SITE LOCALIZATION AND THERMODYNAMIC PROFILING". We predict the location, entropy, and enthalpy of high-occupancy hydration sites of proteins.
The training and test data can be downloaded here.
The conda environment can be created by running
conda env create -f environment.yml
For faster installation we recommend using mamba instead:
mamba env create -f environment.yml
To train the model for hydration site location prediction, execute the command
python -m training.train
Training can be performed on multiple GPUs by changing the line `cuda_ids: [0]` in the config file `config/location_model.yaml`.
A separate model was trained for predicting thermodynamic properties (i.e. enthalpy and entropy). To train this model, open the training script `training/train.py` and switch the `config_name` in
@hydra.main(config_path="../config/", config_name="location_model", version_base="1.1")
to `thermo_model`.
Then execute
python -m training.train
After training, the model performance can be evaluated:
python -m inference.evaluation.evaluate_after_clustering
Loading our pretrained model checkpoint, we obtain the following results:
Cutoff (Angstrom) | Ground Truth Recovery Rate | Prediction Hit Rate |
---|---|---|
0.5 | 59.0% | 48.3% |
1.0 | 80.2% | 65.9% |
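To make the two metrics concrete, here is a minimal sketch of how they could be computed from coordinates. It is not the repository's evaluation code: the function names, the toy coordinates, and the definitions themselves are assumptions. Recovery rate is taken to be the fraction of ground truth sites with a prediction within the cutoff, and hit rate the fraction of predictions with a ground truth site within the cutoff.

```python
import math


def nearest_distance(point, others):
    """Distance from `point` to its closest neighbour in `others` (inf if empty)."""
    return min((math.dist(point, o) for o in others), default=math.inf)


def recovery_and_hit_rate(ground_truth, predictions, cutoff):
    """Return (ground truth recovery rate, prediction hit rate) at `cutoff` in Angstrom.

    Assumed definitions, not taken from the repository:
    - recovery rate: share of ground truth sites with a prediction within `cutoff`
    - hit rate: share of predictions with a ground truth site within `cutoff`
    """
    recovered = sum(nearest_distance(g, predictions) <= cutoff for g in ground_truth)
    hits = sum(nearest_distance(p, ground_truth) <= cutoff for p in predictions)
    recovery = recovered / len(ground_truth) if ground_truth else 0.0
    hit = hits / len(predictions) if predictions else 0.0
    return recovery, hit


# Toy coordinates in Angstrom, made up for illustration only.
gt = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
pred = [(0.3, 0.0, 0.0), (5.0, 0.8, 0.0), (9.0, 9.0, 9.0)]
rates_at_half = recovery_and_hit_rate(gt, pred, cutoff=0.5)
rates_at_one = recovery_and_hit_rate(gt, pred, cutoff=1.0)
```

The brute-force nearest-neighbour search keeps the sketch dependency-free; for full proteins a spatial index would be the more practical choice.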
The ground truth recovery rate can be further investigated by differentiating based on occupancy:
Cutoff (Angstrom) | [0.5,0.6] | [0.6,0.7] | [0.7,0.8] | [0.8,0.9] | [0.9,1.0] |
---|---|---|---|---|---|
0.5 | 42.0% | 57.2% | 65.1% | 70.0% | 69.7% |
1.0 | 62.3% | 79.4% | 89.0% | 91.0% | 90.8% |
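The occupancy breakdown can be sketched in the same spirit. The snippet below is an illustration under assumptions, not the repository's code: the function name, the input layout (a list of coordinate and occupancy pairs), and the handling of bin boundaries (half-open bins, with the last bin including 1.0) are all hypothetical.

```python
import math


def recovery_by_occupancy(sites, predictions, cutoff,
                          edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Return {(low, high): recovery rate} per occupancy bin, or None for empty bins.

    `sites` is a list of (xyz, occupancy) pairs. Bins are treated as [low, high),
    except the last one, which also includes its upper edge (an assumption).
    """
    bins = list(zip(edges[:-1], edges[1:]))
    counts = {b: [0, 0] for b in bins}  # [recovered, total] per bin
    for xyz, occupancy in sites:
        for i, (low, high) in enumerate(bins):
            is_last = i == len(bins) - 1
            if low <= occupancy < high or (is_last and occupancy == high):
                nearest = min((math.dist(xyz, p) for p in predictions),
                              default=math.inf)
                counts[(low, high)][1] += 1
                counts[(low, high)][0] += int(nearest <= cutoff)
                break
    return {b: (rec / tot if tot else None) for b, (rec, tot) in counts.items()}


# Toy data, made up for illustration only.
sites = [((0.0, 0.0, 0.0), 0.55), ((4.0, 0.0, 0.0), 0.58), ((8.0, 0.0, 0.0), 0.95)]
predictions = [(0.2, 0.0, 0.0), (8.0, 0.4, 0.0)]
by_bin = recovery_by_occupancy(sites, predictions, cutoff=0.5)
```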
If we restrict ourselves to the first layer of hydration sites (distance from non-hydrogen atoms no further than
Cutoff (Angstrom) | Ground Truth Recovery Rate |
---|---|
0.5 | 62.5% |
1.0 | 84.5% |
Cutoff (Angstrom) | [0.5,0.6] | [0.6,0.7] | [0.7,0.8] | [0.8,0.9] | [0.9,1.0] |
---|---|---|---|---|---|
0.5 | 48.5% | 60.6% | 67.0% | 70.9% | 69.9% |
1.0 | 71.0% | 83.6% | 88.9% | 91.9% | 90.9% |
The second layer water hydration sites (distance from protein non-hydrogen atoms at least
Cutoff (Angstrom) | Ground Truth Recovery Rate |
---|---|
0.5 | 15.2% |
1.0 | 26.9% |
Cutoff (Angstrom) | [0.5,0.6] | [0.6,0.7] | [0.7,0.8] | [0.8,0.9] | [0.9,1.0] |
---|---|---|---|---|---|
0.5 | 11.4% | 17.6% | 22.8% | 29.6% | 38.7% |
1.0 | 20.8% | 30.6% | 39.8% | 50.7% | 63.5% |
We evaluate the prediction performance for enthalpy and entropy by investigating the correlation of the predictions with the simulated ground truth. Running the following command prints the Pearson correlation and creates density plots:
python -m inference.visualization.thermodynamics.create_correlation_plots
The correlation between predictions and ground truth is given in the following table:
Predicted Variable | Pearson r Correlation with Ground Truth |
---|---|
Enthalpy | 0.8388 |
Entropy | 0.8643 |
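For reference, the Pearson correlation coefficient reported above is the covariance of the two variables divided by the product of their standard deviations. The following is a small dependency-free sketch of that formula; the function name and the toy values are hypothetical and unrelated to the repository's plotting script.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values, made up for illustration only.
predicted = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
r_perfect = pearson_r(predicted, simulated)
r_inverse = pearson_r(predicted, [-v for v in simulated])
```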
During protein-ligand binding, the ligand displaces the water molecules in the binding pocket. The desolvation free energy can be calculated as
for
We apply our model to predict the hydration sites within a protein binding pocket and calculate the Gibbs free energy. To reproduce this, set the `data` parameter to `data: case_study` in both `config/location_model.yaml` and `config/thermo_model.yaml`.
Then execute the script
python -m inference.visualization.calculate_displaced_waters
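The idea of scoring displaced waters can be illustrated with the standard thermodynamic identity dG = dH - T*dS, summed over the hydration sites that a ligand displaces. The sketch below is an assumption-laden illustration, not the logic of `calculate_displaced_waters`: the function name, the displacement criterion (a site counts as displaced if it lies within a hypothetical `displacement_cutoff` of any ligand atom), the temperature, the units, and the sign convention are all made up for the example.

```python
import math


def displaced_water_free_energy(sites, ligand_atoms,
                                displacement_cutoff=2.0, temperature=300.0):
    """Sum dG = dH - T*dS over hydration sites displaced by the ligand.

    `sites` is a list of (xyz, enthalpy, entropy) tuples. Enthalpy is assumed
    to be in an energy unit and entropy in that unit per kelvin. A site counts
    as displaced if it lies within `displacement_cutoff` (Angstrom) of any
    ligand atom; both the criterion and the default values are assumptions.
    Returns (total free energy, number of displaced sites).
    """
    total = 0.0
    n_displaced = 0
    for xyz, enthalpy, entropy in sites:
        if any(math.dist(xyz, atom) <= displacement_cutoff for atom in ligand_atoms):
            total += enthalpy - temperature * entropy
            n_displaced += 1
    return total, n_displaced


# Toy data, made up for illustration only.
sites = [((0.0, 0.0, 0.0), -2.0, -0.005), ((10.0, 0.0, 0.0), -1.0, -0.002)]
ligand = [(1.0, 0.0, 0.0)]
total_dg, n_displaced = displaced_water_free_energy(sites, ligand)
```

Only the first toy site is within the cutoff of the ligand atom, so the sum contains a single term. If the model outputs an entropy contribution that is already multiplied by temperature, the `temperature * entropy` term would need to be adjusted accordingly.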
The highest Pearson correlation of
To predict water molecules with associated enthalpy and entropy, set the `data` entry in both `config/location_model.yaml` and `config/thermo_model.yaml` to the data set of interest:
- data: case_study
Then run
python -m inference.evaluation.predict_waters
The predicted location, entropy, and enthalpy, e.g. for the protein `1I06`, will be saved at
images/case_study/1I06/enthalpy.pt
images/case_study/1I06/entropy.pt
images/case_study/1I06/location_prediction.pt
A PyMOL visualization is provided at
images/case_study/1I06/protein_with_predictions.pse