Commit
Merge pull request #168 from matchms/dev_pytorch
Dev pytorch
niekdejonge authored Mar 21, 2024
2 parents a808f6c + dc407e8 commit 7f35c4e
Showing 82 changed files with 43,184 additions and 2,986 deletions.
37 changes: 1 addition & 36 deletions .github/workflows/CI_build.yml
@@ -44,7 +44,7 @@ jobs:
fail-fast: false
matrix:
os: ['ubuntu-latest', 'macos-latest', 'windows-latest']
python-version: ['3.8', '3.9', '3.10']
python-version: ['3.9', '3.10', '3.11']
exclude:
# already tested in first_check job
- python-version: 3.9
@@ -69,38 +69,3 @@ jobs:
- name: Run tests
run: |
pytest
tensorflow_check:
name: Tensorflow version check / python-3.8 / ubuntu-latest
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.8
- name: Python info
run: |
which python
python --version
- name: Install Tensorflow version 2.6
run: |
python -m pip install --upgrade pip
pip install "tensorflow>=2.6,<2.7"
- name: Install other dependencies
run: |
pip install -e .[dev,train]
- name: Show pip list
run: |
pip list
- name: Run test with tensorflow version 2.6
run: pytest
- name: Install Tensorflow version 2.8
run: |
pip install --upgrade "numpy<1.24.0"
pip install --upgrade "tensorflow>=2.8,<2.9"
- name: Show pip list
run: |
pip list
- name: Run test with tensorflow version 2.8
run: pytest
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ xunit-result.xml

docs/_build
docs/apidocs
prototyping/

# ide
.idea
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,26 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.0.0] - date...
Large-scale expansion, revision, and restructuring of MS2Deepscore.

### Added
- Models are now built using PyTorch.
- Models have built-in GPU support (using PyTorch).
- new `EmbeddingEvaluatorModel` (Inception Time CNN)
- new `LinearModel` for absolute error estimates
- new `MS2DeepScoreEvaluated` matchms-style score --> gives "score" and "predicted_absolute_error"
- Additional smart binning layer that can handle input of much higher peak resolution (not used as a default!)
- New validation concept --> all-vs-all scores for the validation spectra are computed, but loss is then computed per score bin. This gives better and more significant statistics of the model performance
- New loss functions "Risk Aware MAE" and "Risk Aware MSE", which function similarly to MAE or MSE but try to counteract the tendency of a model to predict towards 0.5.
- Losses can now be weighted with a weighting_factor.
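
The per-score-bin validation idea listed above can be sketched roughly as follows. This is a minimal illustration, not the ms2deepscore implementation: the function name `mae_per_score_bin`, the choice of MAE, and the ten equal-width bins are assumptions made for the example.

```python
import numpy as np


def mae_per_score_bin(true_scores, predicted_scores, n_bins=10):
    """Mean absolute error per equal-width bin of the true score in [0, 1].

    Hypothetical helper for illustration only (not part of ms2deepscore).
    Bins that contain no pairs are reported as NaN.
    """
    true_scores = np.asarray(true_scores, dtype=float).ravel()
    predicted_scores = np.asarray(predicted_scores, dtype=float).ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the inner edges maps every score to a bin index 0..n_bins-1,
    # so a true score of exactly 1.0 falls into the last bin.
    bin_index = np.digitize(true_scores, edges[1:-1])
    abs_error = np.abs(true_scores - predicted_scores)
    losses = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = bin_index == b
        if mask.any():
            losses[b] = abs_error[mask].mean()
    return edges, losses


# Flattened all-vs-all scores for a toy validation set (made-up numbers).
edges, losses = mae_per_score_bin([0.05, 0.05, 0.95], [0.15, 0.25, 0.95])
# Averaging over bins (ignoring empty ones) weights rare high scores
# as strongly as the abundant low scores.
balanced_loss = np.nanmean(losses)
```

Averaging the per-bin values rather than all pairs at once is what keeps the many low-similarity pairs from dominating the validation metric.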


### Changed
- No longer supports Tensorflow/Keras
- The concept of Spectrum binning has changed and is now implemented differently (i.e. no more "missing peaks" as before)
- Monte-Carlo Dropout now returns a score (mean or median) together with percentile-based upper and lower bounds (instead of STD or IQR before).
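
The percentile-based bounds mentioned for Monte-Carlo Dropout can be sketched as below. This is a hedged illustration under stated assumptions: `summarize_ensemble` is a hypothetical helper, the 5th/95th percentiles are arbitrary example values, and the input is assumed to be an array of scores from repeated stochastic forward passes, one row per pass.

```python
import numpy as np


def summarize_ensemble(scores, lower_percentile=5.0, upper_percentile=95.0,
                       average="median"):
    """Summarize repeated dropout predictions per spectrum pair.

    Hypothetical helper for illustration only (not part of ms2deepscore).
    `scores` has shape (n_passes, n_pairs); returns three arrays of shape
    (n_pairs,): the central score, the lower bound and the upper bound.
    """
    scores = np.asarray(scores, dtype=float)
    if average == "median":
        centre = np.median(scores, axis=0)
    elif average == "mean":
        centre = scores.mean(axis=0)
    else:
        raise ValueError("average must be 'median' or 'mean'")
    lower = np.percentile(scores, lower_percentile, axis=0)
    upper = np.percentile(scores, upper_percentile, axis=0)
    return centre, lower, upper


# Three stochastic passes over two spectrum pairs (made-up numbers).
passes = np.array([[0.0, 0.80],
                   [0.5, 0.82],
                   [1.0, 0.84]])
centre, lower, upper = summarize_ensemble(passes)
```

Unlike a standard deviation or interquartile range, percentile bounds can be asymmetric around the central score, which suits scores clipped to a fixed range.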

## [Unreleased]

## [1.0.0] - 2024-03-12
10 changes: 5 additions & 5 deletions README.md
@@ -116,11 +116,13 @@ In that scenario, `scores["score"]` contains the similarity scores (median of th
Training your own model is only recommended if you have some familiarity with machine learning.
To train your own model you can run the code below.
Please clean your spectra first. We recommend using the cleaning pipeline in [matchms](https://github.com/matchms/matchms).

```python
from ms2deepscore.train_new_model.SettingMS2Deepscore import \
from ms2deepscore.SettingsMS2Deepscore import
SettingsMS2Deepscore
from ms2deepscore.wrapper_functions.training_wrapper_functions import \
from ms2deepscore.wrapper_functions.training_wrapper_functions import
train_ms2deepscore_wrapper

settings = SettingsMS2Deepscore({"epochs": 300,
"base_dims": (1000, 1000, 1000),
"embedding_dim": 500,
@@ -129,9 +131,7 @@ settings = SettingsMS2Deepscore({"epochs": 300,
"learning_rate": 0.00025,
"patience": 30,
})
train_ms2deepscore_wrapper(spectra_file_path=,
settings=settings,
validation_split_fraction=20)
train_ms2deepscore_wrapper(spectra_file_path=, model_settings=, validation_split_fraction=20)
```
## Contributing
We welcome contributions to the development of ms2deepscore! Have a look at the [contribution guidelines](https://github.com/matchms/ms2deepscore/blob/main/CONTRIBUTING.md).
48 changes: 0 additions & 48 deletions ms2deepscore/BinnedSpectrum.py

This file was deleted.

73 changes: 16 additions & 57 deletions ms2deepscore/MS2DeepScore.py
@@ -2,8 +2,8 @@
import numpy as np
from matchms import Spectrum
from matchms.similarity.BaseSimilarity import BaseSimilarity
from tqdm import tqdm
from .typing import BinnedSpectrumType
from ms2deepscore.models.SiameseSpectralModel import (SiameseSpectralModel,
compute_embedding_array)
from .vector_operations import cosine_similarity, cosine_similarity_matrix


@@ -29,7 +29,7 @@ class MS2DeepScore(BaseSimilarity):
queries = load_from_json("xyz.json")
# Load pretrained model
model = load_model("model_file_123.hdf5")
model = load_model("model_file_123.pt")
similarity_measure = MS2DeepScore(model)
# Calculate scores and get matchms.Scores object
@@ -38,43 +38,25 @@ class MS2DeepScore(BaseSimilarity):
"""

def __init__(self, model, progress_bar: bool = True):
def __init__(self, model: SiameseSpectralModel, progress_bar: bool = True):
"""
Parameters
----------
model:
Expected input is a SiameseModel that has been trained on
the desired set of spectra. The model contains the keras deep neural
network (model.model) as well as the used spectrum binner (model.spectrum_binner).
the desired set of spectra.
progress_bar:
Set to True to monitor the embedding creation with a progress bar.
Default is True.
"""
self.model = model
self.multi_inputs = (model.nr_of_additional_inputs > 0)
if self.multi_inputs:
self.input_vector_dim = [self.model.base.input_shape[0][1], self.model.base.input_shape[1][1]]
else:
self.input_vector_dim = self.model.base.input_shape[1]
self.output_vector_dim = self.model.base.output_shape[1]
self.model.eval()
self.output_vector_dim = self.model.model_settings.embedding_dim
self.progress_bar = progress_bar

def _create_input_vector(self, binned_spectrum: BinnedSpectrumType):
"""Creates input vector for model.base based on binned peaks and intensities"""
if self.multi_inputs:
X = [np.zeros((1, i[1])) for i in self.model.base.input_shape]
idx = np.array([int(x) for x in binned_spectrum.binned_peaks.keys()])
values = np.array(list(binned_spectrum.binned_peaks.values()))

X[0][0, idx] = values
X[1] = np.array([[float(value) for key, value in binned_spectrum.metadata.items() if (key != "inchikey")]])
else:
X = np.zeros((1, self.input_vector_dim))
idx = np.array([int(x) for x in binned_spectrum.binned_peaks.keys()])
values = np.array(list(binned_spectrum.binned_peaks.values()))
X[0, idx] = values
return X
def get_embedding_array(self, spectrums):
return compute_embedding_array(self.model, spectrums)

def pair(self, reference: Spectrum, query: Spectrum) -> float:
"""Calculate the MS2DeepScore similarity between a reference and a query spectrum.
@@ -91,12 +73,9 @@ def pair(self, reference: Spectrum, query: Spectrum) -> float:
ms2ds_similarity
MS2DeepScore similarity score.
"""
binned_reference = self.model.spectrum_binner.transform([reference])[0]
binned_query = self.model.spectrum_binner.transform([query])[0]
reference_vector = self.model.base.predict(self._create_input_vector(binned_reference))
query_vector = self.model.base.predict(self._create_input_vector(binned_query))

return cosine_similarity(reference_vector[0, :], query_vector[0, :])
embedding_reference = self.get_embedding_array([reference])
embedding_query = self.get_embedding_array([query])
return cosine_similarity(embedding_reference[0, :], embedding_query[0, :])

def matrix(self, references: List[Spectrum], queries: List[Spectrum],
array_type: str = "numpy",
@@ -122,33 +101,13 @@ def matrix(self, references: List[Spectrum], queries: List[Spectrum],
ms2ds_similarity
Array of MS2DeepScore similarity scores.
"""
reference_vectors = self.calculate_vectors(references)
embeddings_reference = self.get_embedding_array(references)
if is_symmetric:
assert np.all(references == queries), \
"Expected references to be equal to queries for is_symmetric=True"
query_vectors = reference_vectors
embeddings_query = embeddings_reference
else:
query_vectors = self.calculate_vectors(queries)
embeddings_query = self.get_embedding_array(queries)

ms2ds_similarity = cosine_similarity_matrix(reference_vectors, query_vectors)
ms2ds_similarity = cosine_similarity_matrix(embeddings_reference, embeddings_query)
return ms2ds_similarity

def calculate_vectors(self, spectrum_list: List[Spectrum]) -> np.ndarray:
"""Returns a list of vectors for all spectra
parameters
----------
spectrum_list:
List of spectra for which the vector should be calculated
"""
n_rows = len(spectrum_list)
reference_vectors = np.empty(
(n_rows, self.output_vector_dim), dtype="float")
binned_spectrums = self.model.spectrum_binner.transform(spectrum_list, progress_bar=self.progress_bar)
for index_reference, reference in enumerate(
tqdm(binned_spectrums,
desc='Calculating vectors of reference spectrums',
disable=(not self.progress_bar))):
reference_vectors[index_reference, 0:self.output_vector_dim] = \
self.model.base.predict(self._create_input_vector(reference), verbose=0)
return reference_vectors
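
For readers following the `matrix` change above: after this commit the scores are the cosine similarities between every reference embedding and every query embedding. The diff does not show the body of `cosine_similarity_matrix` from `.vector_operations`, so the snippet below is only a plain NumPy sketch of that computation under the assumption that embeddings are 2-D arrays with one row per spectrum. The name `cosine_similarity_matrix_sketch` and the zero-vector handling are choices made for this example.

```python
import numpy as np


def cosine_similarity_matrix_sketch(references, queries):
    """All-vs-all cosine similarity between two sets of embeddings.

    Illustrative stand-in, not the ms2deepscore implementation.
    `references` has shape (n_ref, dim) and `queries` has shape (n_query, dim);
    the result has shape (n_ref, n_query).
    """
    references = np.asarray(references, dtype=float)
    queries = np.asarray(queries, dtype=float)
    ref_norm = np.linalg.norm(references, axis=1, keepdims=True)
    qry_norm = np.linalg.norm(queries, axis=1, keepdims=True)
    # Assumption for this sketch: all-zero embeddings get similarity 0
    # instead of triggering a division by zero.
    ref_unit = references / np.where(ref_norm == 0, 1.0, ref_norm)
    qry_unit = queries / np.where(qry_norm == 0, 1.0, qry_norm)
    return ref_unit @ qry_unit.T


# Two reference and two query embeddings of dimension 2 (made-up numbers).
scores = cosine_similarity_matrix_sketch([[1, 0], [0, 2]], [[3, 0], [1, 1]])
```

Because the embeddings are computed once per spectrum and compared with a single matrix product, the symmetric case can reuse the reference embeddings as the query embeddings, as the diff does for `is_symmetric=True`.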