All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
2.4.0 - 2024-11-08
- Adapted sampling strategy to avoid biases even further #253
2.3.0 - 2024-10-30
- New plotting functions for benchmarking #244
- Integrated new plotting functions in automated training pipeline #244
- Removed automatic storing of benchmarking scores #244
- Integrated loss calculation for validation loss and plots #244
- Validation loss uses all spectrum pairs instead of only 1 spectrum per inchikey #244
- Fingerprint type and number of bits specified in the settings are now correctly used in training and validation (previously they were set to default values in some instances) #244 and #251
- Removed version warning
2.2.0 - 2024-10-17
- Switch linter (and linting style) from pylint + prospector to ruff #240
- Clearer documentation and naming to run training from existing splits #239
- Fixed one memory leak when running the parameter_serch function (there might be more though) #243
2.1.0 - 2024-10-07
- A bug in spectrum pair sampling during training was fixed. Due to this bug, only one unique spectrum was sampled for each spectrum, even if multiple spectra were available. The bug was introduced with MS2Deepscore 2.0
- The inchikey pair selection and data generator has been refactored. The new data generator results in a more balanced inchikey distribution. For details see #232
- Dense layers are now built with leaky ReLU instead of ReLU #222.
- The zenodo link to the latest model has been updated to a model trained using the new algorithm.
- Added missing code documentation #222.
2.0.0 - date...
Large scale expansion, revision, and restructuring of MS2Deepscore.
- Models are now built using PyTorch.
- Models have built-in GPU support (using PyTorch).
- New EmbeddingEvaluatorModel (Inception Time CNN)
- New LinearModel for absolute error estimates
- New MS2DeepScoreEvaluated matchms-style score --> gives "score" and "predicted_absolute_error"
- Additional smart binning layer that can handle input of much higher peak resolution (not used as a default!)
- New validation concept --> all-vs-all scores for the validation spectra are computed, but loss is then computed per score bin. This gives better and more significant statistics of the model performance
- New loss functions "Risk Aware MAE" and "Risk Aware MSE", which function similarly to MAE or MSE but try to counteract the tendency of a model to predict towards 0.5.
- Losses can now be weighted with a weighting_factor.
- No longer supports Tensorflow/Keras
- The concept of Spectrum binning has changed and is now implemented differently (i.e. no more "missing peaks" as before)
- Monte-Carlo Dropout now returns a score (mean or median) together with percentile-based upper and lower bounds (instead of STD or IQR before).
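
Two of the 2.0.0 entries above (loss computed per score bin, and percentile-based bounds for Monte-Carlo Dropout) can be illustrated with a minimal sketch. This is not the MS2DeepScore implementation: the function names `per_bin_mae` and `summarize_mc_scores`, the number of bins, and the 5th/95th percentile choice are assumptions made for illustration only, and scores are assumed to lie in [0, 1].

```python
import statistics


def per_bin_mae(true_scores, predicted_scores, n_bins=10):
    """Mean absolute error per bin of the true score, averaged over non-empty bins.

    Hypothetical helper: bins are equal-width over [0, 1] (an assumption).
    """
    bin_errors = [[] for _ in range(n_bins)]
    for true, pred in zip(true_scores, predicted_scores):
        # A true score of exactly 1.0 is placed in the last bin.
        index = min(int(true * n_bins), n_bins - 1)
        bin_errors[index].append(abs(true - pred))
    # Each non-empty bin contributes equally, regardless of how many pairs it holds.
    per_bin = [statistics.mean(errors) for errors in bin_errors if errors]
    return statistics.mean(per_bin)


def summarize_mc_scores(scores, lower_percentile=5, upper_percentile=95):
    """Median of repeated stochastic predictions plus percentile-based bounds.

    Hypothetical helper: `scores` holds the predictions for one spectrum pair
    from several forward passes with dropout enabled (at least two values).
    """
    # quantiles(n=100) returns 99 cut points; index i - 1 is the i-th percentile.
    cut_points = statistics.quantiles(scores, n=100, method="inclusive")
    return (
        statistics.median(scores),
        cut_points[lower_percentile - 1],
        cut_points[upper_percentile - 1],
    )
```

Averaging per bin keeps the sparsely populated high-similarity bins from being drowned out by the many low-similarity pairs, which is one way to read the "better and more significant statistics" mentioned above.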
1.0.0 - 2024-03-12
Last version using Tensorflow. Next versions will be using PyTorch.
- Added split_positive_and_negative_mode.py #148
- Added SettingMS2Deepscore #151
- Clearer warnings when too few input spectra are used in data generator. #155
- Change the max oversampling rate to max_pairs_per_bin #148
- Made spectrum pair selection a lot simpler and fixed mistake #148
- Use DataGeneratorCherrypicked instead of DataGeneratorAllInchikeys in pipelines #148
- Removed M1 Chip compatibility which led to faulty results depending on Tensorflow version #200
0.5.0 - 2023-08-18
- New DataGeneratorCherrypicked as alternative to former data generators #145. This will work better for large datasets and also tries to counteract biases in the chemical similarity scores.
- Models can now be trained on selected metadata entries in addition to the spectrum peaks #128.
- New MetadataFeatureGenerator class to handle additional metadata more robustly #128
- Workflow scripts for training a new MS2DeepScore model #124. The ease of training MS2Deepscore models is improved, including standard settings and splitting validation and training data.
- In SiameseModel, the attributes are not passed as an argument but instead used by the class.
- Improved plotting functionality. Some additional plotting options were added and plots previously created in notebooks are now functions.
- Linting (code and imports) #145.
0.4.0 - 2023-04-25
- Functions to cover the full pipeline of training a new model #129
- Fixed Tensorflow issues when saving/loading models #123
- Random seed is now optional when fixed_set=True for the data generator #134
- load_model() function now auto-detects if a model is multi_inputs or not
- Python version support was changed to 3.8, 3.9, 3.10 (other versions should still work but are not systematically tested)
0.3.1 - 2023-01-06
- Minor changes to make tests work with new matchms (>=0.18.0). Older versions should work as well though. #120
0.3.0 - 2022-11-29
- Allow adding metadata to the network inputs, e.g. precursor-m/z using the additional_inputs parameter #115
- Update test to work with Tensorflow 2.11 #114
0.2.3 - 2022-03-02
- Fixes issue #97 by raising a ValueError when duplicate InChiKey14 are specified by the user in the reference_scores_df DataFrame.
- Minor linting #93
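
The duplicate check described in the 0.2.3 entry above can be sketched as follows. This is a minimal illustration, not the library's code: `check_unique_inchikeys` is a hypothetical helper that works on a plain list of InChIKey14 strings, whereas the actual check applies to the reference_scores_df DataFrame.

```python
from collections import Counter


def check_unique_inchikeys(inchikeys14):
    """Raise a ValueError if any InChIKey14 occurs more than once.

    Hypothetical helper operating on any iterable of strings.
    """
    counts = Counter(inchikeys14)
    duplicates = sorted(key for key, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"Duplicate InChIKey14 entries found: {duplicates}")
```

Failing early with an explicit error avoids ambiguous lookups later, when reference scores are retrieved by InChIKey14.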
0.2.2 - 2021-08-19
- Now compatible with new Tensorflow 2.6, also checked by additional CI runs for Tensorflow 2.4, 2.5 and 2.6 #92
0.2.1 - 2021-07-20
- Speed improvement of spectrum binning step #90
0.2.0 - 2021-04-01
- MS2DeepScoreMonteCarlo Monte-Carlo dropout based ensembling to obtain mean/median score and STD #65
- choice between median (default) and mean ensemble score which come with IQR and STD as uncertainty measures #86
- dropout_in_first_layer option for SiameseModel (default is False) #86
- use_fixed_set option for data generators to create deterministic training/testing data with fixed random seed #73
- small update of create_histograms_plot to make the plot prettier/better to read #85
- solved minor unclarity with the pair selection for non-available reference scores #79
- solved minor unclarity with the addition of noise peaks during data augmentation #78
0.1.3 - 2021-03-09
- Allow users to define L1 and L2 regularization of SiameseModel #67
- Allow users to define number and size of SiameseModel #64
0.1.2 - 2021-03-05
- create_confusion_matrix_plot in plotting #58
0.1.1 - 2021-02-09
- noise peak addition during training via data generators #55
- L1 and L2 regularization for first dense layer #55
- move vector calculation to separate calculate_vectors method #52
0.1.0 - 2021-02-08
- This is the initial version of MS2DeepScore