Eliminate evaluate Command #359

Merged
merged 45 commits into from
Aug 21, 2024
Changes from 15 commits
4e24904
prediction output in model eval mode
Lilferrit Jul 30, 2024
82018cf
eliminate eval command, introduce -e flag for predict command
Lilferrit Jul 30, 2024
c59edce
adapted unit test to new model runner and model functionality
Lilferrit Jul 31, 2024
b9f843a
updated documentation
Lilferrit Jul 31, 2024
6716db6
removed log and result files
Lilferrit Jul 31, 2024
dccb729
Generate new screengrabs with rich-codex
github-actions[bot] Jul 31, 2024
5879b0a
Update paper reference (#361)
bittremieux Aug 2, 2024
f792527
Bug report template (#360)
bittremieux Aug 6, 2024
bf1e5a3
upgrade codecove to v4 (#364)
Lilferrit Aug 6, 2024
4c14b94
implemen eval mode at model runner level, fix unit test
Lilferrit Aug 6, 2024
8d3ceba
merge dev
Lilferrit Aug 6, 2024
0862a7c
CLI documentation
Lilferrit Aug 6, 2024
8461608
Generate new screengrabs with rich-codex
github-actions[bot] Aug 6, 2024
059119e
Merge branch 'main' into elim-eval
Lilferrit Aug 6, 2024
87ad500
Merge branch 'elim-eval' of github.com:Noble-Lab/casanovo into elim-eval
Lilferrit Aug 6, 2024
a2b50c1
merge conflict
Lilferrit Aug 7, 2024
148d32a
requested changes
Lilferrit Aug 7, 2024
1981f13
Generate new screengrabs with rich-codex
github-actions[bot] Aug 7, 2024
6ffa3f8
evaluation test cases
Lilferrit Aug 7, 2024
8174ee5
file warnings, evaluation tests
Lilferrit Aug 8, 2024
1aebc64
Merge branch 'elim-eval' of github.com:Noble-Lab/casanovo into elim-eval
Lilferrit Aug 8, 2024
81a3267
fixed ubuntu specific test case bug
Lilferrit Aug 8, 2024
7b9557b
verify annotated mgf files
Lilferrit Aug 9, 2024
5dd591f
verify annotated mgf files
Lilferrit Aug 9, 2024
9c90aee
Merge branch 'elim-eval' of github.com:Noble-Lab/casanovo into elim-eval
Lilferrit Aug 9, 2024
c188df3
Merge branch 'elim-eval' of github.com:Noble-Lab/casanovo into elim-eval
Lilferrit Aug 9, 2024
4b3d1a4
Generate new screengrabs with rich-codex
github-actions[bot] Aug 9, 2024
34fb4d1
Merge branch 'elim-eval' of github.com:Noble-Lab/casanovo into elim-eval
Lilferrit Aug 9, 2024
ba58668
Save best model (#365)
Lilferrit Aug 12, 2024
bd8ceba
prediction output in model eval mode
Lilferrit Jul 30, 2024
d4326b1
eliminate eval command, introduce -e flag for predict command
Lilferrit Jul 30, 2024
b43121e
adapted unit test to new model runner and model functionality
Lilferrit Jul 31, 2024
d9b6f48
updated documentation
Lilferrit Jul 31, 2024
f441034
removed log and result files
Lilferrit Jul 31, 2024
cd9bfe5
implemen eval mode at model runner level, fix unit test
Lilferrit Aug 6, 2024
5cb3e21
CLI documentation
Lilferrit Aug 6, 2024
0bb617d
Bug report template (#360)
bittremieux Aug 6, 2024
7be64ed
requested changes
Lilferrit Aug 7, 2024
b5862b5
evaluation test cases
Lilferrit Aug 7, 2024
d20494c
file warnings, evaluation tests
Lilferrit Aug 8, 2024
9647321
fixed ubuntu specific test case bug
Lilferrit Aug 8, 2024
c4cd147
verify annotated mgf files
Lilferrit Aug 9, 2024
31cc133
removed mgf annotation verification
Lilferrit Aug 12, 2024
695c739
AnnotatedSpectrumIndex type error
Lilferrit Aug 14, 2024
2d882b7
requested changes, changelog entry
Lilferrit Aug 20, 2024
55 changes: 55 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report_template.md
@@ -0,0 +1,55 @@
---
name: Bug Report
about: Submit a Casanovo Bug Report
labels: bug
---

## Describe the Issue
A clear and concise description of what the issue/bug is.

## Steps To Reproduce
Steps to reproduce the incorrect behavior.

## Expected Behavior
A clear and concise description of what you expected to happen.

## Terminal Output (If Applicable)
Provide any applicable console output in between the tick marks below.

```

```

## Environment:
- OS: [e.g. Windows 11, Windows 10, macOS 14, Ubuntu 24.04]
- Casanovo Version: [e.g. 4.2.1]
- Hardware Used (CPU or GPU, if GPU also GPU model and CUDA version): [e.g. GPU: NVIDIA GeForce RTX 2070, CUDA Version: 12.5]

### Checking GPU Version

The GPU model can be checked by typing `nvidia-smi` into a terminal/console window.
An example of how to use this command is shown below.
In this case, the CUDA version is 12.5 and the GPU model is GeForce RTX 2070.


```
(casanovo_env) C:\Users\<user>\OneDrive\Documents\casanovo>nvidia-smi
Fri Aug 2 12:34:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.99 Driver Version: 555.99 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2070 ... WDDM | 00000000:01:00.0 On | N/A |
| N/A 60C P8 16W / 90W | 1059MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
```

## Additional Context
Add any other context about the problem here.

## Attach Files
Please attach all input files used and the full Casanovo log file.
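
The Environment section of the template asks for the OS and the Casanovo version. As a convenience, a reporter could collect those values with a short script like the one below. This is only a sketch: the function name `describe_environment` is made up for illustration, and it assumes the installed distribution is named `casanovo`. GPU details still need to come from `nvidia-smi` as described in the template.

```python
import platform
from importlib import metadata


def describe_environment(package: str = "casanovo") -> dict:
    """Collect OS, Python, and package version strings for a bug report.

    `package` is assumed to be the installed distribution name.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the active environment.
        version = "not installed"
    return {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        package: version,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"- {key}: {value}")
```

Running it inside the same environment that is used to launch Casanovo matters, since the version is read from that environment's installed packages.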
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -42,7 +42,7 @@ jobs:
run: |
pytest --cov=casanovo tests/
- name: Upload coverage to codecov
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: true
2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@
If you use Casanovo in your work, please cite the following publications:

- Yilmaz, M., Fondrie, W. E., Bittremieux, W., Oh, S. & Noble, W. S. *De novo* mass spectrometry peptide sequencing with a transformer model. in *Proceedings of the 39th International Conference on Machine Learning - ICML '22* vol. 162 25514–25522 (PMLR, 2022). [https://proceedings.mlr.press/v162/yilmaz22a.html](https://proceedings.mlr.press/v162/yilmaz22a.html)
- Yilmaz, M., Fondrie, W. E., Bittremieux, W., Nelson, R., Ananth, V., Oh, S. & Noble, W. S. Sequence-to-sequence translation from mass spectra to peptides with a transformer model. in *bioRxiv* (2023). [doi:10.1101/2023.01.03.522621](https://doi.org/10.1101/2023.01.03.522621)
- Yilmaz, M., Fondrie, W. E., Bittremieux, W., Melendez, C.F., Nelson, R., Ananth, V., Oh, S. & Noble, W. S. Sequence-to-sequence translation from mass spectra to peptides with a transformer model. in *Nature Communications* **15**, 6427 (2024). [doi:10.1038/s41467-024-49731-x](https://doi.org/10.1038/s41467-024-49731-x)

## Documentation

56 changes: 21 additions & 35 deletions casanovo/casanovo.py
@@ -128,64 +128,50 @@ def main() -> None:
nargs=-1,
type=click.Path(exists=True, dir_okay=False),
)
@click.option(
"--evaluate",
"-e",
is_flag=True,
default=False,
help="""
Run in evaluation mode. When this flag is set the peptide and amino
acid precision will be calculated and logged at the end of the sequencing
run. All input files must be annotated MGF files if running in evaluation
mode.
""",
)
def sequence(
peak_path: Tuple[str],
model: Optional[str],
config: Optional[str],
output: Optional[str],
verbosity: str,
evaluate: bool,
) -> None:
"""De novo sequence peptides from tandem mass spectra.

PEAK_PATH must be one or more mzMl, mzXML, or MGF files from which
to sequence peptides.
PEAK_PATH must be one or more mzML, mzXML, or MGF files from which
to sequence peptides. If evaluate is set to true, peak_path must be
one or more annotated MGF files.
"""
output = setup_logging(output, verbosity)
config, model = setup_model(model, config, output, False)
start_time = time.time()
with ModelRunner(config, model) as runner:
logger.info("Sequencing peptides from:")
logger.info(
"Sequencing %speptides from:",
"and evaluating " if evaluate else "",
)
for peak_file in peak_path:
logger.info(" %s", peak_file)

runner.predict(peak_path, output)
runner.predict(peak_path, output, evaluate=evaluate)
psms = runner.writer.psms
utils.log_sequencing_report(
psms, start_time=start_time, end_time=time.time()
)


@main.command(cls=_SharedParams)
@click.argument(
"annotated_peak_path",
required=True,
nargs=-1,
type=click.Path(exists=True, dir_okay=False),
)
def evaluate(
annotated_peak_path: Tuple[str],
model: Optional[str],
config: Optional[str],
output: Optional[str],
verbosity: str,
) -> None:
"""Evaluate de novo peptide sequencing performance.

ANNOTATED_PEAK_PATH must be one or more annoated MGF files,
such as those provided by MassIVE-KB.
"""
output = setup_logging(output, verbosity)
config, model = setup_model(model, config, output, False)
start_time = time.time()
with ModelRunner(config, model) as runner:
logger.info("Sequencing and evaluating peptides from:")
for peak_file in annotated_peak_path:
logger.info(" %s", peak_file)

runner.evaluate(annotated_peak_path)
utils.log_run_report(start_time=start_time, end_time=time.time())


@main.command(cls=_SharedParams)
@click.argument(
"train_peak_path",
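
The change to `sequence` above boils down to a boolean flag that defaults to off, adjusts the log message, and is forwarded to `runner.predict`. The sketch below mirrors that flow using the standard library's `argparse` instead of click, purely so it runs without extra dependencies. `FakeRunner`, `build_parser`, `run`, and the `--output` default are hypothetical names and values chosen for illustration, not Casanovo code.

```python
import argparse
from typing import List, Sequence


class FakeRunner:
    """Hypothetical stand-in for ModelRunner that records predict() calls."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def predict(
        self, peak_path: Sequence[str], output: str, evaluate: bool = False
    ) -> None:
        self.calls.append((tuple(peak_path), output, evaluate))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sequence")
    parser.add_argument("peak_path", nargs="+")
    parser.add_argument("--output", "-o", default="results")
    # A boolean flag: absent means False, present means True.
    parser.add_argument("--evaluate", "-e", action="store_true")
    return parser


def run(argv: Sequence[str], runner: FakeRunner) -> str:
    args = build_parser().parse_args(argv)
    # Same message construction as in the diff above.
    message = "Sequencing %speptides from:" % (
        "and evaluating " if args.evaluate else ""
    )
    runner.predict(args.peak_path, args.output, evaluate=args.evaluate)
    return message


if __name__ == "__main__":
    print(run(["-e", "annotated.mgf"], FakeRunner()))
```

Folding evaluation into the prediction path this way means a single command covers both cases, and callers that never pass the flag keep the previous behavior.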
4 changes: 3 additions & 1 deletion casanovo/data/datasets.py
@@ -83,7 +83,9 @@ def __getitem__(
The unique spectrum identifier, formed by its original peak file and
identifier (index or scan number) therein.
"""
mz_array, int_array, precursor_mz, precursor_charge = self.index[idx]
mz_array, int_array, precursor_mz, precursor_charge = self.index[idx][
:4
]
spectrum = self._process_peaks(
mz_array, int_array, precursor_mz, precursor_charge
)
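
The `[:4]` slice matters because `predict` may now be handed an annotated index, whose records carry the peptide annotation as a fifth element (`log_metrics` reads it as `test_index[i][4]`). A minimal illustration with plain tuples follows. The record contents and the helper name `unpack_spectrum` are made up for the example.

```python
from typing import Sequence, Tuple

# Hypothetical records: (m/z values, intensities, precursor m/z, charge[, annotation])
plain_record = ([100.0, 200.0], [1.0, 0.5], 450.2, 2)
annotated_record = ([100.0, 200.0], [1.0, 0.5], 450.2, 2, "PEPTIDE")


def unpack_spectrum(record: Sequence) -> Tuple:
    """Take the first four fields, ignoring a trailing annotation if present."""
    mz_array, int_array, precursor_mz, precursor_charge = record[:4]
    return mz_array, int_array, precursor_mz, precursor_charge


if __name__ == "__main__":
    # Both record shapes unpack to the same four values.
    print(unpack_spectrum(plain_record) == unpack_spectrum(annotated_record))
    try:
        # Without the slice, a five-element record cannot fill four names.
        a, b, c, d = annotated_record
    except ValueError as err:
        print("unsliced unpacking fails:", err)
```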
55 changes: 38 additions & 17 deletions casanovo/denovo/model_runner.py
@@ -10,6 +10,7 @@
from pathlib import Path
from typing import Iterable, List, Optional, Union

import depthcharge.masses
import lightning.pytorch as pl
import numpy as np
import torch
@@ -20,6 +21,7 @@
from ..config import Config
from ..data import ms_io
from ..denovo.dataloaders import DeNovoDataModule
from ..denovo.evaluate import aa_match_batch, aa_match_metrics
from ..denovo.model import Spec2Pep


@@ -116,36 +118,52 @@
self.loaders.val_dataloader(),
)

def evaluate(self, peak_path: Iterable[str]) -> None:
"""Evaluate peptide sequence preditions from a trained Casanovo model.
def log_metrics(self, test_index: AnnotatedSpectrumIndex) -> None:
"""Log pep_precision and aa_precision

Calculate and log peptide precision and amino acid precision
based off of model predictions and spectrum annotations

Parameters
----------
peak_path : iterable of str
The path with MS data files for predicting peptide sequences.

Returns
-------
self
test_index : AnnotatedSpectrumIndex
Index containing the annotated spectra used to generate model
predictions
"""
self.initialize_trainer(train=False)
self.initialize_model(train=False)

test_index = self._get_index(peak_path, True, "evaluation")
self.initialize_data_module(test_index=test_index)
self.loaders.setup(stage="test", annotated=True)
model_output = [psm[0] for psm in self.writer.psms]
spectrum_annotations = [

test_index[i][4] for i in range(test_index.n_spectra)
]
aa_precision, _, pep_precision = aa_match_metrics(

*aa_match_batch(
spectrum_annotations,
model_output,
depthcharge.masses.PeptideMass().masses,
)
)

self.trainer.validate(self.model, self.loaders.test_dataloader())
logger.info("Peptide Precision: %f", pep_precision)
logger.info("Amino Acid Precision: %f", aa_precision)


def predict(self, peak_path: Iterable[str], output: str) -> None:
def predict(
self, peak_path: Iterable[str], output: str, evaluate: bool = False
) -> None:
"""Predict peptide sequences with a trained Casanovo model.

Can also evaluate model during prediction if provided with annotated
peak files.

Parameters
----------
peak_path : iterable of str
The path with the MS data files for predicting peptide sequences.
output : str
Where should the output be saved?
evaluate: bool
whether to run model evaluation in addition to inference
Note: peak_path must point to annotated MS data files when
running model evaluation. Files that are not an annotated
peak file format will be ignored if evaluate is set to true.

Returns
-------
@@ -162,12 +180,15 @@
self.initialize_model(train=False)
self.model.out_writer = self.writer

test_index = self._get_index(peak_path, False, "")
test_index = self._get_index(peak_path, evaluate, "")
self.writer.set_ms_run(test_index.ms_files)
self.initialize_data_module(test_index=test_index)
self.loaders.setup(stage="test", annotated=False)
self.trainer.predict(self.model, self.loaders.test_dataloader())

if evaluate:
self.log_metrics(test_index)


def initialize_trainer(self, train: bool) -> None:
"""Initialize the lightning Trainer.
