Param/input file parsing mistakes in currently public benchmarking runs #560

rodvrees · 2025-01-27T16:56:45Z

I manually checked all currently public benchmark runs to identify mistakes in parameter or input proforma parsing.
Safe to say there are some mistakes to iron out. Some probably need a little discussion.
Here's the rundown:

DDA ion quant

General

(minor) homogenize modifications

MaxQuant

Turns out, if you change PSM FDR in the GUI it changes the 'peptideFDR' setting in the mqpar.xml file. This value is parsed as fdr_level_peptide in ProteoBench, but should maybe be parsed as fdr_level_psm. There doesn't seem to be a peptide FDR level setting in the GUI.
Fixed modifications are not parsed because they are not explicitly reported in MaxQuant (Fixed modifications are implicit, not explicitly reported by MaxQuant #545)
MaxQuant_20241216_130203 is not available on the server
MaxQuant_20241216_134651 is not available on the server
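
The remapping proposed in the first MaxQuant point could look roughly like this. It is a minimal sketch, not ProteoBench's actual parser: the function name `read_psm_fdr`, the output keys `ident_fdr_psm` and `ident_fdr_peptide`, and the trimmed XML fragment are all assumptions for illustration. The tag is matched case-insensitively because only the name 'peptideFDR' is given above.

```python
import xml.etree.ElementTree as ET


def read_psm_fdr(mqpar_xml: str) -> dict:
    """Store MaxQuant's 'peptideFDR' setting as a PSM-level FDR (hypothetical keys)."""
    root = ET.fromstring(mqpar_xml)
    for element in root.iter():
        # Case-insensitive match: the exact casing of the tag is not assumed here.
        if element.tag.lower() == "peptidefdr" and element.text:
            return {"ident_fdr_psm": float(element.text), "ident_fdr_peptide": None}
    # Setting not found: leave both levels unset rather than guessing.
    return {"ident_fdr_psm": None, "ident_fdr_peptide": None}


# Hypothetical, heavily trimmed parameter file used only for this example.
example = "<MaxQuantParams><peptideFdr>0.01</peptideFdr></MaxQuantParams>"
print(read_psm_fdr(example))
```

The peptide-level field is left as None because, as noted above, the GUI does not seem to expose a separate peptide FDR setting.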

ProlineStudio

N-terminal modification parsing is incorrect, "-" is missing
Right now: [Acetyl]ACDEFGHI
Should be: [Acetyl]-ACDEFGHI
EDIT: this error already fixed, but point needs to be resubmitted to reflect fix (Issue with MSAngel Prolinestudio #535)
(very minor) It would be easier if the output file could be downloaded as xlsx rather than txt because the txt won't be readable until extension is changed
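
To illustrate the intended output for the N-terminal case, here is a small sketch. It is not the fix referenced above (that change is not shown in this thread); the function name `add_nterm_dash` is hypothetical, and it assumes a bracketed modification at the very start of the sequence is always N-terminal.

```python
import re

# A leading [modification] that is not already followed by "-" is treated as N-terminal.
_NTERM_NO_DASH = re.compile(r"^(\[[^\]]+\])(?!-)")


def add_nterm_dash(sequence: str) -> str:
    """Insert the missing '-' after a leading N-terminal modification."""
    return _NTERM_NO_DASH.sub(r"\1-", sequence)


print(add_nterm_dash("[Acetyl]ACDEFGHI"))
```

Sequences that already contain the dash, or that carry no leading modification, pass through unchanged.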

i2MassChroQ

The cleavage pattern -> protease mapping needs to be checked by someone who is familiar with the tool, because it is hard to verify without knowing the tool
Minimum charge is hardcoded as 1 but no charge 1 IDs were made, so is this correct?
N-terminal modification parsing is incorrect
Right now: A[Acetyl]CDEFGHI
Should be: [Acetyl]-ACDEFGHI
EDIT: The issue is that the i2MassChroQ proforma column is used directly without parsing. Indeed, in the i2MC output the acetylation is also put on the first AA instead of the N-term. Is this intentional behaviour? Probably need to ask devs.
In the runs with Sage as search engine, Oxidation is incorrectly parsed as mod:00719. This is not an issue with the X!Tandem runs
EDIT: Fixed in Fix Oxidation parsing i22MassChroQ + Sage #561. It will also no longer be an issue with newer versions of i2MassChroQ, as per Fix Oxidation parsing i22MassChroQ + Sage #561 (comment)
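
For the Oxidation point, one possible normalization is sketched below. This is not the change made in #561; the table `ACCESSION_TO_NAME` and the function `normalize_mods` are hypothetical, and the table contains only the single accession mentioned in this issue.

```python
import re

# Accession-to-name table; only the entry discussed in this issue is included.
ACCESSION_TO_NAME = {"MOD:00719": "Oxidation"}

_BRACKET = re.compile(r"\[([^\]]+)\]")


def normalize_mods(proforma: str) -> str:
    """Replace known accessions inside [...] with their modification name."""

    def swap(match: re.Match) -> str:
        label = match.group(1)
        # Lookup is case-insensitive; unknown labels are kept as they are.
        return "[" + ACCESSION_TO_NAME.get(label.upper(), label) + "]"

    return _BRACKET.sub(swap, proforma)


print(normalize_mods("PEPTM[mod:00719]IDE"))
```

Labels that are not in the table are left untouched, so X!Tandem runs that already report Oxidation would not be affected.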

AlphaPept

According to the AlphaPept codebase, the peptide_FDR parameter is used for precursor-level scoring, so should it be parsed as psm_fdr? (ask devs)
N-terminal modifications are not parsed (prot. Nter mod. not parsed for Alphapept #508)
N-terminal mod parsing is incorrect
Right now: A[Acetyl-]CDEFGHI
Should be: [Acetyl]-ACDEFGHI
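
A rough sketch of the AlphaPept conversion, assuming that a trailing "-" inside the brackets on the first residue is what marks the modification as N-terminal. That reading is inferred from the example above, not confirmed by the AlphaPept developers, and the function name `move_nterm_mod` is hypothetical.

```python
import re

# First residue followed by a bracketed label ending in "-", e.g. A[Acetyl-].
_NTERM_ON_FIRST_RESIDUE = re.compile(r"^([A-Z])\[([^\]]+)-\]")


def move_nterm_mod(sequence: str) -> str:
    """Rewrite A[Acetyl-]CDE... as [Acetyl]-ACDE... (assumed N-terminal marker)."""
    return _NTERM_ON_FIRST_RESIDUE.sub(r"[\2]-\1", sequence)


print(move_nterm_mod("A[Acetyl-]CDEFGHI"))
```

This only works because the label carries a marker. The i2MassChroQ form A[Acetyl]CDEFGHI has none, so it cannot be told apart from a residue modification this way.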

DIA ion quant

General

(minor) The match-between-runs column is set to float, whereas in DDA it's set to bool (which it should probably be)
EDIT: I think this is because of Spectronaut defaulting its value to None, which will be fixed.
Do we need a 'second pass' column if -IMO- this is very similar to MBR, or am I misunderstanding something?
There are tolerance unit columns which are unused
EDIT: I think this will be fixed by resubmitting the points as this parameter is not parsed anywhere
(minor) Homogenize modifications
Maybe we should discuss whether precursor_fdr can be used instead of psm_fdr since it makes more sense for DIA, at least to me
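
One way to keep the MBR column boolean is to normalize the parsed value before it is stored. The sketch below is only an illustration: `normalize_mbr` is a hypothetical helper, and the per-tool `default` is left to the caller because the right default is still under discussion in this issue.

```python
import math
from typing import Optional


def normalize_mbr(value, default: Optional[bool] = None) -> Optional[bool]:
    """Coerce a parsed match-between-runs value to bool, or return `default` if missing."""
    if value is None:
        return default
    if isinstance(value, float) and math.isnan(value):
        return default
    if isinstance(value, str):
        # Accept a few common textual spellings of "enabled".
        return value.strip().lower() in {"true", "yes", "1"}
    return bool(value)


print(normalize_mbr(float("nan"), default=False))
```

A tool where a missing value means "off" could pass `default=False`, while a tool where that is not safe to assume could keep `default=None`.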

DIA-NN

--q-value in the DIA-NN CLI is precursor level, so it should probably go in psm_fdr instead of peptide_fdr?
Fragment tolerance is now parsed as being identical to precursor tolerance, which is incorrect. By default fragment tolerance is optimized separately for each run, so how should we deal with this?
MBR should definitely not be defaulted to False, but this comes back to the above comment about 'second pass'
Protein inference can be parsed from command: --relaxed-prot-inference
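
The q-value and protein-inference points could be handled along these lines. This is a sketch under assumptions, not the ProteoBench parser: the flag spellings follow this issue's text and should be checked against the DIA-NN documentation (the code also accepts `--qvalue` and any flag starting with `--relaxed-prot-inf` for that reason), and the function name, output keys, and example command are hypothetical.

```python
import shlex


def parse_diann_command(command: str) -> dict:
    """Extract the precursor-level FDR and the protein inference flag from a command line."""
    tokens = shlex.split(command)
    params = {
        "ident_fdr_psm": None,       # precursor-level q-value goes here
        "ident_fdr_peptide": None,   # no peptide-level setting is parsed
        "relaxed_protein_inference": False,
    }
    for index, token in enumerate(tokens):
        if token in ("--q-value", "--qvalue") and index + 1 < len(tokens):
            params["ident_fdr_psm"] = float(tokens[index + 1])
        elif token.startswith("--relaxed-prot-inf"):
            params["relaxed_protein_inference"] = True
    return params


# Hypothetical command line used only for this example.
example = "diann.exe --f run1.raw --q-value 0.01 --relaxed-prot-inference"
print(parse_diann_command(example))
```

Fragment tolerance is deliberately not touched here, since how to report a per-run optimized value is still an open question above.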

FragPipe

fdr_peptide should be left as None, since the FragPipe docs say it only controls precursor and protein FDRs
EDIT: Fixed (Fix several param parsing issues from #560 #562)

AlphaDIA

Version can be parsed from param file but isn't
EDIT: Fixed (Fix several param parsing issues from #560 #562)
FDR in log file should be ident_fdr_psm, not protein level
Tolerances should go in ranges
EDIT: Fixed (Fix several param parsing issues from #560 #562)
Scan window is parsed as the value for max_size_rt? I'm not sure why, although I'm also not sure what should be in scan_window anyway.
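
As an illustration of "tolerances should go in ranges", a single symmetric tolerance can be expanded into a lower and upper bound. The string layout below is an assumption made for this example, not necessarily the format ProteoBench stores, and `tolerance_as_range` is a hypothetical helper.

```python
def tolerance_as_range(value: float, unit: str = "ppm") -> str:
    """Turn a symmetric tolerance such as 10 ppm into a '[-10 ppm, 10 ppm]' range string."""
    magnitude = abs(float(value))
    # ':g' drops trailing zeros, so 10.0 is written as 10.
    return f"[-{magnitude:g} {unit}, {magnitude:g} {unit}]"


print(tolerance_as_range(10))
```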

Spectronaut

Set MBR to false by default instead of nan?
EDIT: Fixed (Fix several param parsing issues from #560 #562)
The hardcoded system default tolerances are specific to Orbitrap, which is fine for now, but if we ever work with Waters TOF data then the tolerances need to be changed. An approach like the one used for MaxQuant params parsing will work here (add an argument to extract_params)
EDIT: Fixed (Fix several param parsing issues from #560 #562)
Predictors_library column is filled with the value of "Hybrid library" setting in Spectronaut, what does this setting mean?

As you can see, most of these are pretty minor and quickly fixed. I'll get started on those soon. Others I might need some help with because of unfamiliarity with the tools.
Any thoughts, comments, suggestions?

mlocardpaulet · 2025-01-29T13:21:41Z

This is very good work. Thanks a lot @rodvrees!
I'll share this issue with the i2MassChroQ developer. He may be able to help out for this tool.
Some of these points we need to ask developers directly. And maybe we need to create detailed documentation describing how we parse parameters, so that this information is easy for everyone to find?

OlivierLangella · 2025-01-29T13:53:09Z

Hi everybody,

concerning the behaviour of i2MassChroQ for Nter acetylation:
Right now: A[Acetyl]CDEFGHI
Should be: [Acetyl]-ACDEFGHI

it's because internally, the peptide model (inside i2m, not Sage or X!Tandem) does not support Nter modification yet (each modification must be related to one amino acid, there is no flag to say that this is a protein or peptide related Nter modification).
I'll fix it as soon as possible, but it implies a lot of checks and tests, this will take some time.

Thanks
Olivier


rodvrees added the labels bug (Something isn't working), DDA quantification - precursor ions, DIA quantification - peptidoform/precursor ion, and to be discussed (a decision still needs to be made) on Jan 27, 2025

rodvrees self-assigned this Jan 27, 2025

rodvrees mentioned this issue Jan 28, 2025: Fix Oxidation parsing i22MassChroQ + Sage #561 (Merged)

rodvrees mentioned this issue Jan 29, 2025: Fix several param parsing issues from #560 #562 (Merged)

RobbinBouwmeester added a commit that referenced this issue Jan 31, 2025: Merge pull request #562 from Proteobench/fix-general-dia-issues (46ee938), Fix several param parsing issues from #560
