Param/input file parsing mistakes in currently public benchmarking runs #560

rodvrees opened this issue Jan 27, 2025 · 2 comments
@rodvrees
Contributor

rodvrees commented Jan 27, 2025

I manually checked all currently public benchmark runs to identify some mistakes in parameter or input proforma parsing.
Safe to say there are some mistakes to iron out. Some probably need a little discussion.
Here's the rundown:

DDA ion quant

General

  • (minor) homogenize modifications

Maxquant

  • Turns out, if you change PSM FDR in the GUI it changes the 'peptideFDR' setting in the mqpar.xml file. This value is parsed as fdr_level_peptide in ProteoBench, but should maybe be parsed as fdr_level_psm. There doesn't seem to be a peptide FDR level setting in the GUI.
  • Fixed modifications are not parsed because they are not explicitly reported in MaxQuant (Fixed modifications are implicit, not explicitly reported by MaxQuant #545)
  • MaxQuant_20241216_130203 is not available on the server
  • MaxQuant_20241216_134651 is not available on the server
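A minimal sketch of the reparsing suggested in the first point, assuming the value sits in an element named as written above. The function name `read_psm_fdr` and the tiny XML string are hypothetical and only for illustration; this is not ProteoBench's actual parser. The tag is matched case-insensitively because its exact capitalisation in mqpar.xml is not confirmed here.

```python
import xml.etree.ElementTree as ET


def read_psm_fdr(mqpar_xml: str) -> dict:
    """Report the 'peptideFDR' value under the PSM-level key (hypothetical helper)."""
    root = ET.fromstring(mqpar_xml)
    # The GUI exposes no peptide-level FDR, so that key stays empty in this sketch.
    params = {"fdr_level_psm": None, "fdr_level_peptide": None}
    for element in root.iter():
        # Case-insensitive match: the exact spelling of the tag is an assumption.
        if element.tag.lower() == "peptidefdr" and element.text:
            params["fdr_level_psm"] = float(element.text)
    return params


# Simplified stand-in for an mqpar.xml file, not a real one.
example = "<MaxQuantParams><peptideFDR>0.01</peptideFDR></MaxQuantParams>"
parsed = read_psm_fdr(example)
```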

ProlineStudio

  • N-terminal modification parsing is incorrect, "-" is missing
    Right now: [Acetyl]ACDEFGHI
    Should be: [Acetyl]-ACDEFGHI
    EDIT: this error already fixed, but point needs to be resubmitted to reflect fix (Issue with MSAngel Prolinestudio #535)
  • (very minor) It would be easier if the output file could be downloaded as xlsx rather than txt because the txt won't be readable until extension is changed
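For reference, the missing "-" from the first point could be restored with something like the sketch below. The helper name `add_nterm_hyphen` is hypothetical, and it assumes that a bracketed modification at the very start of the sequence is always an N-terminal one.

```python
import re

# A bracketed label at position 0 that is followed directly by a residue letter.
_NTERM_NO_HYPHEN = re.compile(r"^(\[[^\]]+\])(?=[A-Z])")


def add_nterm_hyphen(sequence: str) -> str:
    """Insert the '-' between a leading N-terminal modification and the peptide."""
    return _NTERM_NO_HYPHEN.sub(r"\1-", sequence)


fixed = add_nterm_hyphen("[Acetyl]ACDEFGHI")
```

Sequences that already contain the hyphen, or that have no leading bracket, pass through unchanged.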

i2MassChroQ

  • Correctness of the cleavage pattern -> protease mapping needs to be checked by someone who is familiar with the tool, because it is hard to verify for someone who doesn't know it
  • Minimum charge is hardcoded as 1 but no Charge 1 IDs were made, so is this correct?
  • N-terminal modification parsing is incorrect
    Right now: A[Acetyl]CDEFGHI
    Should be: [Acetyl]-ACDEFGHI
    EDIT: Issue is that the i2MassChroQ proforma column is used directly without parsing. Indeed, in the i2MC output the acetylation is also put on the first AA instead of the N-term. Is this intentional behaviour? Probably need to ask devs.
  • In the runs with Sage as search engine, Oxidation is incorrectly parsed as mod:00719. This is not an issue with the X!Tandem runs
    EDIT: See Fix Oxidation parsing i22MassChroQ + Sage #561; this will also no longer be an issue with newer versions of i2MassChroQ, as per Fix Oxidation parsing i22MassChroQ + Sage #561 (comment)
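As an illustration of the Oxidation point, a lookup-based rename could look like the sketch below. The bracketed form `M[mod:00719]` is an assumption about how the label appears in the i2MassChroQ + Sage output, and `rename_accessions` and `ACCESSION_TO_NAME` are hypothetical names, not the fix that was merged.

```python
import re

# Accession-to-name table; only the entry mentioned in this issue is listed.
ACCESSION_TO_NAME = {"MOD:00719": "Oxidation"}

_BRACKET = re.compile(r"\[([^\]]+)\]")


def rename_accessions(sequence: str, mapping: dict = ACCESSION_TO_NAME) -> str:
    """Replace known accessions inside [...] by their name, leave the rest alone."""

    def _swap(match: re.Match) -> str:
        label = match.group(1)
        # Compare in upper case so 'mod:00719' and 'MOD:00719' both match.
        return f"[{mapping.get(label.upper(), label)}]"

    return _BRACKET.sub(_swap, sequence)


renamed = rename_accessions("PEPM[mod:00719]IDE")
```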

AlphaPept

  • According to the codebase of AlphaPept, the peptide_FDR parameter is used to do precursor level scoring, so should it be parsed as psm_fdr? (ask devs)
  • N-terminal modifications are not parsed (prot. Nter mod. not parsed for Alphapept #508)
  • N-terminal mod parsing is incorrect
    Right now: A[Acetyl-]CDEFGHI
    Should be: [Acetyl]-ACDEFGHI
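A rough sketch of the conversion for the last point, assuming that a trailing "-" inside the bracket on the first residue (as in the "Right now" example) always marks an N-terminal modification. The helper name `move_nterm_mod` is hypothetical.

```python
import re

# First residue followed by a bracketed label that ends in '-', e.g. A[Acetyl-].
_LEADING_NTERM = re.compile(r"^([A-Z])\[([^\]]+)-\]")


def move_nterm_mod(sequence: str) -> str:
    """Move an N-terminal label from the first residue to the front of the peptide."""
    return _LEADING_NTERM.sub(r"[\2]-\1", sequence)


converted = move_nterm_mod("A[Acetyl-]CDEFGHI")
```

Modifications further along the sequence, or labels without the trailing "-", are not touched by this pattern.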

DIA ion quant

General

  • (minor) match between runs column is set to float whereas in DDA it's set to bool (which it should probably be)
    EDIT: I think this is because of Spectronaut defaulting its value to None, which will be fixed.
  • Do we need a 'second pass' column if -IMO- this is very similar to MBR, or am I misunderstanding something?
  • There are tolerance unit columns which are unused
    EDIT: I think this will be fixed by resubmitting the points as this parameter is not parsed anywhere
  • (minor) Homogenize modifications
  • Maybe we should discuss whether precursor_fdr can be used instead of psm_fdr since it makes more sense for DIA, at least to me
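For the float vs. bool point at the top of this list, one possible way to keep the column boolean is to coerce missing values at parse time. The sketch below is only an illustration; `to_bool_flag` is a hypothetical helper, and which default is appropriate for each tool is still open for discussion.

```python
import math


def to_bool_flag(value, default: bool = False) -> bool:
    """Coerce a parsed setting to bool, mapping None/NaN to an explicit default."""
    if value is None:
        return default
    if isinstance(value, float) and math.isnan(value):
        return default
    if isinstance(value, str):
        # Accept a few common textual spellings of "true".
        return value.strip().lower() in {"true", "1", "yes"}
    return bool(value)


flags = [to_bool_flag(v) for v in (None, float("nan"), 1.0, "True", 0.0)]
```

With an explicit bool in every row, pandas no longer needs to upcast the column to float to hold NaN.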

DIA-NN

  • --q-value in the DIA-NN CLI is precursor level, so it should probably go in psm_fdr instead of peptide_fdr?
  • Fragment tolerance is now parsed as being identical to precursor tolerance, which is incorrect. By default fragment tolerance is optimized separately for each run, so how should we deal with this?
  • MBR should definitely not be defaulted to False, but this comes back to above comment about 'second pass'
  • Protein inference can be parsed from command: --relaxed-prot-inference
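A minimal sketch of pulling these two settings from a logged command line. The flag spelling `--q-value` is taken from the point above; `--qvalue` is accepted as well because the exact spelling in a given DIA-NN log should be checked. The function name, the output keys and the example command are hypothetical.

```python
import shlex

# Both spellings are accepted; which one a given log uses is an assumption to verify.
_Q_VALUE_FLAGS = ("--q-value", "--qvalue")


def parse_diann_command(command: str) -> dict:
    """Extract the q-value cutoff and the relaxed protein inference switch."""
    tokens = shlex.split(command)
    params = {"psm_fdr": None, "relaxed_protein_inference": False}
    for i, token in enumerate(tokens):
        if token in _Q_VALUE_FLAGS and i + 1 < len(tokens):
            # Stored as psm_fdr because the cutoff is precursor level.
            params["psm_fdr"] = float(tokens[i + 1])
        elif token == "--relaxed-prot-inference":
            params["relaxed_protein_inference"] = True
    return params


# Made-up command line for illustration only.
parsed_cmd = parse_diann_command("diann.exe --f run1.raw --qvalue 0.01 --relaxed-prot-inference")
```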

FragPipe

AlphaDIA

Spectronaut

  • Set MBR to false by default instead of nan?
    EDIT: Fixed (Fix several param parsing issues from #560 #562)
  • The hardcoded system default tolerances are specific to Orbitrap, which is fine for now, but if we ever work with Waters TOF data then the tolerances need to be changed. An approach like the one used for MaxQuant params parsing would work here (add an argument to extract_params)
    EDIT: Fixed (Fix several param parsing issues from #560 #562)
  • Predictors_library column is filled with the value of "Hybrid library" setting in Spectronaut, what does this setting mean?
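The "add an argument to extract_params" idea from the tolerance point could look roughly like the sketch below. The signature, the dictionary keys, the "System Default" marker and every value shown are placeholders for illustration; they are not Spectronaut's real defaults and not the code merged in #562.

```python
def extract_params(settings: dict, default_tolerances: dict) -> dict:
    """Fill tolerances from the settings, falling back to instrument-specific defaults."""
    params = {}
    for key in ("precursor_mass_tolerance", "fragment_mass_tolerance"):
        value = settings.get(key)
        # A missing value or the (assumed) marker "System Default" means: use the default.
        if value is None or value == "System Default":
            value = default_tolerances[key]
        params[key] = value
    return params


# Placeholder defaults; real values would have to come from the vendor documentation.
ORBITRAP_DEFAULTS = {
    "precursor_mass_tolerance": "<orbitrap precursor default>",
    "fragment_mass_tolerance": "<orbitrap fragment default>",
}

result = extract_params(
    {"precursor_mass_tolerance": "System Default", "fragment_mass_tolerance": "20 ppm"},
    ORBITRAP_DEFAULTS,
)
```

Passing the defaults in from the caller keeps the instrument choice out of the parser, so a TOF-specific table could be supplied later without touching the function.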

As you can see, most of these are pretty minor and quickly fixed. I'll get started on those soon. Others I might need some help with because of unfamiliarity with the tools.
Any thoughts, comments, suggestions?

@mlocardpaulet
Contributor

This is very good work. Thanks a lot @rodvrees!
I'll share this issue with the i2MassChroQ developer. He may be able to help out for this tool.
For some of these points we need to ask the developers directly. And maybe we need to create detailed documentation describing how we parse parameters, so that this information is easy for everyone to find?

@OlivierLangella

Hi everybody,

concerning the behaviour of i2MassChroQ for Nter acetylation :
Right now: A[Acetyl]CDEFGHI
Should be: [Acetyl]-ACDEFGHI

It's because internally, the peptide model (inside i2m, not Sage or X!Tandem) does not support Nter modifications yet (each modification must be attached to one amino acid; there is no flag to say that it is a protein or peptide Nter modification).
I'll fix it as soon as possible, but it implies a lot of checks and tests, so this will take some time.

Thanks
Olivier

RobbinBouwmeester added a commit that referenced this issue Jan 31, 2025