Implement support for uploading (and storing) workflow configuration files #58

Closed
RalfG opened this issue Feb 10, 2023 · 12 comments · Fixed by #101
Labels: enhancement, high priority


@RalfG
Contributor

RalfG commented Feb 10, 2023

Next to structured and unstructured metadata, we would like to support the upload of configuration files (e.g., mqpar.xml) to be stored alongside the intermediate files and the metadata.

@enryH
Member

enryH commented Sep 25, 2023

I had to parse a lot of parameter files for a submission into a table format. It is not yet a standalone module, but the functionality should be fairly general:

Link to script to parse maxquant parameter files

@enryH
Member

enryH commented Sep 25, 2023

Aim for now:

  • parse parameter files exported by search engines into a standardized format.
    • e.g. based on mqpar.xml of MaxQuant
  • if the parameter file cannot be parsed, describe the problem but continue running (so the values can be edited manually)
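
The "describe the problem, but continue running" behaviour could look roughly like the sketch below. It assumes an mqpar.xml whose root element has direct children such as `minPepLen`; the function name `parse_mqpar`, the standardized key `min_peptide_length`, and the `FIELDS` mapping are hypothetical and not taken from the ProteoBench code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: standardized name -> tag of a direct child of the XML root.
FIELDS = {"min_peptide_length": "minPepLen"}


def parse_mqpar(xml_text):
    """Return (params, problems). Malformed input is reported, never raised."""
    params = {name: None for name in FIELDS}
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Unparseable file: report it and hand back empty fields for manual editing.
        problems.append(f"could not parse XML: {err}")
        return params, problems
    for name, tag in FIELDS.items():
        value = root.findtext(tag)
        if value is None:
            problems.append(f"missing element <{tag}>; please fill in {name} manually")
        else:
            params[name] = value.strip()
    return params, problems
```

A caller can show `problems` to the user and still proceed with whatever ended up in `params`.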

@mlocardpaulet
Contributor

I am making a table with the list of parameters that we want to keep, and their description.
https://docs.google.com/spreadsheets/d/1Os3By8LYuVGcfH65X2-9tSjJnmtH-nL03tPYLtPOj-c/edit#gid=0

@enryH
Member

enryH commented Sep 26, 2023

If you look at, e.g., the JSON and XML files in the test/params folder (https://github.com/Proteobench/ProteoBench/tree/parse_settings/test/params), an example mapping could be:

| Our parameter name | MaxQuant |
| --- | --- |
| MS2 mass tolerance | `["msmsParamsArray"][0]["MatchTolerance"]` |
| minimum peptide length | `["minPepLen"]` |

I think ideally the table will be built from a schema in the code.

Looking at the first example, I would rather store the MaxQuant information using
["msmsParamsArray"]["FTMS"]["MatchTolerance"]
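
As a rough illustration of such a mapping in code, here is a minimal sketch that resolves each common parameter name through a path of keys and list indices. It assumes the parameter file has already been loaded into nested dicts and lists (as the JSON files in test/params suggest); `MAXQUANT_PATHS`, `lookup`, and `extract` are hypothetical names, not existing ProteoBench functions.

```python
from functools import reduce

# Hypothetical schema: common parameter name -> path into the parsed MaxQuant data.
MAXQUANT_PATHS = {
    "MS2 mass tolerance": ("msmsParamsArray", 0, "MatchTolerance"),
    "minimum peptide length": ("minPepLen",),
}


def lookup(data, path):
    """Follow dict keys / list indices in order; return None if any step is missing."""
    try:
        return reduce(lambda node, key: node[key], path, data)
    except (KeyError, IndexError, TypeError):
        return None


def extract(data, paths=MAXQUANT_PATHS):
    """Build a {common name: value} dict from one parsed parameter file."""
    return {name: lookup(data, path) for name, path in paths.items()}
```

Keeping the paths in one dictionary means the overview table could be generated from the same structure rather than maintained by hand.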

@mlocardpaulet
Contributor

We noticed a lack of consistency in the msfragger.params.toml file between the upper/lower tolerance bounds and the mass tolerance:

precursor_mass_lower = -10			# Lower bound of the precursor mass window.
precursor_mass_upper = 10			# Upper bound of the precursor mass window.
precursor_mass_units = 1			# Precursor mass tolerance units (0 for Da, 1 for ppm).
precursor_true_tolerance = 20			# True precursor mass tolerance (window is +/- this value).

We need to contact the developers to ask how to deal with it.

@mlocardpaulet
Contributor

precursor_mass_lower and precursor_mass_upper are defined for some pre-searching step. We'll ignore them.
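
A minimal sketch of reading these `key = value  # comment` lines while ignoring the lower/upper bounds is given below. The function names are hypothetical. Whether `precursor_mass_units` also governs `precursor_true_tolerance` is not settled by the excerpt above, so the sketch reports the unit under its own key rather than attaching it to the tolerance.

```python
UNIT_NAMES = {"0": "Da", "1": "ppm"}  # codes as stated in the file's own comments


def parse_params(text):
    """Parse 'key = value  # comment' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue  # skip blank or comment-only lines
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def summarize(params):
    """Keep the true tolerance; precursor_mass_lower/upper are deliberately ignored."""
    return {
        "precursor_true_tolerance": float(params["precursor_true_tolerance"]),
        "precursor_mass_units": UNIT_NAMES[params["precursor_mass_units"]],
    }
```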

@mlocardpaulet
Contributor

For FragPipe, we need to contact the developers to ask where to find the parameters. We'll send an email once we have a defined set of parameters that we need.

@enryH
Member

enryH commented Oct 13, 2023

> I am making a table with the list of parameters that we want to keep, and their description. https://docs.google.com/spreadsheets/d/1Os3By8LYuVGcfH65X2-9tSjJnmtH-nL03tPYLtPOj-c/edit#gid=0

And here is the table Veit mentioned: link

Potentially it could be a good idea to take their parameter naming to keep the projects in sync.

@mlocardpaulet
Contributor

Of course! I totally agree

@enryH enryH linked a pull request Oct 27, 2023 that will close this issue
@enryH
Member

enryH commented Oct 27, 2023

@RobbinBouwmeester Can you show me the current results.json? The one I linked in the main folder on the main branch is seven months old and did not look like the one you showed.

@RobbinBouwmeester
Contributor

@enryH
Member

enryH commented Oct 27, 2023

Discussion with Veit, Viki, Klemen, Julian, Witold and Nadja on a curated list of parameters. Based on that, I will

  1. Use the YAML of common parameters in sdrf-pipelines as the reference.
  2. To keep it simple: a task uses the version of the reference that is current when the task is defined -> need to add a version or timestamp to the reference YAML file through a PR.
  3. If a new task needs other parameters: submit a PR, get it accepted, bump the version of the reference YAML file.

Resources:

  • table with overview in ELIXIR of shared and categorized common parameters: link
  • documentation on general parameters from Veit and David: link
  • Bard and Marie's table of current parameters: link

7 participants