Replace mass shifts in peptide proforma strings during FragPipe and Sage parsing with modification names #444

Open
Cajac102 opened this issue Nov 21, 2024 · 6 comments · Fixed by #446

@Cajac102
Contributor

Might be a bug: modifications are still reported as mass shifts in the FragPipe (e.g. "AC[[57.0215]]FHC[[57.0215]]ETC[[57.0215]]K|Z=3") and Sage (e.g. "AAAIGIDLGTTYSC[[+57.0215]]VGVFQHGK|Z=3") result tables, while in MaxQuant and AlphaPept they are reported with names (e.g. "AAC[Carbamidomethyl]LPLPGYR|Z=2.0"). The replacement dicts are already in the .toml.
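
For illustration, here is a minimal sketch of the kind of replacement meant here. It is not the ProteoBench implementation: the `MASS_SHIFT_TO_NAME` dict and the `name_modifications` function are hypothetical names, and the dict only stands in for the replacement dicts in the .toml. It accepts single or double brackets and an optional leading "+", and leaves unknown shifts untouched.

```python
import re

# Hypothetical mapping for illustration; the real replacement dicts live in the .toml.
MASS_SHIFT_TO_NAME = {
    "57.0215": "Carbamidomethyl",
    "15.9949": "Oxidation",
}

# One or two opening brackets, an optionally signed decimal number, one or two closing brackets.
BRACKETED_SHIFT = re.compile(r"\[{1,2}([+-]?\d+\.\d+)\]{1,2}")


def name_modifications(proforma: str) -> str:
    """Replace bracketed mass shifts with modification names via exact string lookup."""

    def _swap(match: re.Match) -> str:
        key = match.group(1).lstrip("+")  # "+57.0215" and "57.0215" share one key
        name = MASS_SHIFT_TO_NAME.get(key)
        # Unknown shifts are left exactly as they were.
        return f"[{name}]" if name is not None else match.group(0)

    return BRACKETED_SHIFT.sub(_swap, proforma)


print(name_modifications("AC[[57.0215]]FHC[[57.0215]]ETC[[57.0215]]K|Z=3"))
print(name_modifications("AAAIGIDLGTTYSC[[+57.0215]]VGVFQHGK|Z=3"))
```

Because the lookup is an exact string match, it only works when the tool writes the shift with the same number of decimals as the dict key.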

@Cajac102 Cajac102 self-assigned this Nov 21, 2024
@mlocardpaulet
Contributor

mlocardpaulet commented Nov 21, 2024

Yes, we decided not to homogenise the modifications at this stage.
The thing is: every software tool and version can have its own way of encoding modifications. And for ProteoBench it does not really matter, since we do not compare between tools.
This means that if we want to compare outputs of different tools (or versions) (for example for the paper ;)), we need to homogenise the PTMs afterwards.

We can totally discuss changing this.

@Cajac102
Contributor Author

Cajac102 commented Nov 21, 2024

I see, that makes total sense. If we only look at the total number of peptidoforms it doesn't matter at all.
From the code it looks like homogenisation was intended, but the regex was a bit off. I fixed it for myself since I want to make the UpSet plots for the paper; if we choose to change it, we can just merge it :)

@mlocardpaulet
Contributor

It never hurts to improve the code ;)

@Cajac102 Cajac102 linked a pull request Nov 21, 2024 that will close this issue
@Cajac102
Contributor Author

Two more differences I ran into:
- For AlphaPept, the charges are floats. I added parsing them to integers to make the proforma strings consistent.
- For MaxQuant, fixed modifications are currently not included in the proforma string. This might take a bit more effort to include, so I only hacked it together locally for the plot.
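
As a rough sketch of the charge fix (not the code in the PR; `normalise_charge` is a hypothetical helper name), the `|Z=` suffix can be split off and the float charge cast to an integer:

```python
def normalise_charge(proforma: str) -> str:
    """Rewrite a float charge suffix such as '|Z=2.0' as an integer ('|Z=2')."""
    sequence, separator, charge = proforma.rpartition("|Z=")
    if not separator:
        # No charge suffix found: return the string unchanged.
        return proforma
    return f"{sequence}|Z={int(float(charge))}"


print(normalise_charge("AAC[Carbamidomethyl]LPLPGYR|Z=2.0"))
```

Strings that already carry an integer charge pass through unchanged, so the helper can be applied to the output of every tool.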

@Cajac102
Contributor Author

As Robbin already predicted, simply replacing the mass shifts with the modification names does not always work because the tools report them at different precision. I now have a Sage file where carbamidomethylation is "+57.021465" instead of just "+57.0215", so the replacement fails.
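
One possible direction, sketched here under assumptions and not necessarily what any commit in this thread does, is to parse the shift as a number and match it to known modification masses within a tolerance. `KNOWN_SHIFTS`, `TOLERANCE`, `closest_name` and `name_by_mass` are hypothetical names; the masses are the standard monoisotopic values, and the tolerance is an arbitrary choice.

```python
import re
from typing import Optional

# Monoisotopic mass shifts in Da; hypothetical stand-in for the .toml replacement dicts.
KNOWN_SHIFTS = {
    "Carbamidomethyl": 57.021464,
    "Oxidation": 15.994915,
}
TOLERANCE = 0.001  # Da; assumed value, wide enough to absorb rounding to 4 decimals

BRACKETED_SHIFT = re.compile(r"\[{1,2}([+-]?\d+(?:\.\d+)?)\]{1,2}")


def closest_name(shift: float) -> Optional[str]:
    """Return the known modification nearest to `shift`, or None if none is within tolerance."""
    name, mass = min(KNOWN_SHIFTS.items(), key=lambda item: abs(item[1] - shift))
    return name if abs(mass - shift) <= TOLERANCE else None


def name_by_mass(proforma: str) -> str:
    """Replace bracketed mass shifts with names, independent of reported precision."""

    def _swap(match: re.Match) -> str:
        name = closest_name(float(match.group(1)))
        # Shifts that match nothing within tolerance are left as they were.
        return f"[{name}]" if name is not None else match.group(0)

    return BRACKETED_SHIFT.sub(_swap, proforma)


print(name_by_mass("AAAIGIDLGTTYSC[+57.021465]VGVFQHGK|Z=3"))
print(name_by_mass("AAAIGIDLGTTYSC[[+57.0215]]VGVFQHGK|Z=3"))
```

Both precisions then map to the same `[Carbamidomethyl]` string. The tolerance would need care for modifications whose masses lie close together.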

@Cajac102 Cajac102 reopened this Jan 21, 2025
@Cajac102
Contributor Author

Fixed temporarily in commit 7ed16c4, but we might need something more sophisticated in the future.


3 participants