Add header descriptions to the FASTA file used for the mixed-species modules #463

Open
mlocardpaulet opened this issue Nov 29, 2024 · 1 comment


@mlocardpaulet
Contributor

Some software tools need a description in the FASTA header to parse it correctly with default parameters. The FASTA files currently proposed for the DDA and DIA modules do not have these descriptions.

So we decided to generate a FASTA file with these descriptions and the same sequences as the FASTA file we currently use.
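For context on what "having a description" means: many FASTA parsers treat the first whitespace-delimited token after `>` as the identifier and the rest of the line as the description. A minimal Python sketch of that convention (the function name and example headers are hypothetical, not taken from the ProteoBench files):

```python
from typing import Tuple


def split_fasta_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header line into (identifier, description).

    Assumes the common convention: the identifier is the first
    whitespace-delimited token, everything after it is the description.
    """
    line = header.strip()
    if line.startswith(">"):
        line = line[1:]
    parts = line.split(maxsplit=1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""  # empty = no description
    return identifier, description


# Hypothetical UniProt-style header with a description, and one without.
print(split_fasta_header(">sp|A0A000|EXMPL_HUMAN Hypothetical example protein OS=Homo sapiens OX=9606"))
print(split_fasta_header(">iRT_peptides"))
```

Tools differ in how strictly they expect the description to be present, so this only illustrates the usual split, not any specific tool's parser.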

I had a look at it and ran into a problem!

Whatever I do, we won't have descriptions for all the accessions, because:

  1. some accessions are "weird" but necessary: the spiked-in Biognosys iRTs have no descriptions, and some sequences in the contaminants are tags or similar (hence, no header description).
  2. some accessions have been deleted from UniProt or are no longer annotated there.

So, whatever I do, there will be accessions without descriptions.
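To see exactly which entries fall into these cases, one option is to list every header that has no description. A minimal Python sketch, assuming the same first-token-is-the-identifier convention; the function name and the example entries are made up for illustration:

```python
from typing import List


def ids_without_description(fasta_text: str) -> List[str]:
    """Return identifiers of FASTA entries whose header has no description.

    Assumes the identifier is the first whitespace-delimited token of the
    header and that any text after it counts as a description.
    """
    missing = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split(maxsplit=1)
        if len(parts) < 2:
            # Header is only an identifier (or completely empty).
            missing.append(parts[0] if parts else "")
    return missing


# Hypothetical entries: one annotated protein, one iRT-like entry, one tag-like contaminant.
example = """>sp|A0A000|EXMPL_HUMAN Hypothetical example protein OS=Homo sapiens
MKTAYIAKQR
>iRT_peptides
LGGNEQVTR
>CON_tag_example
DYKDDDDK
"""
print(ids_without_description(example))  # ['iRT_peptides', 'CON_tag_example']
```

The resulting list is a convenient starting point for deciding, per entry, whether it is a non-protein sequence (iRTs, tags) or an accession that is no longer annotated in UniProt.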

Will this be an issue for some software tools? If so, how should I proceed? What should I do with the sequences that are not "real" proteins?

@wolski: do you have an opinion on how to proceed?

@wolski
Contributor

wolski commented Jan 30, 2025

Hi @mlocardpaulet

I used the UniProt service to retrieve the description lines for all UniProt IDs of the entries in ProteoBenchFASTA_DDAQuantification.zip.

I am attaching the Jupyter notebook: AddDescriptions.ipynb.zip
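The attached notebook is not reproduced in this thread, so the following is only a rough sketch of one way such a lookup could be done, assuming UniProt's REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.fasta` (an assumption about which UniProt service was used; the notebook may work differently). All function names are hypothetical:

```python
import urllib.error
import urllib.request
from typing import Optional

# Assumed UniProt REST endpoint returning a single entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession: str) -> str:
    """Build the (assumed) UniProt REST URL for one accession."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def description_from_fasta(fasta_text: str) -> Optional[str]:
    """Return the text after the first token of the first header, or None."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            return parts[1] if len(parts) > 1 else None
    return None  # no header at all, e.g. an empty response


def fetch_description(accession: str, timeout: float = 30.0) -> Optional[str]:
    """Fetch one entry from UniProt and extract its description (network call)."""
    try:
        with urllib.request.urlopen(uniprot_fasta_url(accession), timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    except urllib.error.URLError:  # also covers HTTPError (e.g. 404)
        return None
    return description_from_fasta(text)
```

Failed or empty lookups return `None` instead of raising, so accessions that cannot be resolved can be collected and reviewed separately rather than stopping the whole run.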

I created a new FASTA file with the description lines and uploaded it to the ProteoBench cloud.
It is called:
ProteoBenchFASTA_DDAQuantification_Descriptions.fasta
and you will find it in the Module_2_DDA... folder.

Compared with the original FASTA, the only change is that all entries except about 40 now have a description.
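A minimal sketch of the merge step described here: descriptions are added only to headers that lack one, sequences are copied verbatim, and entries with no retrieved description are left unchanged and counted. It assumes a hypothetical `descriptions` dict keyed by the header's first token; this is an illustration, not the code from the attached notebook:

```python
from typing import Dict, Tuple


def add_descriptions(fasta_text: str, descriptions: Dict[str, str]) -> Tuple[str, int]:
    """Append descriptions to FASTA headers that have none.

    `descriptions` (hypothetical) maps the header identifier (first token)
    to a description. Headers that already have a description, or whose
    identifier is not in the mapping, are left unchanged. Sequence lines
    are copied verbatim. Returns the new FASTA text and the number of
    headers that still have no description.
    """
    out_lines = []
    still_missing = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            identifier = parts[0] if parts else ""
            if len(parts) < 2:  # header without a description
                desc = descriptions.get(identifier)
                if desc:
                    line = f">{identifier} {desc}"
                else:
                    still_missing += 1
        out_lines.append(line)
    return "\n".join(out_lines) + "\n", still_missing


fasta = ">sp|A0A000|EXMPL_HUMAN\nMKT\n>iRT_peptides\nLGG\n"
new_fasta, n_missing = add_descriptions(fasta, {"sp|A0A000|EXMPL_HUMAN": "Hypothetical example protein"})
print(n_missing)  # 1
```

Touching only the headers keeps the sequences identical to the original file, so a diff between the two FASTA files shows nothing but the added descriptions.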
