v0.11.0 RC1 #132
* ✨ cherry picks internal fixes from !68 and !70
* Cherry pick feature/confidence_streaming branch
* ✨ adds filelock dependency for tests
* 💄 linting
* 💄 reformat to satisfy linter k
* ✨ imports type annotations from future for python 3.9
* ✨ make pytest and cli behave with type annotations in Python 3.9
* ✨ test dropping Python 3.9 support - inspired by https://github.com/wfondrie/mokapot/pull/126/files#diff-1db27d93186e46d3b441ece35801b244db8ee144ff1405ca27a163bfe878957fL20
* Set scale_to_one to false in *all* cases
* Fixed path problems probably causing errors under windows
* Fix more possible path issues
* Fix warning about bitwise not in python 3.12
* Fix problem with numpy 2.x's different str rep of floats
* Make hashing of rows for splitting independent of numpy version and spectra columns
* Feature/streaming fix windows (wfondrie#48)
* ✨ log more infos
* ✨ uses uv for env setup; fix dependencies

---------

Co-authored-by: Elmar Zander <[email protected]>
Fixed retention time division by 60. Time is required in minutes for FlashLFQ, and it's already in minutes.
Co-authored-by: William Fondrie <[email protected]>
)

CSV_SUFFIXES = [".csv", ".pin", ".tab", ".csv"]
CSV_SUFFIXES = [
For the record... I still dislike naming so many tab-delimited file formats as "comma separated values (csv)"
I absolutely agree. I just don't see a better way, as those other extensions are already out there in the wild.
I don't recall any tool that generates a tab-delimited .csv off the top of my head. Do you happen to have an example? (I won't deal with it in this PR, but in the future we could split the csv and tsv formats internally.)
Sorry, you're right. I somehow misread your initial comment. Yes, since we really never have "comma-separated" values anywhere, why not get completely rid of it and replace "comma separated/CSV" with "tab separated/TSV" everywhere.
For the record: when I started on this code base, it was something with "comma separated" everywhere, but a separator variable sep was passed around, which was always set to "\t". I got rid of all the explicit file reading/writing stuff and moved that into the readers/writers, set the separator (I think) unconditionally to "\t", but did not rename the variables/classes. So: my bad ;)
To be clear, I think adding support for .csv would be a good idea in the future (comma separated file)
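If the csv and tsv formats were split internally as suggested above, the separator could be picked from the file suffix. This is only a rough sketch, not mokapot's actual reader code: the function name `guess_separator` and both suffix sets are hypothetical, and ".tsv" is an assumed extension that does not appear in this PR.

```python
from pathlib import Path

# Hypothetical suffix groups for illustration only.
# ".pin" and ".tab" come from the CSV_SUFFIXES list in this PR;
# ".tsv" is an assumption.
TAB_SUFFIXES = {".pin", ".tab", ".tsv"}
COMMA_SUFFIXES = {".csv"}


def guess_separator(path):
    """Return the column separator implied by a file's suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in TAB_SUFFIXES:
        return "\t"
    if suffix in COMMA_SUFFIXES:
        return ","
    raise ValueError(f"Unrecognized tabular suffix: {suffix!r}")
```

Note that this would change behavior for any existing tab-delimited files that carry a .csv extension, which is the compatibility concern raised earlier in this thread.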
Edit: a7401c3 makes some progress. I figured out the confidence part but still need to "pipe" through some columns needed by FlashLFQ. I might need help with this one to understand how to update the documentation. Right now if I try to do this (part of tests/unit_tests/test_writer_flashlfq.py):
So... where are these columns specified? How can one assign confidence without proteins?
Note: There seems to be a difference in what 'OnDiskDataset' and 'LinearPsmDataset' mean by spectra. The on-disk PSM dataset uses all of these:
and the linear PSM dataset defines it as
which would seem more like the OnDisk ... of
Chore/fix confidence api
…ion to remove numba
feat, wip: compound key on spectrum
Changes in the output:
Note:
Original column names in the pin file
Several columns changed from release -> main and back again from
Currently (this pr vs last release)
Currently:
Current:
IMO: It feels inconsistent to have generated columns that have spaces and ones that don't.
All line counts are the same! wooo!
What does this pr do:
Addresses:
mokapot.picked_protein.strip_peptides
#130
Notes:
Most of the lines added are just this file: data/phospho_rep1.traditional.pin
Which is a backport of the original testing file (phospho_rep1.pin, which was "fixed" in another PR by removing the ragged aspect of the protein column: PROT1\tPROT2 -> PROT1:PROT2).
Blockers:
Unhandled things:
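
For reference, the ragged-to-joined protein column change described in the notes above (PROT1\tPROT2 -> PROT1:PROT2) can be sketched as follows. This is a minimal illustration and not the code used in that other PR: the helper name `join_ragged_proteins` is hypothetical, and it assumes the protein column is the last header column and that every extra trailing field in a row is another protein.

```python
def join_ragged_proteins(lines, joiner=":"):
    """Collapse trailing tab-separated protein fields into one column.

    Assumes (for illustration) that the protein column is the last
    column named in the header line.
    """
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input, nothing to yield
    header = header.rstrip("\r\n")
    n_cols = len(header.split("\t"))
    yield header
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Everything before the last header column is kept as-is;
        # everything from that column onward is treated as proteins.
        fixed = fields[: n_cols - 1]
        proteins = fields[n_cols - 1 :]
        yield "\t".join(fixed + [joiner.join(proteins)])
```

Rows that already have a single protein pass through unchanged, so the same helper can be run over both the traditional and the "fixed" file.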