Combine mokapot scores from multiple files #97

Open
wfondrie opened this issue Apr 6, 2023 · 0 comments
Labels
enhancement New feature or request

wfondrie (Owner) commented Apr 6, 2023

One hindrance to large-scale analyses across many runs is that PSMs from all of the runs must be concatenated and read into memory at the same time. This creates a memory and compute bottleneck unless subset_max_train is used. However, this parameter currently alleviates only the model-training compute bottleneck in mokapot; the memory bottleneck remains.

Instead, it should be possible to run the full mokapot algorithm on each run individually, then aggregate the new scores for FDR estimation. Each run would be re-scored using its own cross-validated models. Notably, these scores are already calibrated so that the cross-validated predictions can be combined for FDR estimation. We could then combine only the scores and spectrum identifiers in a separate FDR estimation step, massively reducing the required memory.

This is also nice because the per-run compute could be trivially parallelized!
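
A rough sketch of what the separate FDR estimation step might look like, assuming each run has already been re-scored and reduced to `(spectrum_id, score, is_target)` tuples. None of these names are part of mokapot; `combine_runs` and `estimate_qvalues` are hypothetical helpers, and the `(decoys + 1) / targets` estimate is one common conservative target-decoy formula, used here only for illustration. Ties in score are not handled specially.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (spectrum_id, score, is_target)
Psm = Tuple[str, float, bool]


def combine_runs(runs: Dict[str, List[Psm]]) -> List[Psm]:
    """Pool the per-run scores, prefixing each spectrum ID with its run name."""
    combined = []
    for run_name, psms in runs.items():
        for spectrum_id, score, is_target in psms:
            combined.append((f"{run_name}:{spectrum_id}", score, is_target))
    return combined


def estimate_qvalues(psms: List[Psm]) -> List[Tuple[str, float, bool, float]]:
    """Assign q-values by target-decoy competition over the pooled scores."""
    # Best scores first (assumes higher is better).
    ordered = sorted(psms, key=lambda p: p[1], reverse=True)

    # FDR estimate at each score threshold: (decoys + 1) / targets.
    fdrs = []
    targets = decoys = 0
    for _, _, is_target in ordered:
        if is_target:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [psm + (q,) for psm, q in zip(ordered, qvalues)]


# Toy data for two runs; only scores, labels, and IDs are held in memory.
runs = {
    "run_a": [("s1", 3.0, True), ("s2", 1.0, False)],
    "run_b": [("s1", 2.5, True), ("s2", 2.0, True)],
}
result = estimate_qvalues(combine_runs(runs))
```

Because only three small fields per PSM are pooled, the feature matrices for each run never need to be in memory together.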

@wfondrie wfondrie added the enhancement New feature or request label Apr 6, 2023
@wfondrie wfondrie self-assigned this Apr 6, 2023