One hindrance to large-scale analyses across many runs is that PSMs from all of the runs must be concatenated and read into memory at the same time. This creates a memory and compute bottleneck unless `subset_max_train` is used; however, that parameter currently alleviates only the model-training compute bottleneck in mokapot.
Instead, it should be possible to run the full mokapot algorithm on each run individually, then aggregate the new scores for FDR estimation. Each run would be re-scored using its own cross-validated models; notably, these are already calibrated so that the cross-validated predictions can be combined for FDR estimation. We could then combine only the scores and spectrum identifiers in a separate FDR estimation step, massively reducing the required memory.
This is also nice, because the per-run model training and re-scoring could then be trivially parallelized!
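To make the idea concrete, here is a minimal sketch of the proposed second stage: pooling calibrated per-run scores and running target-decoy FDR estimation once over the combined set. The synthetic score arrays stand in for the outputs of each run's cross-validated models, and `tdc_qvalues` is a hypothetical helper (standard target-decoy competition q-value estimation), not mokapot's internal implementation.

```python
import numpy as np

def tdc_qvalues(scores, is_target):
    """Estimate q-values from pooled scores via target-decoy competition."""
    order = np.argsort(-scores)           # best score first
    targets = is_target[order]
    cum_decoys = np.cumsum(~targets)
    cum_targets = np.cumsum(targets)
    # FDR estimate at each score threshold: (#decoys + 1) / #targets
    fdr = np.minimum((cum_decoys + 1) / np.maximum(cum_targets, 1), 1.0)
    # q-value: the minimum FDR at this threshold or any more permissive one
    qvals_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    out = np.empty_like(qvals_sorted)
    out[order] = qvals_sorted             # map back to the input ordering
    return out

# Stand-ins for the calibrated scores each run's models would produce:
rng = np.random.default_rng(0)
run_scores = [np.concatenate([rng.normal(2, 1, 100),    # targets
                              rng.normal(0, 1, 100)])   # decoys
              for _ in range(3)]
run_labels = [np.array([True] * 100 + [False] * 100) for _ in range(3)]

# Only scores and labels (plus spectrum IDs, omitted here) need to be
# pooled across runs -- the feature matrices never coexist in memory.
scores = np.concatenate(run_scores)
is_target = np.concatenate(run_labels)
qvals = tdc_qvalues(scores, is_target)
accepted = is_target & (qvals <= 0.01)
```

Because each run contributes only a score vector and identifiers, the memory footprint of this step is tiny compared to holding every run's PSM features at once, and the per-run scoring stage ahead of it is embarrassingly parallel.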