An awesome list of proteomics tools and resources.
I've been analyzing large-scale proteomics datasets for over 5 years now. I recently stumbled upon these awesome lists and wanted to make one for proteomics.
1A. Learning Resources - Proteomics
1B. Learning Resources - Programming
2. Databases
3. Raw data search software/algorithms
4. Assorted Pipeline Tools
5. Raw Data Analysis
6. Statistical Analysis
7. Protein Pathway Enrichment
8. Kinase Motif/Activity Analysis
9. Top down data analysis
10. Multi-Omics data analysis
Ben Orsburn has hands down the best proteomics blog I've ever seen. Ben is very knowledgeable and has a great sense of humor.
Phil Wilmarth has a lot of good learning resources. He has Python scripts for raw data analysis as well as blog posts detailing TMT data analysis techniques.
biostars is like Stack Overflow but only for bioinformatics; it has forums for questions, job postings and tutorials.
Review article about proteomics.
Tutorial videos from NCQBCS, a project led by the Coon lab. Contains lots of information regarding experimental design, ionization, quantitative proteomics, analysis, post-translational modifications and more.
ASMS video mass spec channel contains a lot of videos from leading researchers in the proteomics field.
Videos from Nikolai - most of his videos focus on single-cell proteomics and DIA.
Videos from Matthew Padula - lots of great videos on the basics of mass spectrometry and proteomics.
May Institute - computational proteomics short courses organized by Olga Vitek. They also have a LOT of videos on YouTube.
R books - a large collection of e-books for learning R, from basic R to advanced R, Shiny and more!!
Are-we-learning-yet - a resource for learning machine learning in Rust. I've noticed Rust is starting to become a more popular language, not only in the wild but also in proteomics (see Sage below).
conda/bioconda - conda (distributed with Anaconda) is a popular package and environment manager, used in bioinformatics mostly for Python and, to a lesser extent, R. Bioconda is a conda channel for bioinformatics software. Useful for creating reproducible environments.
Intro Math - if you're like me and it's been a few years since you've had to use math, check out this repo to brush up on it and learn some R, Python and Julia.
python data science tips - another resource for learning Python/pandas.
Mass Spec Coding Club - a great resource for learning Python and then applying that knowledge to mass spectrometry.
ProteomeXchange is a consortium of public repositories for raw MS data, with links to all the major databases, including MassIVE, PRIDE, iProX and more. Probably the best place to start.
Pastel BioScience has a database that contains such a staggering amount of information that I'm sure the rest of my awesome list will be redundant.
UniProt - has all the information you will ever need about individual proteins and is the go-to source for protein FASTA databases.
Biogrid - protein-protein interaction database
KEGG - biological pathway database
Reactome - nicer looking biological pathway database
2021 - CPTAC - python/R - API interface to publicly available cancer datasets - paper
2021 - ppx - python - Python interface to proteomics data repositories - paper
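Since UniProt is the usual source of FASTA files, here's a minimal standard-library Python sketch for reading a FASTA file and pulling the accession, entry name and gene name out of UniProtKB-style headers. Those follow the layout `>db|Accession|EntryName ProteinName OS=... OX=... GN=... PE=... SV=...`, where `db` is `sp` (Swiss-Prot) or `tr` (TrEMBL) and `GN=` isn't always present. The function names are my own, not from any library.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# UniProtKB header layout: >db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")
GENE_RE = re.compile(r"\bGN=(\S+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are concatenated."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_uniprot_header(header: str) -> Dict[str, str]:
    """Extract db, accession, entry name and gene name from a UniProtKB header."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"not a UniProtKB header: {header!r}")
    db, accession, entry_name, description = m.groups()
    gene = GENE_RE.search(description)
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "gene": gene.group(1) if gene else "",  # GN= is optional
    }
```

For a real file you'd pass `open(path)` to `read_fasta` and each header to `parse_uniprot_header`. pyteomics (listed below) also has FASTA readers if you'd rather not roll your own.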
2017 - FragPipe - Java - a very fast search engine with a nice GUI. The software is modular: it consists of MSFragger (the database search algorithm), Philosopher (which post-processes the search results), and other tools for PTM and TMT analysis.
2008 - MaxQuant is probably the most used and best-known DDA software. Developed by Jürgen Cox, this completely free software is user friendly and is constantly updated with new and original features. There is even a YouTube channel with tons of videos on how to use the software. - paper
2015 - PeptideShaker is like the Swiss Army knife of search tools. You can search data with multiple search engines, including Comet, Tide, Andromeda, Mascot, X!Tandem and more that I've never heard of. - paper
2010 - skyline - software for targeted proteomics - paper
2012 - Comet - C++ - Free and open-source search engine, lately it's had several - paper
2020 - DIA-NN - C/C++ - free and open source search tool for DIA data that uses neural networks, works using either a library or a FASTA database. - paper
2023 - Sage - Rust - most likely the fastest search engine currently available. It's entirely command-line based, but if you learn to use it, it will be worth it - paper
2019 - MaxQuant Live (not sure where to put this) - for real-time monitoring of MS data and acquisition.
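Most of these search engines (or tools downstream of them, like Philosopher in FragPipe) filter their results to a false discovery rate with the target-decoy approach. Below is a minimal Python sketch of one common estimator: rank PSMs by score, estimate the FDR at each cutoff as decoys / targets at or above it, then turn that into q-values by taking the running minimum from the bottom of the list. It assumes higher scores are better and doesn't group tied scores; it's an illustration, not how any particular tool above implements it.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted best-first; higher score = better match."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each cutoff = decoys / targets among PSMs scoring at or above it (capped at 1)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value = lowest FDR at which this PSM would still pass, i.e. the running
    # minimum of the FDR taken from the worst-scoring PSM upwards
    qvalues = [0.0] * len(fdrs)
    running_min = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


# Toy example: five PSMs, two of them decoys
for score, is_decoy, q in target_decoy_qvalues([(10, False), (9, False), (8, True), (7, False), (6, True)]):
    print(score, "decoy" if is_decoy else "target", round(q, 3))
```

Filtering to 1% FDR then just means keeping the target PSMs with q <= 0.01.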
2009 - PAW_pipeline - python - a pretty much pure-Python proteomics pipeline for raw files. It includes functions to convert files, run Comet and produce histograms, and it can also do TMT - paper
2015 - Ursgal - python - combines multiple search engine algorithms and post-processing algorithms, and computes statistics on the output from multiple search engines - paper1 paper2
2019 - DIAlignR - R - retention time alignment of targeted MS data, including DIA and SWATH-MS - paper
2021 - Monocle - C# - for monoisotopic peak and accurate precursor m/z detection in shotgun proteomics experiments. - paper
2021 - RawBeans is an upgraded version of RawMeat. It's a raw data quality control tool that helps identify instrument issues related to spray instability, problems with fragmentation or unequal loading. This program can be used on a standalone PC or included in a pipeline. - paper
2021 - mokapot - python - Semisupervised Learning for Peptide Detection - paper
2021 - qcloud2 - cloud-based quality control pipeline, can be integrated with Nextflow and OpenMS - paper
2021 - DIAproteomics - python - a module that can be added to an OpenMS workflow for the analysis of DIA data - paper
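Monoisotopic peak detection (what Monocle does) matters because the instrument often picks a heavier isotope peak as the precursor, which throws the precursor mass off by about 1 Da. The arithmetic behind it fits in a few lines: an ion of neutral mass M carrying z protons sits at m/z = (M + z * 1.007276) / z, and its isotope peaks are spaced roughly 1.003355 / z apart (the 13C - 12C mass difference). This Python sketch only shows that arithmetic; it is not Monocle's algorithm, and the function names are made up.

```python
PROTON = 1.007276     # proton mass, Da
C13_DELTA = 1.003355  # 13C - 12C mass difference, Da


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion with the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


def isotope_envelope(neutral_mass: float, charge: int, n_peaks: int = 4) -> list:
    """Approximate m/z of the monoisotopic peak (index 0) and the next isotope peaks."""
    mono = precursor_mz(neutral_mass, charge)
    return [mono + k * C13_DELTA / charge for k in range(n_peaks)]


def isotope_offset(selected_mz: float, mono_mz: float, charge: int) -> int:
    """How many isotope peaks above the monoisotopic peak the selected precursor sits."""
    return round((selected_mz - mono_mz) * charge / C13_DELTA)


# PEPTIDE (neutral monoisotopic mass ~799.36 Da) at 2+
env = isotope_envelope(799.359945, 2)
print([round(mz, 4) for mz in env])
print(isotope_offset(env[1], env[0], 2))
```

If the instrument picked `env[1]` instead of `env[0]`, a search that assumes the selected peak is monoisotopic would be off by one 13C step, which is the kind of error these tools try to correct.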
2012/2020 - MSnbase - R - provides MS data structures and lets you process, quantify and visualize raw data - paper
2015 - MaRaCluster - C++ - clustering technique to identify fragment spectra stemming from the same peptide species - paper
2015 - pyproteome - python - analyzes proteomics data, can filter, normalize, perform motif and pathway enrichment. Currently only supports ProteomeDiscoverer .msf search files - paper
2018 - pyteomics - python - proteomics framework tools - paper
2018 - RawTools - C# - quality control checking of raw files, can assist in method development and instrument quality control - paper
2018 - MSstatsQC - R - provides methods for multiple peptide monitoring using raw MS files, works for DDA and DIA data - paper
2018 - rawDiag - R - package that can be used in conjunction with rawrr - paper
2020 - COSS - java - user-friendly spectral library search tool - paper
2021 - rawrr - R - a great package that can read in Thermo raw files! That's great for me, because I always find it tedious to convert a raw file into an mzML or mzXML file - paper
2021 - PSpecteR - R - user-friendly, interactive tool for visualizing the quality of top-down and bottom-up proteomics data - paper
2022 - RforMassSpectrometry - R - a massive project that contains multiple helpful packages including RforMassSpectrometry, MsExperiment, Spectra, QFeatures, PSMatch, Chromatograms, MsCoreUtils, and MetaboCoreUtils.
2023 - mpwR - R - package that lets you directly compare the output of search engines such as MaxQuant, DIA-NN, Spectronaut and (I think) Proteome Discoverer. It's also helpful if you're testing out different settings within your search engine and want to quickly see how each performs. - paper
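Spectral library search (COSS) and spectrum clustering (MaRaCluster) both boil down to scoring how similar two spectra are. The simplest version of that idea is a cosine similarity between binned spectra, sketched below in Python. The bin width and square-root intensity scaling are my assumptions, and neither COSS nor MaRaCluster necessarily uses this exact score (MaRaCluster, for example, uses its own fragment-rarity-based scoring).

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.02) -> Dict[int, float]:
    """Sum square-root-scaled intensities into fixed-width m/z bins."""
    bins: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += math.sqrt(intensity)
    return bins


def cosine_similarity(a: List[Peak], b: List[Peak], bin_width: float = 0.02) -> float:
    """1.0 = same relative peak pattern, 0.0 = no shared bins."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = [(175.119, 30.0), (262.151, 12.0), (375.235, 50.0)]
library = [(175.119, 25.0), (262.151, 10.0), (375.235, 60.0), (400.100, 2.0)]
print(round(cosine_similarity(query, library), 3))
```

Fixed bins are the crude part: two peaks a hair apart can land in neighboring bins and not match, which is one reason real tools use tolerance-based peak matching instead.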
2014 - MSstats - R - DDA/shotgun, bottom-up, SRM, DIA - paper
2018 - PaDuA - python - proteomics and phosphoproteomics data analysis - paper
2020 - MSstatsTMT - R - TMT shotgun proteomics - paper
2020 - proteiNorm - R - TMT and unlabeled, has multiple options for normalization and statistical analysis - paper
2020 - DEqMS - R - developed on top of limma, but takes into account how protein-level variance depends on the number of PSMs. Works on both labelled and unlabelled samples - paper
2021 - MSstatsPTM - R - labeled and unlabeled PTM data analysis - paper
PermFDP - R - package to perform multiple hypothesis correction using permutation-based false discovery proportion (FDP) control. One of the better performing methods for multiple testing correction. - paper (the paper isn't about the tool itself; it uses the tool and compares it to other methods)
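For a feel of what permutation-based multiple testing correction does: instead of trusting a theoretical null distribution, you shuffle the sample labels, recompute the statistic for every protein, and count how many "discoveries" show up by chance. The Python sketch below is a generic permutation estimate of the FDR at a fixed |t| cutoff (in the spirit of SAM) using Welch's t-statistic. It is not PermFDP's procedure, and the function names and data layout (one row per protein, group A columns first) are assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch's t-statistic for the difference in means of x and y."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se if se > 0 else 0.0


def permutation_fdr(rows: List[List[float]], n_a: int, t_cutoff: float,
                    n_perm: int = 200, seed: int = 0) -> float:
    """Estimate the FDR of calling every protein with |t| >= t_cutoff.

    rows: one list per protein, group A values first (n_a of them), then group B.
    """
    observed = [abs(welch_t(r[:n_a], r[n_a:])) for r in rows]
    n_called = sum(t >= t_cutoff for t in observed)
    if n_called == 0:
        return 0.0

    rng = random.Random(seed)
    columns = list(range(len(rows[0])))
    false_calls = 0
    for _ in range(n_perm):
        rng.shuffle(columns)  # same label shuffle for all proteins keeps their correlation
        a, b = columns[:n_a], columns[n_a:]
        false_calls += sum(
            abs(welch_t([r[i] for i in a], [r[i] for i in b])) >= t_cutoff for r in rows
        )
    # average number of calls per permutation divided by the number of observed calls
    return min(1.0, (false_calls / n_perm) / n_called)
```

With only a handful of replicates there are few distinct label shuffles (3 vs 3 gives just 20 splits), so these estimates get coarse quickly.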
2019 - fgsea - R - fast gene set enrichment analysis - paper
2019 - pathfindR - R - active-subnetwork-oriented pathway enrichment analysis that uses protein-protein interaction networks to enhance the standard pathway analysis method - paper
2020 - lipidR - R - lipidomics data analysis - paper
2021 - phosphoRWHN - R - pathway enrichment for phosphoproteomics data - paper
2021 - leapR - R - package for multiple pathway analysis - paper
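The simplest form of pathway enrichment is an over-representation test: given a background of detected proteins and a list of significant ones, is a pathway hit more often than chance would predict? A one-sided hypergeometric p-value answers that, sketched below in plain Python (3.8+ for `math.comb`). Note this is not what fgsea does (fgsea runs a rank-based gene set enrichment analysis on the full ranked list), and the function names are hypothetical.

```python
from math import comb
from typing import Set


def hypergeom_enrichment_p(n_universe: int, n_pathway: int, n_hits: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value, P(overlap >= n_overlap) under random draws.

    n_universe: background proteins (e.g. everything you quantified)
    n_pathway:  background proteins annotated to the pathway
    n_hits:     proteins you called significant
    n_overlap:  significant proteins that are in the pathway
    """
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, min(n_pathway, n_hits) + 1)
    )
    return tail / total


def pathway_enrichment(universe: Set[str], pathway: Set[str], hits: Set[str]) -> float:
    """Same test on sets of protein IDs, restricted to the background."""
    pathway, hits = pathway & universe, hits & universe
    return hypergeom_enrichment_p(len(universe), len(pathway), len(hits), len(pathway & hits))
```

Using the proteins you actually detected as the background (rather than the whole proteome) matters a lot in proteomics, and with many pathways you still need a multiple testing correction on these p-values.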
2017 - KSEAapp - R - kinase-substrate enrichment analysis. I would recommend using it with a freshly downloaded kinase-substrate database from PhosphoSitePlus - paper
2015 - rmotifx - R - motif enrichment analysis of PTMs on proteins, probably mostly used for phosphorylation - paper
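As I understand it, the KSEA score that KSEAapp reports is a z-score per kinase: z = (mean(S) - mean(P)) * sqrt(m) / sd(P), where S is the log2 fold changes of the kinase's quantified substrate sites, P is the log2 fold changes of all quantified sites, and m is the number of sites in S. Below is a minimal Python sketch of that score with a two-sided normal p-value. The site-ID format and the kinase-substrate mapping (which you'd build from a resource like PhosphoSitePlus) are assumptions, and KSEAapp's own implementation may differ in details such as a minimum number of substrates.

```python
import math
import statistics
from typing import Dict, List, Tuple


def ksea_score(site_log2fc: Dict[str, float], kinase_substrates: List[str]) -> Tuple[float, float, int]:
    """KSEA-style z-score for one kinase.

    site_log2fc: log2 fold change for every quantified phosphosite (keys are site IDs)
    kinase_substrates: site IDs annotated as substrates of this kinase
    Returns (z, two-sided normal p-value, number of quantified substrates m).
    """
    all_fc = list(site_log2fc.values())
    sub_fc = [site_log2fc[s] for s in kinase_substrates if s in site_log2fc]
    m = len(sub_fc)
    if m == 0:
        raise ValueError("none of this kinase's substrates were quantified")
    # sample standard deviation of all sites (an assumption about the exact SD used)
    z = (statistics.mean(sub_fc) - statistics.mean(all_fc)) * math.sqrt(m) / statistics.stdev(all_fc)
    p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p, m
```

The sqrt(m) factor is why kinases with many quantified substrates can reach significance with smaller average shifts than kinases with only one or two.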
2021 - ClipsMS - python - analysis of terminal and internal fragments in top-down mass spectrometry data - paper
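ClipsMS looks at both terminal fragments (which contain one end of the protein) and internal fragments (which come from two backbone cleavages and contain neither end). The Python sketch below computes singly charged b/y ions and b-type internal fragment m/z values from standard monoisotopic residue masses. It assumes no modifications (cysteine is unmodified), only b/y-type ions and charge 1+, so it's far simpler than what a real top-down tool considers.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids, no modifications
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def residue_sum(seq: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in seq)


def terminal_ions(seq: str) -> Tuple[List[float], List[float]]:
    """Singly charged b1..b(n-1) (N-terminal) and y1..y(n-1) (C-terminal) m/z values."""
    n = len(seq)
    b = [residue_sum(seq[:i]) + PROTON for i in range(1, n)]
    y = [residue_sum(seq[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


def internal_fragments(seq: str, min_len: int = 2) -> List[Tuple[str, float]]:
    """Singly charged b-type internal fragments: sub-sequences touching neither terminus."""
    n = len(seq)
    fragments = []
    for start in range(1, n - 1):
        for end in range(start + min_len, n):
            frag = seq[start:end]
            fragments.append((frag, residue_sum(frag) + PROTON))
    return fragments


b, y = terminal_ions("PEPTIDE")
print("b:", [round(m, 4) for m in b])
print("y:", [round(m, 4) for m in y])
print("internal:", internal_fragments("PEPTIDE")[:3])
```

The number of internal fragments grows roughly with the square of the sequence length, which is why assigning them in intact-protein spectra gets ambiguous so quickly.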
2015 - moCluster - R - Integration of multiple omics datasets to identify patterns - paper
2019 - MOGSA - R - Multiple omics data integrative clustering and gene set analysis - paper
cytoscape - visualizing protein-protein interaction networks
2012 - ProteoWizard - great software for converting one MS file type to another. I mostly use it to convert Thermo .raw files to mzML - paper
2019 - IPSA - Interactive Peptide Spectrum Annotator, web-based utility for shotgun mass spectrum annotation - paper
2020 - PeCorA - R - peptide correlation analysis - paper
2021 - ProteaseGuru - C# - tool for In Silico Database Digestion, optimize bottom up experiments - paper
2021 - DeepLC - python - predicts retention times for peptides that have unseen modifications - paper
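In silico digestion (what ProteaseGuru helps you optimize) is easy to sketch for trypsin: cut after K or R unless the next residue is P, then join neighboring pieces to allow missed cleavages. Below is a minimal Python version. The rule is the classic simplified trypsin rule, the length limits and defaults are my assumptions rather than ProteaseGuru's settings, and it needs Python 3.7+ because it splits on a zero-width regex match.

```python
import re
from typing import List

# Classic trypsin rule: cleave after K or R, but not when the next residue is P
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein: str, missed_cleavages: int = 1,
                   min_len: int = 7, max_len: int = 30) -> List[str]:
    """Return peptides with up to `missed_cleavages` internal K/R sites, filtered by length."""
    pieces = [p for p in TRYPSIN_SITE.split(protein) if p]  # fully cleaved pieces
    peptides = []
    for i in range(len(pieces)):
        # join piece i with up to `missed_cleavages` following pieces
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptide = "".join(pieces[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


print(tryptic_digest("AKPRGKLLR", missed_cleavages=1, min_len=1, max_len=50))
```

The length filter is where a lot of the practical tuning happens: very short peptides aren't unique enough to identify a protein, and very long ones rarely ionize and fragment well.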