LongCOVID

A repository to share code for long COVID predictions based on a random forest classifier as well as a univariate association analysis of proteomic features to longCOVID labels.

Requirements

python 3.7.4 scikit-learn 1.1.3 pandas 1.5.2 scipy 1.9.3 numpy 1.23.5 shap 0.41.0 statsmodels 0.13.5 openpyxl 3.0.10

Required input data

The input data required to execute these scripts can be obtained from . Please include these in a folder Data. This should comprise:

Proteomics_Clinical_Data_220902_Acute_plus_healthy_v5.xlsx
Proteomics_Clinical_Data_220902_6M_timepoint_v4.xlsx
Proteomics_Clinical_Data_220902_Labels_v2.xlsx
Table S2 Biological protein cluster compositions.xlsx

Execution

We provide the data splits used in partitions. Relevant label dictionaries need to be generated based on the label data file listed above. Run the file prediction_RF.py to generate model predictions, association analysis either for individual proteomic features, or clusters thereof can be obtained using associationAnalysis.py, and associationClusters.py respectively. In combineInterpreations.py we combine the SHAP analysis results of multiple cross validation folds.

Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 813533 (K.B.).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

LongCOVID

Requirements

Required input data

Execution

Acknowledgements

Files

README.md

Latest commit

History

README.md

File metadata and controls

LongCOVID

Requirements

Required input data

Execution

Acknowledgements