Classification algorithms evaluated against published clustering methods for automated flow cytometry data analysis
- Evaluation data used in the study is unavailable due to patient confidentiality.
- To use your own input data, supply a pickled pandas DataFrame in which each row is an FC event and each column is a marker. Two additional columns, `label` and `sample_name`, give the event label and the name of the sample the event is from, respectively.
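
The input format described above can be sketched as follows. This is a minimal illustration, not the study's data: the marker names, population labels, sample names and file name are all hypothetical, and the values are random.

```python
import os
import tempfile

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
markers = ["CD3", "CD4", "CD8"]  # hypothetical marker names

# One row per FC event, one column per marker (random values for illustration).
events = pd.DataFrame(rng.normal(size=(6, len(markers))), columns=markers)

# The two extra columns the scripts expect: event label and source sample.
events["label"] = ["pop_1", "pop_1", "pop_2", "pop_2", "pop_3", "pop_3"]
events["sample_name"] = ["sample_A"] * 3 + ["sample_B"] * 3

# Pickle the DataFrame and read it back to check the round trip.
path = os.path.join(tempfile.mkdtemp(), "input_data.pickle")  # hypothetical file name
events.to_pickle(path)
restored = pd.read_pickle(path)
```

Any set of marker columns should work in the same way, as long as `label` and `sample_name` are present.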
- `benchmarking.py` begins the pipeline: it runs the clustering methods on the input data, matches their outputs to the labelled populations using the Hungarian assignment algorithm, and then evaluates performance.
  - FLOCK requires installation from here.
  - flowgrid requires installation from here.
  - `helper_match_evaluate_multiple.R` was made available by Weber and Robinson, 2016.
- `classification.py` trains and evaluates Decision Tree, Random Forest and XGBoost classifiers on the input data.
- `figures.py` recreates the figures in the publication, except for Fig 6, Fig 7 and Table 2, which would require the publication of confidential patient data.
- `figs` is the folder where the figures will be saved.
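
The train-and-evaluate step performed by `classification.py` might look roughly like the sketch below. It is not the repository's code. The synthetic data, the 70/30 stratified split, the hyperparameters and the macro-averaged F1 score are all assumptions made for illustration. XGBoost is included only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for FC events: 4 "markers", 3 labelled populations.
X, y = make_blobs(n_samples=600, centers=3, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
try:
    # Optional dependency; skipped when xgboost is not installed.
    from xgboost import XGBClassifier

    models["xgboost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit each model, then keep its F1 score and confusion matrix on held-out events.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "f1": f1_score(y_test, pred, average="macro"),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }
```

With real data, `X` would be the marker columns of the input DataFrame and `y` an integer encoding of its `label` column.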
- Data from the study, required by `figures.py` to recreate the publication figures:
  - `confusion_matrix` contains the confusion matrices generated from each trained classification model.
  - `f1.pickle` contains the results of the evaluation of the classification methods.
  - `output` contains the results and confusion matrices generated from each benchmarked clustering method.
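
For readers unfamiliar with the matching step used in the clustering benchmark, the sketch below shows one way to pair clusters with labelled populations using the Hungarian algorithm. It is an illustration only, not the code in `benchmarking.py` or `helper_match_evaluate_multiple.R`. It assumes the matching maximises the summed per-pair F1 score, and it uses SciPy's `linear_sum_assignment` on toy labels.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy example: manual population labels and cluster ids for 8 events.
true = np.array([0, 0, 0, 1, 1, 1, 2, 2])
cluster = np.array([5, 5, 7, 7, 7, 7, 9, 9])

pops = np.unique(true)
clus = np.unique(cluster)

# F1 score for every (population, cluster) pair.
f1 = np.zeros((len(pops), len(clus)))
for i, p in enumerate(pops):
    for j, c in enumerate(clus):
        tp = np.sum((true == p) & (cluster == c))
        if tp > 0:
            precision = tp / np.sum(cluster == c)
            recall = tp / np.sum(true == p)
            f1[i, j] = 2 * precision * recall / (precision + recall)

# Hungarian algorithm: one-to-one assignment that maximises the total F1.
rows, cols = linear_sum_assignment(f1, maximize=True)
matching = {int(pops[r]): int(clus[c]) for r, c in zip(rows, cols)}
mean_f1 = float(f1[rows, cols].mean())
```

Here population 0 is matched to cluster 5, population 1 to cluster 7 and population 2 to cluster 9. When a method returns more clusters than there are populations, the surplus clusters are simply left unmatched by the assignment.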