The purpose of this project is to segment individual organoids from microscopy data and classify them into morphological groups.
Due to the nature of the data, the project is divided into two parts: segmentation and classification.
To obtain ground-truth data for training the supervised segmentation and classification models, annotations made in QuPath are expected.
Alternatively, you can provide data folders containing the individual images, crops containing one object each with the class name in parentheses in the file name,
and the masks for the whole images.
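As an illustration of this naming convention, a small sketch of sorting crops into per-class folders (the exact file-name pattern and folder layout are assumptions, not the project's actual code):

```python
import re
from pathlib import Path

# Assumed convention: the class name appears in parentheses in the crop's
# file name, e.g. "organoid_17_(cystic).tif" -> class "cystic".
CLASS_PATTERN = re.compile(r"\((?P<cls>[^)]+)\)")

def class_from_filename(name):
    """Extract the class name from a crop file name, or None if absent."""
    match = CLASS_PATTERN.search(name)
    return match.group("cls") if match else None

def sort_crops(crop_dir: Path, out_dir: Path) -> None:
    """Move each crop into out_dir/<class_name>/ based on its file name."""
    for crop in crop_dir.iterdir():
        cls = class_from_filename(crop.name)
        if cls is None:
            continue  # skip files without a class tag
        target = out_dir / cls
        target.mkdir(parents=True, exist_ok=True)
        crop.rename(target / crop.name)
```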
It is worth mentioning that the segmentation step uses a model from an album catalog and can therefore be replaced with any other model (e.g. nnU-Net, CSBDeep, DeepCell). We simply assumed star-convex polygons/blob-like structures as a starting point and provide a Python script for StarDist2D. To change it, install another album solution from the catalog and run it via the CLI or album-gui.
After cloning the repository, make sure PyTorch 2 with GPU support is installed. (We will provide the conda requirements as a YAML file soon.)
Run each of the steps from the directory of the corresponding script.
- Open the project with annotations in QuPath
- Use `src/export_geojsons_and_rois.groovy` (instructions at the top of the file) to export cells & geojsons
- Use `src/sort_qupath_object_output.py` to sort the exported data into the directory structure required for training a classification model
- Use `src/convert_label.py` to convert geojson files to TIFFs
- Use `src/preprocess_dataset.py` to preprocess images and masks
- Use `src/split.py` to split the segmentation data into train and test sets and create the dataset structure expected by the segmentation models
- Use `src/split.py` to split the classification data into train and test sets and create the dataset structure expected by the classification models
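The split step partitions files into train and test sets. As a hedged sketch of the idea (illustrative only, not the actual logic of `src/split.py`; the ratio and seed are assumptions), a deterministic file-level split could look like:

```python
import random

def split_filenames(filenames, test_fraction=0.2, seed=42):
    """Deterministically partition file names into train and test lists."""
    names = sorted(filenames)     # sort first so the split is reproducible
    rng = random.Random(seed)     # fixed seed -> same shuffle every run
    rng.shuffle(names)
    n_test = max(1, int(len(names) * test_fraction))
    return names[n_test:], names[:n_test]  # (train, test)
```

Because the input is sorted before shuffling with a fixed seed, running the split twice on the same files yields the same partition.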
- Add the image-challenges-catalog to your album.solutions installation
- Note: StarDist expects the train/test subfolders to be named "images" and "masks" (not "labels")
- Use `src/segmentation_album.py` to run StarDist2D with our recommended settings. Alternatively, use the album-gui or CLI to run any other model provided in the catalog
- Use `src/classification/nn_pytorch.py` to train a classifier and store the model in the `models/` directory
- This step can perform a grid search over a few hyperparameters and model architectures; it returns the best-performing model while displaying the results in TensorBoard
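A grid search of this kind amounts to iterating the Cartesian product of candidate hyperparameter values. A minimal sketch (the grid below is illustrative; the script's actual search space may differ):

```python
from itertools import product

# Illustrative search space; the real script's grid may differ.
GRID = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [16, 32],
    "architecture": ["resnet18", "simple_cnn"],
}

def grid_configs(grid):
    """Yield one config dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```

Each yielded dict would correspond to one training run; the best run by validation metric is kept.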
- Use your segmentation model to predict segmentation masks for a whole directory of new data (album solution: stardist_predict)
- Use `src/segmentation_to_instance_classification.py` to extract the objects from the segmentation masks and store them in a new directory
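Conceptually, extracting one crop per object amounts to computing a bounding box for each instance id in the label mask. The sketch below is a simplified stand-in for illustration, not the actual `src/segmentation_to_instance_classification.py` (which operates on TIFF arrays):

```python
def bounding_boxes(mask):
    """Return {label: (min_row, min_col, max_row, max_col)} for each
    non-zero instance id in a 2D integer label mask (list of lists)."""
    boxes = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == 0:
                continue  # 0 is background
            if label not in boxes:
                boxes[label] = [r, c, r, c]
            else:
                b = boxes[label]
                b[0] = min(b[0], r); b[1] = min(b[1], c)
                b[2] = max(b[2], r); b[3] = max(b[3], c)
    return {k: tuple(v) for k, v in boxes.items()}

def crop(image, box):
    """Cut the bounding-box region out of a 2D image (list of lists)."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]
```

Each cropped region would then be saved as one object image for the classifier.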
- Use your trained classifier (`src/classification/nn_pytorch.py`) to predict the class of each object and obtain a CSV file with the predictions per object
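The prediction step yields a CSV with one row per object. A small sketch of consuming such a file, e.g. to count objects per class (the column names here are assumptions; the real output columns may differ):

```python
import csv
import io
from collections import Counter

def class_counts(csv_text):
    """Count predicted classes from a predictions CSV, assuming a
    header of the form 'object,predicted_class'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["predicted_class"] for row in reader)
```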
- data_sets: contains the preprocessed, model-specifically reordered data sets used for training and testing
  - ...: one subdirectory for each model
    - train: contains the train data set
      - img: contains the train images (as tiff files)
      - mask: contains the train labels (as tiff files)
    - test: contains the test data set
      - img: contains the test images (as tiff files)
      - mask: contains the test labels (as tiff files)
- preprocessed: contains the preprocessed data sets
  - anno_to_mask: contains the masks generated from the annotations (using the src/convert_label.py script), as tiff files
  - images: contains the preprocessed images (using the src/preprocess_dataset.py script), as tiff files
  - labels: contains the labels preprocessed from anno_to_mask (using the src/preprocess_dataset.py script), as tiff files
- raw_data: contains the raw data sets (unchanged)
  - annotations_json: contains the annotations, as geojson files
  - raw_images: contains the raw images, as jpg & tiff files
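The model-specific part of this tree can be created programmatically. A sketch using `pathlib` (the model name is a placeholder; the subfolder names follow the layout above):

```python
from pathlib import Path

# Subtree layout mirroring the data_sets/ structure described above.
SPLITS = ("train", "test")
SUBDIRS = ("img", "mask")

def make_dataset_skeleton(root, model_name):
    """Create data_sets/<model>/{train,test}/{img,mask} and return the leaves."""
    leaves = []
    for split in SPLITS:
        for sub in SUBDIRS:
            d = Path(root) / "data_sets" / model_name / split / sub
            d.mkdir(parents=True, exist_ok=True)
            leaves.append(d)
    return leaves
```

Note that for StarDist the image/label subfolders must instead be named "images" and "masks", as mentioned above.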
- convert_label.py: converts the annotations from geojson to tiff format
- preprocess_dataset.py: preprocesses the raw images and tiff labels into the preprocessed data set
We plan to convert this into an easy-to-use pipeline and deploy it as an 'album catalog', so that each individual step gets a GUI and the directory structure no longer relies on the working directory of each executed script. This should improve replicability and usability.
We adjusted and tested the pipeline on a very specific dataset of microscopy images of organoids with QuPath annotations from @tum [Technische Universität München: Friederike Ebner, Krzysztof Flisikowski, Qixia Chan, Theresa Pauli and Wei Liang (Corresponding Person)]. Thank you for providing this dataset! The preprocessing steps are therefore tailored to these images (4032x3040 px).
This project sprouted from the seminar "Deep Learning for biomedical applications" of Prof. Dr. rer. nat. Vitaly Belik @Freie Universität Berlin.