The purpose of this project is to segment individual organoids from microscopy data and classify them into morphological groups.
Due to the nature of the data, the project is divided into two parts: segmentation and classification.
To obtain ground-truth data for training the supervised segmentation and classification models, annotations made in QuPath are expected.
Alternatively, you can provide data folders containing the individual images, crops containing one object each with the class name in parentheses in the file name,
and the masks for the whole images.
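As an illustration of this naming convention, a small sketch of sorting crops into per-class folders (the exact file-name pattern and folder layout are assumptions, not the project's actual code):

```python
import re
from pathlib import Path

# Assumed convention: the class name appears in parentheses in the crop's
# file name, e.g. "organoid_17_(cystic).tif" -> class "cystic".
CLASS_PATTERN = re.compile(r"\((?P<cls>[^)]+)\)")

def class_from_filename(name):
    """Extract the class name from a crop file name, or None if absent."""
    match = CLASS_PATTERN.search(name)
    return match.group("cls") if match else None

def sort_crops(crop_dir: Path, out_dir: Path) -> None:
    """Move each crop into out_dir/<class_name>/ based on its file name."""
    for crop in crop_dir.iterdir():
        cls = class_from_filename(crop.name)
        if cls is None:
            continue  # skip files without a class tag
        target = out_dir / cls
        target.mkdir(parents=True, exist_ok=True)
        crop.rename(target / crop.name)
```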
It is worth mentioning that the segmentation step uses a model from an album catalog and can therefore be replaced with any other model (e.g. nnU-Net, CSBDeep, DeepCell). We simply assumed star-convex polygons/blob-like structures as a starting point and provide a Python script for StarDist2D. To change it, install another album solution from the catalog and run it via the CLI or album-gui.
After cloning the repository, make sure PyTorch 2 with GPU support is installed. (We will provide the conda requirements as a YAML file soon.)
Run each of the steps from the directory of the corresponding script.
- Open the project with annotations in QuPath
- Use `src/export_geojsons_and_rois.groovy` (instructions at the top of the file) to export cells & geojsons
- Use `src/sort_qupath_object_output.py` to sort the exported data into the directory structure required for training a classification model
- Use `src/convert_label.py` to convert geojson files to TIFFs
- Use `src/preprocess_dataset.py` to preprocess images and masks
- Use `src/split.py` to split the segmentation data into train and test sets and create the dataset structure expected by the segmentation models
- Use `src/split.py` to split the classification data into train and test sets and create the dataset structure expected by the classification models
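The split step partitions files into train and test sets. As a hedged sketch of the idea (illustrative only, not the actual logic of `src/split.py`; the ratio and seed are assumptions), a deterministic file-level split could look like:

```python
import random

def split_filenames(filenames, test_fraction=0.2, seed=42):
    """Deterministically partition file names into train and test lists."""
    names = sorted(filenames)     # sort first so the split is reproducible
    rng = random.Random(seed)     # fixed seed -> same shuffle every run
    rng.shuffle(names)
    n_test = max(1, int(len(names) * test_fraction))
    return names[n_test:], names[:n_test]  # (train, test)
```

Because the input is sorted before shuffling with a fixed seed, running the split twice on the same files yields the same partition.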
- Add the image-challenges-catalog to your album.solutions installation
- Note: StarDist expects the train/test subfolders to be named "images" and "masks" (not "labels")
- Use `src/segmentation_album.py` to run StarDist2D with our recommended settings. Alternatively, use the album-gui or CLI to run any other model provided in the catalog
- Use `src/classification/nn_pytorch.py` to train a classifier and store the model in the `models/` directory
- This step can perform a grid search over a few hyperparameters and model architectures; it returns the best-performing model while displaying the results in TensorBoard
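A grid search of this kind amounts to iterating the Cartesian product of candidate hyperparameter values. A minimal sketch (the grid below is illustrative; the script's actual search space may differ):

```python
from itertools import product

# Illustrative search space; the real script's grid may differ.
GRID = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [16, 32],
    "architecture": ["resnet18", "simple_cnn"],
}

def grid_configs(grid):
    """Yield one config dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```

Each yielded dict would correspond to one training run; the best run by validation metric is kept.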
- Use your segmentation model to predict segmentation masks for a whole directory of new data (album solution: stardist_predict)
- Use `src/segmentation_to_instance_classification.py` to extract the objects from the segmentation masks and store them in a new directory
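Conceptually, extracting one crop per object amounts to computing a bounding box for each instance id in the label mask. The sketch below is a simplified stand-in for illustration, not the actual `src/segmentation_to_instance_classification.py` (which operates on TIFF arrays):

```python
def bounding_boxes(mask):
    """Return {label: (min_row, min_col, max_row, max_col)} for each
    non-zero instance id in a 2D integer label mask (list of lists)."""
    boxes = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == 0:
                continue  # 0 is background
            if label not in boxes:
                boxes[label] = [r, c, r, c]
            else:
                b = boxes[label]
                b[0] = min(b[0], r); b[1] = min(b[1], c)
                b[2] = max(b[2], r); b[3] = max(b[3], c)
    return {k: tuple(v) for k, v in boxes.items()}

def crop(image, box):
    """Cut the bounding-box region out of a 2D image (list of lists)."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]
```

Each cropped region would then be saved as one object image for the classifier.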
- Use your trained classifier (`src/classification/nn_pytorch.py`) to predict the class of each object and obtain a CSV file with the predictions per object
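The prediction step yields a CSV with one row per object. A small sketch of consuming such a file, e.g. to count objects per class (the column names here are assumptions; the real output columns may differ):

```python
import csv
import io
from collections import Counter

def class_counts(csv_text):
    """Count predicted classes from a predictions CSV, assuming a
    header of the form 'object,predicted_class'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["predicted_class"] for row in reader)
```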
- data_sets: contains the preprocessed, model-specifically reordered data sets used for training and testing
  - ...: one subdirectory for each model
    - train: contains the train data set
      - img: contains the train images (as tiff files)
      - mask: contains the train labels (as tiff files)
    - test: contains the test data set
      - img: contains the test images (as tiff files)
      - mask: contains the test labels (as tiff files)
- preprocessed: contains the preprocessed data sets
  - anno_to_mask: contains the masks generated from the annotations (using the src/convert_label.py script), as tiff files
  - images: contains the preprocessed images (using the src/preprocess_dataset.py script), as tiff files
  - labels: contains the labels preprocessed from anno_to_mask (using the src/preprocess_dataset.py script), as tiff files
- raw_data: contains the raw data sets (unchanged)
  - annotations_json: contains the annotations, as geojson files
  - raw_images: contains the raw images, as jpg & tiff files
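The model-specific part of this tree can be created programmatically. A sketch using `pathlib` (the model name is a placeholder; the subfolder names follow the layout above):

```python
from pathlib import Path

# Subtree layout mirroring the data_sets/ structure described above.
SPLITS = ("train", "test")
SUBDIRS = ("img", "mask")

def make_dataset_skeleton(root, model_name):
    """Create data_sets/<model>/{train,test}/{img,mask} and return the leaves."""
    leaves = []
    for split in SPLITS:
        for sub in SUBDIRS:
            d = Path(root) / "data_sets" / model_name / split / sub
            d.mkdir(parents=True, exist_ok=True)
            leaves.append(d)
    return leaves
```

Note that for StarDist the image/label subfolders must instead be named "images" and "masks", as mentioned above.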
- convert_label.py: converts the annotations from geojson to tiff format
- preprocess_dataset.py: preprocesses the raw images and tiff labels into the preprocessed data set
We plan to convert this into an easy-to-use pipeline and deploy it as an 'album catalog', so that each individual step gets a GUI and the directory structure no longer relies on the working directory of each executed script. This should improve replicability and usability.
We adjusted and tested the pipeline on a very specific dataset of microscopy images of organoids with QuPath annotations from @tum [Technische Universität München: Friederike Ebner, Krzysztof Flisikowski, Qixia Chan, Theresa Pauli and Wei Liang (Corresponding Person)]. Thank you for providing this dataset! The preprocessing steps are therefore tailored to these images (4032x3040 px).
This project sprouted from the seminar "Deep Learning for biomedical applications" of Prof. Dr. rer. nat. Vitaly Belik @Freie Universität Berlin.