-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move plugin repos into this main repo #325
Merged
Merged
Changes from 7 commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
a2ae189
move plugins into this repo
CarlinLiao 50f055f
move dockerfiles and add version files
CarlinLiao d90d6e7
add build and push recipes
CarlinLiao 92a5de9
add recipe to populate submodules
CarlinLiao 3f85f44
hotfix workflow plugin image naming
CarlinLiao 0cd0580
Merge branch 'main' into issue323
CarlinLiao c1fb36f
fix another typo
CarlinLiao 370370b
Make some changes to plugin development process documentation
jimmymathews 4f72922
Syntax
jimmymathews 8c14c4e
Expand names of plugins
jimmymathews 5d963d0
Attempt relative link
jimmymathews 430083c
clarify template after move
CarlinLiao 52779d2
hotfix graph-transformer build
CarlinLiao 1f4831d
hotfix plugin push recipe
CarlinLiao 6e28e9b
Merge branch 'main' into issue323
jimmymathews File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[submodule "plugin/graph_processing/graph-transformer/tmi2022"] | ||
path = plugin/graph_processing/graph-transformer/tmi2022 | ||
url = [email protected]:CarlinLiao/tmi2022.git |
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
FROM pytorch/pytorch:2.1.2-cuda11.8-cudnn8-runtime | ||
WORKDIR /app | ||
|
||
# Install apt packages you need here, and then clean up afterward | ||
RUN apt-get update | ||
RUN apt-get install -y \ | ||
libhdf5-serial-dev \ | ||
libatlas-base-dev \ | ||
libblas-dev \ | ||
liblapack-dev \ | ||
gfortran \ | ||
libpq-dev | ||
RUN rm -rf /var/lib/apt/lists/* | ||
|
||
# Install python packages you need here | ||
ENV PIP_NO_CACHE_DIR=1 | ||
RUN pip install h5py==3.10.0 | ||
RUN pip install numpy==1.24.3 | ||
RUN pip install scipy==1.10.1 | ||
RUN pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html | ||
RUN pip install dglgo -f https://data.dgl.ai/wheels-test/repo.html | ||
ENV DGLBACKEND=pytorch | ||
RUN pip install cg-gnn==0.3.2 | ||
|
||
# Make the files you need in this directory available everywhere in the container | ||
ADD . /app | ||
RUN chmod +x train.py | ||
RUN mv train.py /usr/local/bin/spt-plugin-train-on-graphs | ||
RUN chmod +x /app/print_graph_config.sh | ||
RUN mv /app/print_graph_config.sh /usr/local/bin/spt-plugin-print-graph-request-configuration | ||
RUN chmod +x /app/print_training_config.sh | ||
RUN mv /app/print_training_config.sh /usr/local/bin/spt-plugin-print-training-configuration |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Use cuda.Dockerfile if you have a CUDA-enabled GPU | ||
FROM python:3.11-slim-buster | ||
WORKDIR /app | ||
|
||
# Install apt packages you need here, and then clean up afterward | ||
RUN apt-get update | ||
RUN apt-get install -y \ | ||
libhdf5-serial-dev \ | ||
libatlas-base-dev \ | ||
libblas-dev \ | ||
liblapack-dev \ | ||
gfortran \ | ||
libpq-dev | ||
RUN rm -rf /var/lib/apt/lists/* | ||
|
||
# Install python packages you need here | ||
ENV PIP_NO_CACHE_DIR=1 | ||
RUN pip install h5py==3.10.0 | ||
RUN pip install numpy==1.24.3 | ||
RUN pip install scipy==1.10.1 | ||
RUN pip install torch --index-url https://download.pytorch.org/whl/cpu | ||
RUN pip install dgl -f https://data.dgl.ai/wheels/repo.html | ||
RUN pip install dglgo -f https://data.dgl.ai/wheels-test/repo.html | ||
ENV DGLBACKEND=pytorch | ||
RUN pip install cg-gnn==0.3.2 | ||
|
||
# Make the files you need in this directory available everywhere in the container | ||
ADD . /app | ||
RUN chmod +x train.py | ||
RUN mv train.py /usr/local/bin/spt-plugin-train-on-graphs | ||
RUN chmod +x /app/print_graph_config.sh | ||
RUN mv /app/print_graph_config.sh /usr/local/bin/spt-plugin-print-graph-request-configuration | ||
RUN chmod +x /app/print_training_config.sh | ||
RUN mv /app/print_training_config.sh /usr/local/bin/spt-plugin-print-training-configuration |
27 changes: 27 additions & 0 deletions
27
build/plugins/graph_processing/graph-transformer.dockerfile
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
FROM pytorch/pytorch:2.1.2-cuda11.8-cudnn8-runtime | ||
WORKDIR /app | ||
|
||
# Install apt packages you need here, and then clean up afterward | ||
RUN apt-get update | ||
RUN rm -rf /var/lib/apt/lists/* | ||
|
||
# Install python packages you need here | ||
ENV PIP_NO_CACHE_DIR=1 | ||
RUN pip install h5py==3.10.0 | ||
RUN pip install numpy==1.24.3 | ||
RUN pip install scipy==1.10.1 | ||
RUN pip install pandas | ||
RUN pip install pillow | ||
RUN pip install tensorboardX | ||
RUN pip install opencv-python | ||
RUN pip install einops | ||
RUN pip install torch-geometric | ||
|
||
# Make the files you need in this directory available everywhere in the container | ||
ADD . /app | ||
RUN chmod +x train.py | ||
RUN mv train.py /usr/local/bin/spt-plugin-train-on-graphs | ||
RUN chmod +x /app/print_graph_config.sh | ||
RUN mv /app/print_graph_config.sh /usr/local/bin/spt-plugin-print-graph-request-configuration | ||
RUN chmod +x /app/print_training_config.sh | ||
RUN mv /app/print_training_config.sh /usr/local/bin/spt-plugin-print-training-configuration |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Plugins | ||
|
||
This directory contains various plugins and plugin archetypes for the SPT platform. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Graph processing | ||
|
||
These plugins create and process cell graphs to make downstream predictions. | ||
|
||
## Development process | ||
|
||
Addition of new graph processing plugins to SPT is done in three steps: | ||
1. **Exploratory stage**: a user evaluates their processing model by manually using the output and upload functions described in the following section and examining their results locally or in an SPT instance after upload. | ||
2. **Proposal stage**: If a user determines that their graph model is a good candidate for inclusion into SPT as a default graph deep learning plugin, that could be run on every new study imported into SPT, they would containerize their model according to the Docker template defined in `template/`, upload it to a fork of this repository, and open an issue or pull request for SPT to include their plugin. | ||
3. **Inclusion stage**: The SPT maintainers would review the proposed plugin and, if it is accepted, upload the container to the SPT Docker page and modify SPT code to pull down and use that Docker image by default in SPT. | ||
|
||
## Graph processing plugins should look like this | ||
|
||
Graph processing plugins are to be made available as Docker images, built from a Dockerfile following the template provided in `Dockerfile`. | ||
|
||
Each plugin should have the following commands available from anywhere in the Docker image: | ||
* `spt-plugin-print-graph-request-configuration`, which prints to `stdout` the configuration file intended to be used by this plugin to fetch graphs from an SPT instance to use for model training. An empty configuration file and a shell script to do this is provided in this repo, as well as the command needed to make this available in the template `Dockerfile`. | ||
* `spt-plugin-train-on-graphs` trains the model and outputs a CSV of importance scores that can be read by `spt graphs upload-importances`. A template `train.py` is provided that uses a command line interface specified in `train_cli.py`. The template `Dockerfile` provides a command to make this script available anywhere in the Docker image. Its arguments are | ||
1. `--input_directory`, the path to the directory containing the graphs to train on. | ||
2. `--config_file`, the path to the configuration file. This should be optional, and if not provided `spt-plugin-train-on-graphs` should use reasonable defaults. | ||
3. `--output_directory`, the path to the directory in which to save the trained model, importance scores, and any other artifacts deemed important enough to save, like performance reports. | ||
* `spt-plugin-print-training-configuration`, which prints to `stdout` an example configuration file for running `spt-plugin-train-on-graphs`, populated either with example values or the reasonable defaults used by the command. An empty configuration file and a shell script to do this is provided in this repo, as well as the command needed to make this available in the template `Dockerfile`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# spt-cg-gnn | ||
|
||
This builds the cg-gnn SPT plugin as a Docker image. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
[graph-generation] | ||
validation_data_percent = 0 | ||
test_data_percent = 15 | ||
; use_channels = true | ||
; use_phenotypes = true | ||
; cells_per_roi_target = 5000 | ||
target_name = <name of your target> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
#!/bin/sh | ||
cat /app/graph.config |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
#!/bin/sh | ||
cat /app/training.config |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
#!/usr/bin/env python3 | ||
"""Convert SPT graph objects to CG-GNN graph objects and run training and evaluation with them.""" | ||
|
||
from sys import path | ||
from configparser import ConfigParser | ||
from os import remove | ||
from os.path import join, exists | ||
from configparser import ConfigParser | ||
from warnings import warn | ||
|
||
from numpy import nonzero # type: ignore | ||
from networkx import to_scipy_sparse_array # type: ignore | ||
from torch import ( | ||
FloatTensor, | ||
IntTensor, # type: ignore | ||
) | ||
from dgl import DGLGraph, graph | ||
from cggnn.util import GraphData, save_cell_graphs, load_cell_graphs | ||
from cggnn.util.constants import INDICES, CENTROIDS, FEATURES, IMPORTANCES | ||
from cggnn.run import train_and_evaluate | ||
|
||
path.append('/app') # noqa | ||
from train_cli import parse_arguments, DEFAULT_CONFIG_FILE | ||
from util import HSGraph, GraphData as SPTGraphData, load_hs_graphs, save_hs_graphs | ||
|
||
|
||
def _convert_spt_graph(g_spt: HSGraph) -> DGLGraph: | ||
"""Convert a SPT HSGraph to a CG-GNN cell graph.""" | ||
num_nodes = g_spt.node_features.shape[0] | ||
g_dgl = graph([]) | ||
g_dgl.add_nodes(num_nodes) | ||
g_dgl.ndata[INDICES] = IntTensor(g_spt.histological_structure_ids) | ||
g_dgl.ndata[CENTROIDS] = FloatTensor(g_spt.centroids) | ||
g_dgl.ndata[FEATURES] = FloatTensor(g_spt.node_features) | ||
# Note: channels and phenotypes are binary variables, but DGL only supports FloatTensors | ||
edge_list = nonzero(g_spt.adj.toarray()) | ||
g_dgl.add_edges(list(edge_list[0]), list(edge_list[1])) | ||
return g_dgl | ||
|
||
|
||
def _convert_spt_graph_data(g_spt: SPTGraphData) -> GraphData: | ||
"""Convert a SPT GraphData object to a CG-GNN/DGL GraphData object.""" | ||
return GraphData( | ||
graph=_convert_spt_graph(g_spt.graph), | ||
label=g_spt.label, | ||
name=g_spt.name, | ||
specimen=g_spt.specimen, | ||
set=g_spt.set, | ||
) | ||
|
||
|
||
def _convert_spt_graphs_data(graphs_data: list[SPTGraphData]) -> list[GraphData]: | ||
"""Convert a list of SPT HSGraphs to CG-GNN cell graphs.""" | ||
return [_convert_spt_graph_data(g_spt) for g_spt in graphs_data] | ||
|
||
|
||
def _convert_dgl_graph(g_dgl: DGLGraph) -> HSGraph: | ||
"""Convert a DGLGraph to a CG-GNN cell graph.""" | ||
return HSGraph( | ||
adj=to_scipy_sparse_array(g_dgl.to_networkx()), | ||
node_features=g_dgl.ndata[FEATURES].detach().cpu().numpy(), | ||
centroids=g_dgl.ndata[CENTROIDS].detach().cpu().numpy(), | ||
histological_structure_ids=g_dgl.ndata[INDICES].detach().cpu().numpy(), | ||
importances=g_dgl.ndata[IMPORTANCES].detach().cpu().numpy() if (IMPORTANCES in g_dgl.ndata) | ||
else None, | ||
) | ||
|
||
|
||
def _convert_dgl_graph_data(g_dgl: GraphData) -> SPTGraphData: | ||
return SPTGraphData( | ||
graph=_convert_dgl_graph(g_dgl.graph), | ||
label=g_dgl.label, | ||
name=g_dgl.name, | ||
specimen=g_dgl.specimen, | ||
set=g_dgl.set, | ||
) | ||
|
||
|
||
def _convert_dgl_graphs_data(graphs_data: list[GraphData]) -> list[SPTGraphData]: | ||
"""Convert a list of DGLGraphs to CG-GNN cell graphs.""" | ||
return [_convert_dgl_graph_data(g_dgl) for g_dgl in graphs_data] | ||
|
||
|
||
def _handle_random_seed_values(random_seed_value: str | None) -> int | None: | ||
if (random_seed_value is not None) and (str(random_seed_value).strip().lower() != "none"): | ||
return int(random_seed_value) | ||
return None | ||
|
||
|
||
if __name__ == '__main__': | ||
args = parse_arguments() | ||
config_file = ConfigParser() | ||
config_file.read(args.config_file) | ||
random_seed: int | None = None | ||
if 'general' in config_file: | ||
random_seed = _handle_random_seed_values(config_file['general'].get('random_seed', None)) | ||
if 'cg-gnn' not in config_file: | ||
warn('No cg-gnn section in config file. Using default values.') | ||
config_file.read(DEFAULT_CONFIG_FILE) | ||
config = config_file['cg-gnn'] | ||
|
||
in_ram: bool = config.getboolean('in_ram', True) | ||
batch_size: int = config.getint('batch_size', 32) | ||
epochs: int = config.getint('epochs', 10) | ||
learning_rate: float = config.getfloat('learning_rate', 1e-3) | ||
k_folds: int = config.getint('k_folds', 5) | ||
explainer: str = config.get('explainer', 'pp') | ||
merge_rois: bool = config.getboolean('merge_rois', True) | ||
if random_seed is None: | ||
random_seed = _handle_random_seed_values(config.get('random_seed', None)) | ||
|
||
spt_graphs, _ = load_hs_graphs(args.input_directory) | ||
save_cell_graphs(_convert_spt_graphs_data(spt_graphs), args.output_directory) | ||
|
||
model, graphs_data, hs_id_to_importances = train_and_evaluate(args.output_directory, | ||
in_ram, | ||
batch_size, | ||
epochs, | ||
learning_rate, | ||
k_folds, | ||
explainer, | ||
merge_rois, | ||
random_seed) | ||
|
||
save_hs_graphs(_convert_dgl_graphs_data(load_cell_graphs(args.output_directory)[0]), | ||
args.output_directory) | ||
for filename in ('graphs.bin', 'graph_info.pkl'): | ||
graphs_file = join(args.output_directory, filename) | ||
if exists(graphs_file): | ||
remove(graphs_file) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
#!/usr/bin/env python3 | ||
"""Process arguments to training command.""" | ||
|
||
from argparse import ArgumentParser | ||
|
||
DEFAULT_CONFIG_FILE = 'training.config' | ||
|
||
|
||
def parse_arguments(): | ||
"""Parse arguments.""" | ||
arg_parser = ArgumentParser() | ||
arg_parser.add_argument( | ||
'--input_directory', | ||
type=str, | ||
help='Path to the directory containing the cell graphs to be used for training.', | ||
) | ||
arg_parser.add_argument( | ||
'--config_file', | ||
type=str, | ||
help='Path to config file.', | ||
default=DEFAULT_CONFIG_FILE, | ||
) | ||
arg_parser.add_argument( | ||
'--output_directory', | ||
type=str, | ||
help='Path to the directory containing the cell graphs to be used for training.', | ||
) | ||
return arg_parser.parse_args() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
[cg-gnn] | ||
; in_ram = true | ||
batch_size = 1 | ||
epochs = 5 | ||
learning_rate = 1e-3 | ||
k_folds = 0 | ||
; explainer_model = pp | ||
merge_rois = true |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the Dockerfile uses an ADD instruction to place
graph.config
into the Docker image of the built plugin, e.g. thecg-gnn
plugin. But this file is only a template (emptytarget_name
). How are the contents ofgraph.config
to be specified at runtime for different datasets using the same Docker image?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the in-built
graph.config
is not used, and a separategraph.config
is meant to be supplied at runtime, then there doesn't seem to be much use for this command:spt-plugin-print-graph-request-configuration
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to update it when it says "this repo" to "the template" since this was originally the README for this template only. The
graph.config
file is not used within the Docker images, it's a (suggested default) setting file for the user to use with SPT when extracting the graph files to run the image(s) on.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes that is what I thought.
But then why is
spt-plugin-print-graph-request-configuration
used?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so you can get a
graph.config
file to use. It prints to stdout and then you copy it into your local directory and runspt graph extract
with it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the file it emits is never directly useable, right? If the idea is to manually edit the emitted template just before doing some run, it seems simpler to use the source code version of
graph.config
directly from this repository. The main runtime usage of the Docker container, to do its calculation, doesn't use thespt-plugin-print-graph-request-configuration
command, if I understand correctly.I think a similar remark applies to
spt-plugin-print-training-configuration
. What is the rationale for incorporating some of these development-specific actions into the docker image? Is it to avoid the need to access git source code in addition to a Docker image pull? If this is the case, we can perhaps make these 2:spt-plugin-print-graph-request-configuration
spt-plugin-print-training-configuration
into commands provided by the template image only, and not indicated as "required" by the plugins themselves (since they are not required for the main action, training/processing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put them in at your request a while back. I believe the rationale was to provide a way to get the recommended default settings without having to muck about with the Docker internals / source code.
The training variables
cg-gnn
wants are different from the training variablesgraph-transformer
wants, and likewise the graph formats plugins want may be different from plugin to plugin (although in practice they're identical so far), so making it a template command would not work.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yeah this seems like the best that can be done now.