PANDA-Train-Track, part of the PANDA machine learning for tracking project, is a heavily modified version of the train-track repository. It executes the different stages of the machine learning pipeline from the command line, using YAML files for configuration.
Installation should be done via one of the conda environment files in the stttrkx/envs directory. If you want to install an editable stand-alone version, execute
git clone https://github.com/n-idw/panda-train-track.git
to download the repository and then
pip install -e panda-train-track
to install PANDA-Train-Track using the pip package installer.
The aim of TrainTrack is simple: Given any set of self-contained PyTorch Lightning modules, run them in a serial and trackable way.
At its heart, TrainTrack is nothing more than a loop over the stages defined in a YAML configuration file. A template for a YAML file containing the configuration for the pipeline and its stages can be found in stttrkx/configs/pipeline_example.yaml. The model configuration is also done using YAML files, located in the /configs folder of every PyTorch Lightning module. Each of these folders should contain an example configuration file, e.g., stttrkx/LightningModules/Processing/configs/processing_example.yaml for the processing stage.
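The loop over stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual traintrack implementation: the configuration is given as an already-parsed dictionary (roughly what a YAML loader would return), the keys mirror the stage format used in the pipeline configuration, and run_pipeline, describe_stage, and the stage names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a parsed pipeline YAML file.
# The stage directories, class names, and config names are made up.
pipeline_config = {
    "model_library": "/path/to/pyTorchLightingModules",
    "stages": [
        {"set": "stageDirA", "name": "ClassA", "config": "configA.yaml"},
        {"set": "stageDirB", "name": "ClassB", "config": "configB.yaml"},
    ],
}


def run_pipeline(config: Dict, run_stage: Callable[[str, Dict], None]) -> List[str]:
    """Run every configured stage in order and return the names of the stages run."""
    executed = []
    for stage in config["stages"]:
        # Stages run serially, in the order they appear in the configuration.
        run_stage(config["model_library"], stage)
        executed.append(stage["name"])
    return executed


def describe_stage(model_library: str, stage: Dict) -> None:
    """Placeholder stage runner: only prints where the stage config would be found."""
    print(f"{stage['name']}: {model_library}/{stage['set']}/configs/{stage['config']}")


executed = run_pipeline(pipeline_config, describe_stage)
```

Passing the stage runner in as a function keeps the loop itself trivial. A real runner would load the stage class and its configuration instead of printing.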
To launch traintrack and see all implemented options, run:
traintrack -h
The simplest way to run a pipeline would be:
traintrack path/to/your/pipeline_config.yaml
traintrack assumes a certain directory and code structure when configuring the different stages. If a stage is configured in the YAML file as follows:
model_library : /path/to/pyTorchLightingModules
stages:
- {
    set : stageDir,
    name : className,
    config : modelConfig.yaml
  }
traintrack assumes that the directory structure is the following:
📂 /path/to/pyTorchLightingModules/
├── 📂 stageDir/
│ ├── 📂 configs/
│ │ ├── 📜 modelConfig.yaml
│ │ └── ...
│ ├── 📂 Models/
│ │ ├── 📜 modelFile1.py
│ │ ├── 📜 modelFile2.py
│ │ ├── 📜 modelFile3.py
│ │ └── ...
└──...
It also assumes that one of the model files in the Models directory contains a class with the name className. Furthermore, it is assumed that the class has either a prepare_data() function for processing data or a training_step() function for training and inference.
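The conventions above can be illustrated with a short Python sketch. It is a hedged example, not the code traintrack actually uses: resolve_stage_paths, find_stage_class, and stage_kind are hypothetical helper names, and the lookup strategy (importing every .py file in Models and taking the first matching class) is an assumption made for illustration.

```python
import importlib.util
import inspect
from pathlib import Path
from typing import Dict, Optional, Tuple


def resolve_stage_paths(model_library: str, stage: Dict) -> Tuple[Path, Path]:
    """Return (config file, Models directory) for a stage entry."""
    stage_dir = Path(model_library) / stage["set"]
    return stage_dir / "configs" / stage["config"], stage_dir / "Models"


def find_stage_class(models_dir: Path, class_name: str) -> Optional[type]:
    """Import each .py file in models_dir; return the first class named class_name."""
    for model_file in sorted(models_dir.glob("*.py")):
        spec = importlib.util.spec_from_file_location(model_file.stem, str(model_file))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        candidate = getattr(module, class_name, None)
        if inspect.isclass(candidate):
            return candidate
    return None


def stage_kind(stage_class: type) -> str:
    """Classify a stage by the method it provides (plain attribute check)."""
    # Assumption: a class that inherits both methods from a base class
    # would need a stricter test than this one.
    if callable(getattr(stage_class, "training_step", None)):
        return "training/inference"
    if callable(getattr(stage_class, "prepare_data", None)):
        return "processing"
    raise TypeError(f"{stage_class.__name__} has neither training_step nor prepare_data")
```

For the stage entry shown earlier, resolve_stage_paths would point at stageDir/configs/modelConfig.yaml and stageDir/Models inside the model library, and find_stage_class would then search the model files for className.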