This repo contains the code for "Model Agnostic Interpretability for Multiple Instance Learning".
Paper: https://arxiv.org/abs/2201.11701
Executable scripts can be found in the scripts directory.
Source code can be found in the src directory.
One copy of each trained model can be found in models.
Outputs from experiments can be found in out.
Results can be found in results.
We use five custom data set implementations: mnist_bags.py, crc_dataset.py, sival_dataset, musk_dataset and tef_dataset; all inherit from mil_dataset.py.
Rather than returning a single instance, they return a bag of instances and a single label.
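As a rough illustration of the bag-level interface described above, here is a minimal sketch in plain Python. The class name `ToyBagDataset`, the random data, and the labelling rule are all hypothetical and are not taken from mil_dataset.py; the only point is that indexing returns a whole bag of instances together with one label.

```python
import random
from typing import List, Tuple

Instance = List[float]


class ToyBagDataset:
    """Hypothetical bag-level dataset; not the repo's mil_dataset.py."""

    def __init__(self, n_bags: int, n_features: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.bags: List[List[Instance]] = []
        self.labels: List[int] = []
        for _ in range(n_bags):
            # Bags may contain different numbers of instances.
            bag_size = rng.randint(2, 6)
            bag = [[rng.random() for _ in range(n_features)] for _ in range(bag_size)]
            self.bags.append(bag)
            # Illustrative labelling rule (an assumption): a bag is positive
            # if any instance has a first feature above 0.8.
            self.labels.append(int(any(inst[0] > 0.8 for inst in bag)))

    def __len__(self) -> int:
        return len(self.bags)

    def __getitem__(self, index: int) -> Tuple[List[Instance], int]:
        # One item is a whole bag plus its single bag-level label.
        return self.bags[index], self.labels[index]


dataset = ToyBagDataset(n_bags=5)
bag, label = dataset[0]
print(len(bag), "instances, label", label)
```

Because bags can differ in size, a sketch like this would typically be consumed one bag at a time rather than stacked into fixed-size batches.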
Sources:
- SIVAL: http://pages.cs.wisc.edu/~bsettles/data/
- MNIST: https://pytorch.org/vision/stable/datasets.html#mnist
- CRC: https://warwick.ac.uk/fac/cross_fac/tia/data/crchistolabelednucleihe/
- Musk: https://archive.ics.uci.edu/ml/datasets/Musk+%28Version+2%29
- Tiger, Elephant and Fox: http://www.cs.columbia.edu/~andrews/mil/datasets.html
The models are implemented in src/model.
We provide trained versions of these models in the models directory.
The training scripts are in scripts/train.
These can be used to train a single model or multiple models.
The models were tuned using the scripts in scripts/tune.
The interpretability functionality can be found in the src/interpretability directory.
The methods are implemented in interpretability/instance_attribution.
Our experiment scripts can be found in scripts/experiments.
These produce the sample size figures found in the paper.
The output scripts can be found in scripts/out.
These produce the interpretability outputs found in the paper.
The milli_weights_plot file produces the plots for the MILLI curve and integral.
All paths are relative to the root of the repo, so scripts must be executed from this location.
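Since relative paths only resolve when the working directory is the repo root, a small guard like the following can catch a wrong starting location early. This is a hedged sketch, not part of the repo: the helper `looks_like_repo_root` is hypothetical, and the directory names it checks are simply the ones listed in this README.

```python
from pathlib import Path

# Directory names taken from this README; the check itself is an assumption.
EXPECTED_DIRS = ("scripts", "src", "models")


def looks_like_repo_root(path: Path) -> bool:
    """Return True if every expected top-level directory exists under path."""
    return all((path / name).is_dir() for name in EXPECTED_DIRS)


if not looks_like_repo_root(Path.cwd()):
    print("Warning: run scripts from the root of the repo so relative paths resolve.")
```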
Required libraries can be found in requirements.txt.