We provide Jupyter notebooks for performance visualization via GradCAM images (for checkpoint models), superclass performance, model cascades, and the oracle upper bound. These notebooks require predictions, ground truths, and softmax probabilities, so first run pytorch_inference.py
with the saving flags shown below.
cd ../inference/
python pytorch_inference.py --path <final_weight.pt> --dataset <V2/A/Sketch/R/V1> \
[--tta] [--mrl] [--efficient] [--rep_size <dim>] [--old_ckpt] \
--save_softmax --save_gt --save_predictions
Only adaptive classification via model cascades uses the --tta flag.
This notebook visualizes model attribution for each image. As required preprocessing, we store each image as a torch tensor, arranged class-wise. The notebook illustrates that using smaller representation sizes for classification can cause the model to confuse classes within the same superclass (e.g., Rock Python vs. Boa Constrictor, as in Figure 9.b below).
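As a rough illustration of the attribution idea (not the notebook's actual code), the sketch below computes a Grad-CAM heatmap for the special case of a network ending in global average pooling followed by a linear head, where the gradients of the class score with respect to the last conv activations have a closed form. The function name and shapes are assumptions for this sketch.

```python
import numpy as np

def grad_cam_gap_linear(feature_maps, class_weights):
    """Grad-CAM sketch for a model ending in GAP + linear head.

    feature_maps: (K, H, W) activations A^k of the last conv layer.
    class_weights: (K,) linear-head weights w_k for the target class.

    For this head, score = sum_k w_k * mean(A^k), so
    d(score)/dA^k_ij = w_k / (H*W); the Grad-CAM channel weights
    alpha_k (global-average-pooled gradients) are w_k / (H*W).
    """
    K, H, W = feature_maps.shape
    alphas = class_weights / (H * W)            # alpha_k = GAP of gradients
    cam = np.einsum("k,khw->hw", alphas, feature_maps)
    cam = np.maximum(cam, 0.0)                  # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                   # normalize to [0, 1] for overlay
    return cam
```

In the general case the gradients come from backpropagation (e.g., via torch hooks), but the weighting-and-ReLU step is the same.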
This notebook evaluates our greedy model-cascading scheme, based on maximum-probability thresholding.
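The greedy scheme can be sketched as follows: for each image, accept the prediction of the cheapest (smallest-representation) model whose maximum softmax probability clears a confidence threshold, and fall back to the largest model otherwise. This is an illustrative sketch, not the notebook's code; the function name and the threshold value are assumptions.

```python
import numpy as np

def greedy_cascade(probs_by_dim, threshold=0.9):
    """Greedy model cascade via max-probability thresholding.

    probs_by_dim: list of (N, C) softmax arrays, ordered from the
    smallest to the largest representation size.
    Returns the cascaded predictions and, for each image, the index
    of the representation size that produced them.
    """
    n = probs_by_dim[0].shape[0]
    preds = np.empty(n, dtype=int)
    used_dim = np.full(n, len(probs_by_dim) - 1)
    undecided = np.ones(n, dtype=bool)
    for d, probs in enumerate(probs_by_dim):
        # accept images whose top softmax probability clears the threshold
        confident = undecided & (probs.max(axis=1) >= threshold)
        preds[confident] = probs[confident].argmax(axis=1)
        used_dim[confident] = d
        undecided &= ~confident
    # fall back to the largest representation for the rest
    preds[undecided] = probs_by_dim[-1][undecided].argmax(axis=1)
    return preds, used_dim
```

The threshold trades accuracy against the average compute spent per image: a higher threshold routes more images to larger representations.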
Based on the WordNet hierarchy, we evaluate the MRL model on 30 randomly chosen superclasses. The code builds on the MadryLab robustness package.
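The superclass metric amounts to mapping both predictions and labels up to their superclasses before comparing. A minimal sketch, assuming a precomputed class-to-superclass mapping (the function name and mapping are illustrative, not from the repo):

```python
import numpy as np

def superclass_accuracy(preds, labels, class_to_super):
    """Accuracy at the superclass level: a prediction counts as
    correct if it lands in the same superclass as the ground truth.

    class_to_super: dict mapping each fine class id to a superclass
    id (e.g., derived from the WordNet hierarchy).
    """
    to_super = np.vectorize(class_to_super.get)
    return float((to_super(preds) == to_super(labels)).mean())
```

This separates fine-grained confusions within a superclass (which still score as superclass-correct) from confusions across superclasses.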
While overall accuracy increases with a gradual increase in capacity, we observe that certain instances and classes are classified more accurately at lower dimensions. Therefore, for each input image there is an ideal representation size that yields the maximum achievable accuracy for MRL (the oracle performance), which we compute in this notebook.
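The oracle upper bound described above can be computed directly from the saved predictions: an image counts as correct if any representation size classifies it correctly. A minimal sketch (function name and shapes are assumptions):

```python
import numpy as np

def oracle_accuracy(preds_by_dim, labels):
    """Oracle upper bound over representation sizes.

    preds_by_dim: (D, N) predicted labels across D rep. sizes.
    labels: (N,) ground-truth labels.
    An image is oracle-correct if ANY rep. size gets it right.
    """
    correct_any = (preds_by_dim == labels[None, :]).any(axis=0)
    return float(correct_any.mean())
```

By construction this is at least the accuracy of the best single representation size, since the oracle picks the right size per image.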