FearNet is an image classification model that identifies images likely to trigger phobias. The architecture leverages 8 pre-trained computer vision models whose outputs are combined into an input vector for an MLP. The input to the ensemble is any image, resized to 128x128x3. The MLP output is a 17-element tensor, with each element giving the probability that the corresponding phobia trigger is present in the input image.
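A minimal sketch of the fusion head described above, assuming each of the 8 backbones emits a fixed-length feature vector that is concatenated before the MLP. The per-backbone feature size (512) and hidden width (1024) are placeholder assumptions; only the 8 backbones and 17 outputs come from the description.

```python
import torch
import torch.nn as nn

NUM_BACKBONES = 8   # pre-trained CV models in the ensemble
FEAT_DIM = 512      # assumed per-backbone feature size (placeholder)
NUM_PHOBIAS = 17    # output classes, per the model description

class FearNetHead(nn.Module):
    """MLP that maps concatenated backbone features to phobia probabilities."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(NUM_BACKBONES * FEAT_DIM, 1024),
            nn.ReLU(),
            nn.Linear(1024, NUM_PHOBIAS),
            nn.Sigmoid(),  # independent presence probability per phobia
        )

    def forward(self, backbone_feats):
        # backbone_feats: list of 8 tensors, each (batch, FEAT_DIM)
        return self.mlp(torch.cat(backbone_feats, dim=1))

head = FearNetHead()
feats = [torch.randn(4, FEAT_DIM) for _ in range(NUM_BACKBONES)]
probs = head(feats)  # shape (4, 17), each entry in [0, 1]
```

Sigmoid (rather than softmax) is assumed here because the output is described as a per-phobia presence probability, i.e. a multi-label setup where several phobias can co-occur in one image.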
Recall (sensitivity) is used to gauge model performance, since false negatives are more costly than false positives in this classification problem.
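For reference, recall = TP / (TP + FN): the fraction of true phobia triggers the model actually catches. A toy single-class computation:

```python
def recall(y_true, y_pred):
    """Recall / sensitivity over binary labels (1 = phobia trigger present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy example: 4 positives in the ground truth, 3 caught by the model.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
print(recall(y_true, y_pred))  # 0.75
```

Note the false positive at index 4 does not affect recall at all, which is exactly why the metric suits a problem where missing a trigger is the costly error.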
Hyperparameters: 128x128 images, LR = 0.001, epochs = 20, batch size = 64, no batch norm (unless inherent to the model, e.g. VGG19_BN)
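The results below vary which layers train: the pre-trained backbone is frozen, its final fc layer is unfrozen, and optionally 1-2 external fc layers are appended. A sketch of the "1 unfrozen fc layer + 1 ext. fc layer" setup, using a toy stand-in trunk instead of a real pre-trained model (the torchvision loading call, dimensions, and optimizer choice are assumptions; only the freezing pattern, the 17 classes, and LR = 0.001 come from the notes):

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in for a pre-trained backbone with a final `fc` attribute
# (as in torchvision ResNets); dimensions are placeholders.
backbone = nn.Sequential()
backbone.features = nn.Linear(2048, 2048)  # pretend frozen trunk
backbone.fc = nn.Linear(2048, 512)         # the one unfrozen fc layer

# Freeze everything, then unfreeze only the final fc layer.
for p in backbone.parameters():
    p.requires_grad = False
for p in backbone.fc.parameters():
    p.requires_grad = True

# Append one external fc layer mapping to the 17 phobia classes.
model = nn.Sequential(backbone.fc, nn.ReLU(), nn.Linear(512, 17))

# Optimizer sees only trainable params; LR matches the hyperparameters above.
trainable = [p for p in model.parameters() if p.requires_grad]
opt = optim.Adam(trainable, lr=0.001)
```

With a real model the trunk freeze is identical; only the way the backbone and its `fc` layer are obtained changes (e.g. `torchvision.models.resnet152`).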
Baseline: 13.67%
3-DCNN Ensemble: 14.48%
Resnet152 (1 unfrozen fc layer): ~69%
Resnet152 (1 unfrozen fc layer + 1 ext. fc layer): ~75%
Resnet152 (1 unfrozen fc layer + 2 ext. fc layers): ~75%
VGG19_BN (1 unfrozen fc layer): ~73%
VGG19_BN (1 unfrozen fc layer + 1 ext. fc layer): ~73-74%
Densenet161 (1 unfrozen fc layer): 75%
Densenet161 (1 unfrozen fc layer + 1 ext. fc layer): ~75-76%
Densenet161 (1 unfrozen fc layer): ~79%
Densenet161 (1 unfrozen fc layer + 1 ext. fc layer): ~80-81%
Densenet161 (1 unfrozen fc layer + 2 ext. fc layers): ~80%
Resnext101 (1 unfrozen fc layer): ~78%
Resnext101 (1 unfrozen fc layer + 1 ext. fc layer): ~78-79%
Resnext101 (1 unfrozen fc layer + 2 ext. fc layers): ~78-79%
Wres101 (1 unfrozen fc layer): ~75%
Wres101 (1 unfrozen fc layer + 1 ext. fc layer): ~76%
Alexnet (1 unfrozen fc layer): ~70%
Googlenet (1 unfrozen fc layer): ~72%
Shufflenet (1 unfrozen fc layer): ~73%
Densenet161 (1 unfrozen fc layer + 1 ext. fc layer): ~80.5%
Resnext101 (1 unfrozen fc layer + 1 ext. fc layer): ~80-81%
Resnext101 (1 unfrozen fc layer + 2 ext. fc layers): ~80-81%
Wres101 (1 unfrozen fc layer + 1 ext. fc layer): ~77%
Alexnet (1 unfrozen fc layer + 1 ext. fc layer): ~73-74%
Googlenet (1 unfrozen fc layer + 1 ext. fc layer): ~73%
Shufflenet (1 unfrozen fc layer + 1 ext. fc layer): ~75% (was ~77% up to 50 epochs)
8-Transfer-Learned-Model Ensemble w/ Averaged Output: ~86-87%
8-TL Ensemble w/ 1 layer MLP: ~84%
8-TL Ensemble w/ 2 layer MLP: ~86%
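The best result comes from simply averaging the 8 fine-tuned models' per-class probability outputs rather than learning an MLP fusion. The mechanics, with made-up numbers (two toy "models" over 3 classes stand in for 8 models over 17):

```python
def average_ensemble(model_outputs):
    """Element-wise mean of per-model probability vectors (all same length)."""
    n = len(model_outputs)
    return [sum(out[i] for out in model_outputs) / n
            for i in range(len(model_outputs[0]))]

outs = [[0.9, 0.2, 0.4],   # model A's per-class probabilities
        [0.7, 0.4, 0.6]]   # model B's per-class probabilities
print(average_ensemble(outs))  # ~[0.8, 0.3, 0.5]
```

That the unweighted average beat both learned MLP fusions (~86-87% vs ~84-86%) is a common ensembling outcome: the fusion MLP adds parameters that can overfit, while averaging only reduces variance across the 8 models.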