A deep learning approach in PyTorch to the question of whether you can judge a book by its title and cover.
The common idiom states that one can't judge a book by its cover. This project tries to do exactly that using deep learning techniques in PyTorch. The problem statement is therefore formulated as "Given the cover image or the title of a book, is it possible to classify the book into the correct genre?"
The learning goals of this project were:
- Implement an end-to-end framework for image classification from scratch using a custom dataset
- Learn how to fine-tune a pre-trained BERT classifier, and pick up common implementation techniques in NLP
For this project, I used the Uchida Book Dataset, which contains data on about 57,000 books taken from Amazon. The dataset includes the cover image, title, author, and subcategories of each book. Each book is assigned to one of 30 classes.
An important aspect of the dataset is that a book on Amazon can have multiple genres associated with it. However, when creating the dataset, the authors randomly chose one class out of the many classes that a book may belong to. Hence, it is prudent to also report Top 3 and Top 5 accuracy when comparing the results of the networks, because in reality a book may belong to classes other than its assigned class.
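To make the Top 3 and Top 5 metrics concrete, here is a minimal sketch of how a Top k accuracy can be computed from per-class scores. It uses plain Python lists to stay dependency-free; in a PyTorch training loop, `torch.topk` would be the more usual tool. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not code or results from this project.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy scores for 3 books over 4 classes (made-up numbers, not project results).
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.2, 0.3, 0.4, 0.1],
]
labels = [1, 2, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
```

A prediction counts as correct under Top k as soon as the assigned class appears anywhere in the k best guesses, which is why the metric is more forgiving for books that plausibly belong to several genres.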
This project is divided into two parts:
- Classification based on title using BERT
- Classification based on cover image using ResNet
For the second task, I used a subset of the dataset due to computational constraints. Specifically, I used only books from 10 classes to make predictions. Since the CSV file with the titles is considerably smaller in size, I used all 30 classes for the title prediction model.
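Restricting the data to a subset of classes usually also means remapping the kept labels to a contiguous range, since a 10-way classifier expects targets 0 to 9. A minimal sketch of that step follows; the helper `select_classes`, the sample tuples, and the chosen class ids are hypothetical, and which 10 classes the project actually kept is not stated here.

```python
def select_classes(samples, keep_classes):
    """Keep samples whose label is in keep_classes and remap labels to 0..N-1."""
    # Sort so the mapping from old to new label ids is deterministic.
    remap = {old: new for new, old in enumerate(sorted(keep_classes))}
    return [(item, remap[label]) for item, label in samples if label in remap]


# Hypothetical (filename, class id) pairs standing in for rows of the dataset.
samples = [("a.jpg", 3), ("b.jpg", 17), ("c.jpg", 7), ("d.jpg", 3)]
subset = select_classes(samples, keep_classes={3, 7})
```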
For the cover-image portion of the project, I used pre-trained networks and replaced the output layer of each so that it outputs only 10 classes. I followed a tutorial on the PyTorch website to load the custom dataset and implement the project. Each network was trained for 25 epochs with the Adam optimizer. The results are shown in the table below:
Model | Top 1 Accuracy % | Top 3 Accuracy % | Top 5 Accuracy % |
---|---|---|---|
ResNet50 | 41.424 | 71.614 | 87.1875 |
ResNet101 | | | |
VGG-19 | | | |
DenseNet121 | | | |
For title classification, I used the entire dataset, i.e. the model classifies among all 30 classes. I adapted a helpful tutorial on spam classification to the problem at hand. The results below show that the BERT base model reaches about 45% Top 1 accuracy after training for 50 epochs with the AdamW optimizer.
Model | Top 1 Accuracy % | Top 3 Accuracy % | Top 5 Accuracy % |
---|---|---|---|
BERT | 44.982 | 68.193 | 78.491 |
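
One way to set up a 30-class title classifier on top of BERT is sketched below using the Hugging Face `transformers` library. This is an assumption-laden sketch, not necessarily the approach of the tutorial mentioned above: the checkpoint name `bert-base-uncased`, the function name `build_title_classifier`, and the tiny random configuration (included only so the example runs without a download) are all illustrative choices.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


def build_title_classifier(num_labels=30, pretrained_name="bert-base-uncased", pretrained=True):
    """BERT encoder with a linear classification head over num_labels genres."""
    if pretrained:
        # Downloads the checkpoint; the head is newly initialised and must be fine-tuned.
        return BertForSequenceClassification.from_pretrained(pretrained_name, num_labels=num_labels)
    # Tiny randomly initialised model, useful only for checking shapes offline.
    tiny = BertConfig(hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
                      intermediate_size=64, num_labels=num_labels)
    return BertForSequenceClassification(tiny)


model = build_title_classifier(num_labels=30, pretrained=False)
optimizer = torch.optim.AdamW(model.parameters())  # default hyperparameters (assumption)

# Stand-in for a batch of 2 tokenised titles, 8 token ids each.
input_ids = torch.randint(0, model.config.vocab_size, (2, 8))
logits = model(input_ids=input_ids).logits  # one score per genre for each title
```

For real training, the titles would first be converted to token ids with the tokenizer that matches the chosen checkpoint.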
As future work, try to create a combined method that uses both the title and the cover image to make a single prediction of the book's genre.
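One simple way to combine the two models is late fusion: convert each model's scores to probabilities and take a weighted average. The sketch below illustrates that idea only; it is not something implemented in this project. It assumes both models are trained on the same set of classes (here the image model uses 10 and the title model 30, so they would need to be aligned first), and the function names, weight, and toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(title_logits, cover_logits, title_weight=0.5):
    """Weighted average of the two models' class probabilities."""
    p_title = softmax(title_logits)
    p_cover = softmax(cover_logits)
    return [title_weight * t + (1 - title_weight) * c for t, c in zip(p_title, p_cover)]


# Toy logits over 3 classes (made-up numbers, not model outputs).
title_logits = [2.0, 0.5, 0.1]
cover_logits = [0.2, 1.5, 0.3]

fused = fuse(title_logits, cover_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)  # index of the most likely class
```

The weight could be tuned on a validation set. A learned alternative would concatenate features from both networks and train a joint classifier on top.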