[PR] Adding benchmarks between models #31

LuchoTurtle · 2023-12-12T02:21:17Z

closes #12

This PR adds a benchmark comparing some of the multimodal models available in Bumblebee that support image captioning.

I'm going to run a performance test with the COCO dataset (perhaps the most famous open-source labelled set) to evaluate the performance of each model in Elixir.

Although the benchmark itself will run in Elixir, the results will be exported to a file and then processed with Python, because Python has better library support for NLP metric evaluation (R was also considered, but Python is more beginner-friendly for anyone who's curious about this repo).

To measure model performance, I'll try to get scores on different metrics: BLEU, CIDEr, METEOR, SPICE and ROUGE-L. Not all of these will be measured (BLEU and ROUGE are probably the most relevant), but it's interesting to mention the others as alternative routes.
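To give a feel for what two of these metrics compute, here is a minimal pure-Python sketch. It is not the code from the notebook in this PR. The function names are made up for illustration, captions are tokenised naively on whitespace, ROUGE-L is reported as a plain F1 (the original definition uses a weighted F-measure), and the BLEU part only shows clipped unigram precision, without the brevity penalty or higher-order n-grams.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as an unweighted F1 over whitespace tokens (assumption: beta = 1)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def clipped_unigram_precision(candidate, references):
    """Modified unigram precision, the building block of BLEU-1.

    Each candidate token is credited at most as many times as it appears
    in any single reference caption. No brevity penalty is applied.
    """
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref.lower().split()).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in Counter(cand).items())
    return clipped / len(cand)


if __name__ == "__main__":
    predicted = "a dog runs on the grass"
    references = ["a dog is running on the grass", "a puppy runs across a lawn"]
    print("ROUGE-L F1:", rouge_l_f1(predicted, references[0]))
    print("Clipped unigram precision:", clipped_unigram_precision(predicted, references))
```

For real evaluation, an established library is the safer route, since tokenisation and smoothing choices change the scores noticeably.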

codecov · 2023-12-12T02:23:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9d38d67) 100.00% compared to head (8ecf891) 100.00%.
Report is 36 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #31   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            2         3    +1     
  Lines           54        76   +22     
=========================================
+ Hits            54        76   +22


LuchoTurtle · 2023-12-21T00:18:31Z

I think this PR is getting big already and it provides a good foundation to add more benchmarks in the future - it's already fairly streamlined.

Finding out which models Bumblebee supports is a bit cumbersome. I'm using https://jonatanklosko-bumblebee-tools.hf.space/apps/repository-inspector alongside it, and support for image captioning (not to be confused with image classification) is fairly limited. It seems that, of the top 10 most downloaded models on Hugging Face, only BLIP-base and BLIP-large are supported.

I've added benchmarks for ResNet-50 because, even though it is an image classification model, it sometimes yields proper captions. However, its metric scores are bad compared with the others, which is expected.

I'm submitting for review now 👌

nelsonic

Great additions @LuchoTurtle 👌

LuchoTurtle added 18 commits November 29, 2023 17:22

chore: Add clarifying information.

8d909ba

feat: Setting up SQLite3.

1e64baf

chore: Switch to Postgres.

2c85c78

chore: Adding test suports.

e71c980

chore: Change to description.

07d42b3

fix: Fixing information to 5MB.

f15610f

chore: Only show examples if user hasn't uploaded anythin.

6977a30

feat: Adding way to upload to S3.

76f46ea

fix: Fixing tests.

ed42262

fix: Don't allow person to upload another image while it's predicting.

cbc883e

feat: Uploading image to S3 and adding image info to db.

33013e5

fix: Fixing tests working.

142a4e6

fix: Fixing workflow file to tests are working with database.

06dc05a

feat: Adding section to README.

bd1f7ad

feat: Showing notification if there is an error uploading the image.

7a4eb62

feat: Adding section on feedback with toast component.

511887f

chore: Adding more tests.

76a458b

feat: Adding baseline in Elixir to run script to benchmark models.

7dc22fd

LuchoTurtle self-assigned this Dec 12, 2023

LuchoTurtle added 6 commits December 12, 2023 17:24

feat: Add COCO dataset of images.

7f57f17

feat: Adding Jupyter notebook to download COCO images and annotations, alongside README.

6a27e82

chore: Pre-processing images.

ea0972f

chore: Pre-processing images.

9e48ffe

feat: Changing formatter and retrieving captions and pre-processed tensor for each image.

7f740da

fix: Returning captions only instead of array.

0d791a4

LuchoTurtle added 4 commits December 19, 2023 18:40

feat: Finishing BLEU score.

3ffeff1

feat: Installing nltk to get METEOR score. Adding intro in notebook.

316ddc8

feat: Adding METEOR score.

94ff0ec

feat: Adding WER.

84fccc4

This was referenced Dec 19, 2023

Error deploying due to toastify-js #36

Closed

Comment on ensuring that uploaded files are images #35

Closed

LuchoTurtle added 11 commits December 20, 2023 18:44

feat: Finishing table materialization.

7d8cdbf

chore: Installing tabulate to get markdown to create table.

70c0d08

feat: Adding output markdown.

03eb76c

feat: Adding section about metrics jupyter notebook.

19ee28d

feat: Adding results for resnet-50.

c41dbea

feat: Adding results for resnet-50.

822f277

feat: Adding BLIP large results.

c9b1f05

chore: Defaulting to BLIP-base in run.exs.

9d1bab3

fix: Changing the column names of the table.

5c48c4d

feat: Adding size of models.

60f010c

feat: Updating README.

6cf2b19

LuchoTurtle marked this pull request as ready for review December 21, 2023 00:18

LuchoTurtle added the awaiting-review label and removed the in-progress label Dec 21, 2023

LuchoTurtle assigned nelsonic and unassigned LuchoTurtle Dec 21, 2023

LuchoTurtle added 2 commits December 21, 2023 00:26

fix: Fixing the titles of README and adding section referring to the benchmark folder.

90bb193

feat: Add missing information on both READMEs.

8ecf891

nelsonic approved these changes Dec 24, 2023

nelsonic merged commit f9890ba into main Dec 24, 2023
3 checks passed

nelsonic deleted the evaluation branch December 24, 2023 17:26
