[PR] Adding benchmarks between models #31
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##              main       #31   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            2         3    +1
  Lines           54        76   +22
=========================================
+ Hits            54        76   +22
…, alongside README.
…nsor for each image.
I think this PR is already getting big, and it provides a good foundation for adding more benchmarks in the future, since it's already fairly streamlined. The process of seeing the supported models in … I've added benchmarks for … I'm submitting for review now 👌
Great additions @LuchoTurtle 👌
closes #12
This PR creates a benchmark comparing some of the multimodal models available on Bumblebee that support image captioning.
I'm going to run a performance test with the COCO dataset (perhaps the most famous open-source labelled dataset) to evaluate the performance of each model in Elixir.
Although the benchmark tests will be run in Elixir, the results will be exported to a file and then processed with Python, because it has better library support for NLP metric evaluation (R was also considered, but Python is more beginner-friendly for anyone who's curious about this repo).
To measure model performance, I'll try to get scores on different metrics: BLEU, CIDEr, METEOR, SPICE and ROUGE-L. Although not all of these will be measured (BLEU and ROUGE are probably the most relevant), it's worth mentioning them as alternative routes.
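For anyone curious what the Python side of the scoring could look like, here is a minimal sketch of ROUGE-L for a single caption pair. It is only an illustration, not the code in this PR: the function names `lcs_length` and `rouge_l_f1` are hypothetical, tokenisation is a plain lowercase whitespace split, and it uses the F1 variant (precision and recall weighted equally), which is a simplifying assumption.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Rolling one-row dynamic programming table over the tokens of `b`.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a generated caption and one reference caption.

    Assumption: lowercase whitespace tokenisation, no stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

With COCO, each image has several reference captions, so one reasonable option (again an assumption) is to take the maximum score over the references for each image and then average across images.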