I cloned the repo last month (before the most recently reported evaluation bug was fixed), but I applied the (one-line, I believe) fix locally. I then trained a model from scratch and obtained the following results:

epochs = 25 xe / 25 sc (as described in the paper)
Bleu_1: 0.419
Bleu_2: 0.262
Bleu_3: 0.165
Bleu_4: 0.101
METEOR: 0.166
ROUGE_L: 0.313
CIDEr: 0.257

Here are the results the paper reports for the same schedule (25 xe / 25 sc):
Bleu_1: 43.54
Bleu_2: 27.44
Bleu_3: 17.33
Bleu_4: 10.58
METEOR: 17.86
CIDEr: 30.63

Any ideas what might explain this discrepancy?
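Putting both sets of numbers on the same scale makes the gap easier to see (this assumes the local eval prints scores in [0, 1] while the paper reports them multiplied by 100, which is the usual convention; the numbers below are copied from the runs above):

```python
# Side-by-side comparison of the local run vs. the paper's numbers.
# Assumption: local scores are on a 0-1 scale; the paper's are x100.
local = {"Bleu_1": 0.419, "Bleu_2": 0.262, "Bleu_3": 0.165,
         "Bleu_4": 0.101, "METEOR": 0.166, "CIDEr": 0.257}
paper = {"Bleu_1": 43.54, "Bleu_2": 27.44, "Bleu_3": 17.33,
         "Bleu_4": 10.58, "METEOR": 17.86, "CIDEr": 30.63}

for name, score in local.items():
    mine = 100 * score                 # rescale local score to paper scale
    gap = mine - paper[name]           # negative = below the paper's number
    print(f"{name:7s} local={mine:5.1f} paper={paper[name]:5.2f} gap={gap:+.2f}")
```

On that scale every metric lands within about two points of the paper's number except CIDEr, which trails by roughly five.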
Hi @arjung128, thanks for the issue. The reinforcement-learning stage of training is very sensitive to hyperparameters, and the default parameters in the repo (learning rate, etc.) are not optimal. The CIDEr score in particular seems to have large run-to-run variance. That said, it's good to see that all your other metrics (BLEU, METEOR) are very close to those reported in the paper.
Because of this sensitivity, I've been developing another version of this repo based on a different approach to paragraph captioning. It should be much more stable while still giving good results.
I'll release it soon (I have to get through some conference-submission logistics first). Sorry to keep you waiting in the meantime!
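To illustrate where that variance comes from: assuming "sc" here refers to self-critical sequence training (SCST), each caption's log-probability gradient is scaled by an advantage, the CIDEr of a sampled caption minus the CIDEr of the greedy-decoded baseline. A minimal sketch in plain Python (illustrative names and toy numbers, not this repo's actual code):

```python
def scst_advantages(sample_rewards, greedy_rewards):
    """Per-example SCST advantage: reward (e.g. CIDEr) of the sampled
    caption minus reward of the greedy-decoded caption. The policy
    gradient multiplies each caption's log-prob gradient by this value,
    so noise in the CIDEr reward feeds straight into the update."""
    return [rs - rg for rs, rg in zip(sample_rewards, greedy_rewards)]

# Toy numbers (illustrative only): small reward differences can flip the
# sign of the advantage, pushing the same caption up in one run and down
# in another -- one source of the run-to-run variance described above.
print(scst_advantages([0.31, 0.22], [0.27, 0.25]))
```

Because the advantage is a difference of two noisy scores, small changes in the learning rate or reward scale can change not just the magnitude but the direction of updates, which is why the sc stage is so hyperparameter-sensitive.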
For completeness, here are the results from the repo's default schedule as well; they are similar to the 25 xe / 25 sc run above:

epochs = 30 xe / 170 sc (default in the repo)
Bleu_1: 0.430
Bleu_2: 0.271
Bleu_3: 0.171
Bleu_4: 0.105
METEOR: 0.170
ROUGE_L: 0.312
CIDEr: 0.270