I cloned the repo last month (before the most recently reported evaluation bug was fixed), but I applied the (one-line, I believe) fix locally. I then trained a model from scratch and obtained the following results:

epochs = 25 xe / 25 sc (as described in the paper)
Bleu_1: 0.419
Bleu_2: 0.262
Bleu_3: 0.165
Bleu_4: 0.101
METEOR: 0.166
ROUGE_L: 0.313
CIDEr: 0.257

Here are the results the paper reports for the same schedule (25 xe / 25 sc):
Bleu_1: 43.54
Bleu_2: 27.44
Bleu_3: 17.33
Bleu_4: 10.58
METEOR: 17.86
CIDEr: 30.63

Any ideas what might explain this discrepancy?
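Putting both sets of numbers on the same scale makes the gap easier to see (this assumes the local eval prints scores in [0, 1] while the paper reports them multiplied by 100, which is the usual convention; the numbers below are copied from the runs above):

```python
# Side-by-side comparison of the local run vs. the paper's numbers.
# Assumption: local scores are on a 0-1 scale; the paper's are x100.
local = {"Bleu_1": 0.419, "Bleu_2": 0.262, "Bleu_3": 0.165,
         "Bleu_4": 0.101, "METEOR": 0.166, "CIDEr": 0.257}
paper = {"Bleu_1": 43.54, "Bleu_2": 27.44, "Bleu_3": 17.33,
         "Bleu_4": 10.58, "METEOR": 17.86, "CIDEr": 30.63}

for name, score in local.items():
    mine = 100 * score                 # rescale local score to paper scale
    gap = mine - paper[name]           # negative = below the paper's number
    print(f"{name:7s} local={mine:5.1f} paper={paper[name]:5.2f} gap={gap:+.2f}")
```

On that scale every metric lands within about two points of the paper's number except CIDEr, which trails by roughly five.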
Hi @arjung128, thanks for the issue. The reinforcement-learning stage of training is very sensitive to hyperparameters, and the default parameters in the repo (learning rate, etc.) are not optimal. The CIDEr score in particular seems to have large run-to-run variance. That said, it's good to see that all your other metrics (BLEU, METEOR) are very close to those reported in the paper.
Because of this sensitivity, I've been developing another version of this repo based on a different approach to paragraph captioning. It should be much more stable while still giving good results.
I'll release it soon (I have to get through some conference-submission logistics first). Sorry to keep you waiting in the meantime!
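To illustrate where that variance comes from: assuming "sc" here refers to self-critical sequence training (SCST), each caption's log-probability gradient is scaled by an advantage, the CIDEr of a sampled caption minus the CIDEr of the greedy-decoded baseline. A minimal sketch in plain Python (illustrative names and toy numbers, not this repo's actual code):

```python
def scst_advantages(sample_rewards, greedy_rewards):
    """Per-example SCST advantage: reward (e.g. CIDEr) of the sampled
    caption minus reward of the greedy-decoded caption. The policy
    gradient multiplies each caption's log-prob gradient by this value,
    so noise in the CIDEr reward feeds straight into the update."""
    return [rs - rg for rs, rg in zip(sample_rewards, greedy_rewards)]

# Toy numbers (illustrative only): small reward differences can flip the
# sign of the advantage, pushing the same caption up in one run and down
# in another -- one source of the run-to-run variance described above.
print(scst_advantages([0.31, 0.22], [0.27, 0.25]))
```

Because the advantage is a difference of two noisy scores, small changes in the learning rate or reward scale can change not just the magnitude but the direction of updates, which is why the sc stage is so hyperparameter-sensitive.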
For completeness, here are the results from the repo's default schedule as well; they are similar to the 25 xe / 25 sc run above:

epochs = 30 xe / 170 sc (default in the repo)
Bleu_1: 0.430
Bleu_2: 0.271
Bleu_3: 0.171
Bleu_4: 0.105
METEOR: 0.170
ROUGE_L: 0.312
CIDEr: 0.270