Cannot Replicate Results Described in Paper #8

Open
arjung128 opened this issue Jul 11, 2019 · 1 comment
Comments

@arjung128

I cloned the repo last month, before the most recent fix for the evaluation bug was committed, but I applied the (one line?) fix locally. I then trained a model from scratch and obtained the following results:

epochs = 25 xe / 25 sc (as described in the paper)
Bleu_1: 0.419
Bleu_2: 0.262
Bleu_3: 0.165
Bleu_4: 0.101
METEOR: 0.166
ROUGE_L: 0.313
CIDEr: 0.257

epochs = 30 xe / 170 sc (default in the repo)
Bleu_1: 0.430
Bleu_2: 0.271
Bleu_3: 0.171
Bleu_4: 0.105
METEOR: 0.170
ROUGE_L: 0.312
CIDEr: 0.270

Here are the results the paper reports (using epochs = 25 xe / 25 sc; note these are on a 0-100 scale, whereas my numbers above are on a 0-1 scale):
Bleu_1: 43.54
Bleu_2: 27.44
Bleu_3: 17.33
Bleu_4: 10.58
METEOR: 17.86
CIDEr: 30.63

Any ideas what could explain this discrepancy?
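For a like-for-like comparison, here is a minimal sketch that rescales the paper's numbers to the 0-1 range and prints the gap per metric. The scores are copied from this post. The helper name `gaps` and the dictionary names are made up for illustration and are not part of the repo.

```python
# Scores copied from this issue. The evaluation output is on a 0-1 scale,
# the paper's table is on a 0-100 scale, so paper values are divided by 100.
reproduced_25_25 = {
    "Bleu_1": 0.419, "Bleu_2": 0.262, "Bleu_3": 0.165, "Bleu_4": 0.101,
    "METEOR": 0.166, "ROUGE_L": 0.313, "CIDEr": 0.257,
}
reproduced_30_170 = {
    "Bleu_1": 0.430, "Bleu_2": 0.271, "Bleu_3": 0.171, "Bleu_4": 0.105,
    "METEOR": 0.170, "ROUGE_L": 0.312, "CIDEr": 0.270,
}
paper_pct = {
    "Bleu_1": 43.54, "Bleu_2": 27.44, "Bleu_3": 17.33, "Bleu_4": 10.58,
    "METEOR": 17.86, "CIDEr": 30.63,
}


def gaps(run, reference_pct):
    """Map each metric in the reference to (absolute gap, relative gap in %).

    Only metrics present in the reference are compared, so ROUGE_L
    (not listed for the paper above) is skipped.
    """
    result = {}
    for name, ref in reference_pct.items():
        ref01 = ref / 100.0           # rescale paper value to 0-1
        diff = run[name] - ref01      # negative means below the paper
        result[name] = (round(diff, 4), round(100.0 * diff / ref01, 1))
    return result


for label, run in [("25 xe / 25 sc", reproduced_25_25),
                   ("30 xe / 170 sc", reproduced_30_170)]:
    print(label)
    for name, (abs_gap, rel_gap) in gaps(run, paper_pct).items():
        print(f"  {name}: {abs_gap:+.4f} ({rel_gap:+.1f}%)")
```

For the 25 xe / 25 sc run this puts BLEU-1 roughly 4% below the paper in relative terms, while CIDEr is roughly 16% below, so CIDEr is where most of the discrepancy sits.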

@lukemelas
Owner

Hi @arjung128, thanks for the issue. The reinforcement learning segment of training is very hyperparameter-sensitive, and the default parameters in the repo (lr, etc.) are not optimal. In particular, the CIDEr score seems to have large run-to-run variance. That being said, it's good to see that your other metrics (BLEU-1, BLEU-4, METEOR) are very close to those reported in the paper.

Due to the sensitivity of the RL training, I've been developing another version of this repo based on a different approach to paragraph captioning. It should be much more stable while still giving good results.

I'll release it soon (have to go through some conference submission stuff first). Sorry to keep you waiting in the meantime!
