
Very different result with the paper #29

Open
lipingtang17 opened this issue Nov 11, 2019 · 4 comments

@lipingtang17

lipingtang17 commented Nov 11, 2019

Dear Mikel,

Thank you for sharing your great work with us.

I'm running your code and trying to reproduce the results you reported in your ACL 2018 paper, but I could not get comparable numbers.

I downloaded all the required datasets and embedding files with ./get_data.sh and used them to train the model with:
python3 map_embeddings.py --acl2018 --cuda SRC.EMB TRG.EMB SRC_MAPPED.EMB TRG_MAPPED.EMB

The results you reported in the paper are 48.13 for EN-IT, 48.19 for EN-DE, 32.63 for EN-FI and 37.33 for EN-ES. However, the results I got for the four language pairs are 21.04 for EN-IT, 38.6 for EN-DE, 18.64 for EN-FI and 12.68 for EN-ES. My evaluation command is:
python3 eval_translation.py SRC_MAPPED.EMB TRG_MAPPED.EMB -d TEST.DICT --retrieval csls

My results are only about half of what you reported, and I can't figure out why. Could you help me? Thank you very much!

@lipingtang17
Author

I have another question about the evaluation policy. You calculate coverage, i.e. the percentage of test words that are in the cutoff vocabulary. Among these "in-vocabulary" words, the percentage of correctly predicted word pairs is reported as the accuracy. So are the "out-of-vocabulary" words that occur in the test set simply ignored? Is that reasonable, or is it the common practice in the community?
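For reference, here is a minimal sketch of how I understand these two numbers relate. This is my own illustration, not the repository's code; gold, vocab and predict are hypothetical names.

# My own illustration of coverage vs. accuracy, not vecmap's code.
def coverage_and_accuracy(gold, vocab, predict):
    """gold: dict mapping each source test word to its set of gold translations;
    vocab: set of source words kept after the vocabulary cutoff;
    predict: function from a source word to its predicted translation."""
    in_vocab = [src for src in gold if src in vocab]
    coverage = len(in_vocab) / len(gold)   # share of test words that are covered
    correct = sum(1 for src in in_vocab if predict(src) in gold[src])
    accuracy = correct / len(in_vocab)     # OOV test words are excluded here
    return coverage, accuracy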
Looking forward to your reply! Thanks very much!

@artetxem
Owner

You must be doing something wrong. It might be that you are using the test dictionary in the reverse direction. In that case simply swap src_mapped.emb and trg_mapped.emb when calling the evaluation script.

Also, you should get 100% coverage if you are using the provided data. If not, you are definitely doing something wrong (encoding issues or using the test dictionary in the reverse direction are the only things that come to mind).
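For concreteness, with the command above that would mean swapping the two embedding arguments (assuming the test dictionary goes from the target language to the source language):

python3 eval_translation.py TRG_MAPPED.EMB SRC_MAPPED.EMB -d TEST.DICT --retrieval csls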

@lipingtang17
Author

> You must be doing something wrong. It might be that you are using the test dictionary in the reverse direction. In that case simply swap src_mapped.emb and trg_mapped.emb when calling the evaluation script.
>
> Also, you should get 100% coverage if you are using the provided data. If not, you are definitely doing something wrong (encoding issues or using the test dictionary in the reverse direction are the only things that come to mind).

I swapped the two embedding files and it works now! Thank you very much!

@TheShayegh

You can close this issue now, no?
