
Very different result with the paper #29

Open
lipingtang17 opened this issue Nov 11, 2019 · 4 comments

@lipingtang17

lipingtang17 commented Nov 11, 2019

Dear Mikel,

Thank you for sharing your great work with us.

I'm running your code and trying to reproduce the results you reported in your ACL 2018 paper, but I could not get comparable numbers.

I downloaded all the required datasets and embedding files with ./get_data.sh and used them to train the model with:
python3 map_embeddings.py --acl2018 --cuda SRC.EMB TRG.EMB SRC_MAPPED.EMB TRG_MAPPED.EMB

The results you reported in the paper are 48.13 for EN-IT, 48.19 for EN-DE, 32.63 for EN-FI and 37.33 for EN-ES. However, the results I got for the four language pairs are 21.04 for EN-IT, 38.6 for EN-DE, 18.64 for EN-FI and 12.68 for EN-ES. My evaluation command is:
python3 eval_translation.py SRC_MAPPED.EMB TRG_MAPPED.EMB -d TEST.DICT --retrieval csls

My results are only about half of what you reported, and I can't figure out why. Could you help me? Thank you very much!

@lipingtang17
Author

I have another question about the evaluation policy. You calculate coverage, i.e. the percentage of test words that are in the cutoff vocabulary. Among these "in-vocabulary" words, the percentage of correctly predicted word pairs is reported as the accuracy. So are the "out-of-vocabulary" words that occur in the test set simply ignored? Is that reasonable, or is it the common practice in the community?
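For reference, here is a minimal sketch of how I understand these two numbers relate. This is my own illustration, not the repository's code; gold, vocab and predict are hypothetical names.

# My own illustration of coverage vs. accuracy, not vecmap's code.
def coverage_and_accuracy(gold, vocab, predict):
    """gold: dict mapping each source test word to its set of gold translations;
    vocab: set of source words kept after the vocabulary cutoff;
    predict: function from a source word to its predicted translation."""
    in_vocab = [src for src in gold if src in vocab]
    coverage = len(in_vocab) / len(gold)   # share of test words that are covered
    correct = sum(1 for src in in_vocab if predict(src) in gold[src])
    accuracy = correct / len(in_vocab)     # OOV test words are excluded here
    return coverage, accuracy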
Looking forward to your reply! Thanks very much!

@artetxem
Owner

You must be doing something wrong. It might be that you are using the test dictionary in the reverse direction. In that case simply swap src_mapped.emb and trg_mapped.emb when calling the evaluation script.

Also, you should get 100% coverage if you are using the provided data. If not, you are definitely doing something wrong (encoding issues or using the test dictionary in the reverse direction are the only things that come to mind).
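For concreteness, with the command above that would mean swapping the two embedding arguments (assuming the test dictionary goes from the target language to the source language):

python3 eval_translation.py TRG_MAPPED.EMB SRC_MAPPED.EMB -d TEST.DICT --retrieval csls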

@lipingtang17
Author

> You must be doing something wrong. It might be that you are using the test dictionary in the reverse direction. In that case simply swap src_mapped.emb and trg_mapped.emb when calling the evaluation script.
>
> Also, you should get 100% coverage if you are using the provided data. If not, you are definitely doing something wrong (encoding issues or using the test dictionary in the reverse direction are the only things that come to mind).

I swapped the two embedding files and it works now! Thank you very much!

@TheShayegh

You can close this issue now, no?
