Very different result with the paper #29
Dear Mikel,

Thank you for sharing your great work with us.

I'm running your code and trying to reproduce the results reported in your ACL 2018 paper, but I could not get comparable numbers.

I obtained all the required datasets and embedding files with ./get_data.sh and used them to train the model with:

python3 map_embeddings.py --acl2018 --cuda SRC.EMB TRG.EMB SRC_MAPPED.EMB TRG_MAPPED.EMB

The results you report in the paper are 48.13 for EN-IT, 48.19 for EN-DE, 32.63 for EN-FI and 37.33 for EN-ES. However, the results I get for the four language pairs are 21.04 for EN-IT, 38.6 for EN-DE, 18.64 for EN-FI and 12.68 for EN-ES. My evaluation command is:

python3 eval_translation.py SRC_MAPPED.EMB TRG_MAPPED.EMB -d TEST.DICT --retrieval csls

My results are roughly half of what you reported, and I can't figure out why. Could you help me? Thank you very much!

Comments

I also have a question about the evaluation policy. You calculate coverage, that is, the percentage of test words that are in the cutoff vocabulary, and then compute accuracy as the percentage of correctly predicted word pairs among these "in-vocabulary" words. Are the "out-of-vocabulary" words in the test set simply ignored? Is that reasonable, and is it the standard practice in the community?

You must be doing something wrong. It might be that you are using the test dictionary in the reverse direction; in that case, simply swap src_mapped.emb and trg_mapped.emb when calling the evaluation script. Also, you should get 100% coverage if you are using the provided data. If not, you are definitely doing something wrong (encoding issues or using the test dictionary in the reverse direction are the only things that come to mind).

I reversed the embedding files and it works now! Thank you very much!

You can close this issue now, right?
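For reference, the coverage/accuracy policy discussed in this thread can be sketched as follows. This is a hypothetical helper using plain nearest-neighbour retrieval, not the repository's actual implementation: pairs whose source or target word falls outside the cutoff vocabulary are skipped, and accuracy is computed over the covered pairs only.

```python
import numpy as np

def evaluate(src_words, src_vecs, trg_words, trg_vecs, test_dict):
    """Hypothetical coverage/accuracy sketch (not the repo's code).

    test_dict: list of (src_word, trg_word) gold pairs. Pairs whose
    source or target word is outside the cutoff vocabulary are skipped;
    accuracy is computed over the remaining ("covered") pairs only.
    """
    src_index = {w: i for i, w in enumerate(src_words)}
    trg_index = {w: i for i, w in enumerate(trg_words)}
    # Unit-normalize so a dot product equals cosine similarity.
    src_vecs = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    trg_vecs = trg_vecs / np.linalg.norm(trg_vecs, axis=1, keepdims=True)

    # Group gold translations by source word, keeping in-vocabulary pairs.
    gold = {}
    for s, t in test_dict:
        if s in src_index and t in trg_index:
            gold.setdefault(s, set()).add(t)

    correct = 0
    for s, golds in gold.items():
        # Nearest-neighbour retrieval: most similar target word wins.
        sims = trg_vecs @ src_vecs[src_index[s]]
        if trg_words[int(np.argmax(sims))] in golds:
            correct += 1

    coverage = len(gold) / len({s for s, _ in test_dict})
    accuracy = correct / len(gold) if gold else 0.0
    return coverage, accuracy
```

With the provided data, coverage should come out as 100%, as noted above; a lower value suggests an encoding problem or a reversed test dictionary.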
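The evaluation command above uses --retrieval csls. For readers unfamiliar with it, here is a minimal numpy sketch of the CSLS criterion (cross-domain similarity local scaling, a hubness correction over plain cosine retrieval); this is an illustrative reimplementation under the assumption of unit-normalized embedding matrices, not the repository's code:

```python
import numpy as np

def csls_translate(src_vecs, trg_vecs, src_ids, k=10):
    """Sketch of CSLS retrieval (the idea behind --retrieval csls).

    Assumes both embedding matrices are unit-normalized, so matrix
    products are cosine similarities. For each source id, returns the
    index of the target word maximizing
        CSLS(x, y) = 2*cos(x, y) - r_T(x) - r_S(y)
    where r_T(x) and r_S(y) are the mean cosines of x and y with their
    k nearest neighbours on the other side (hubness correction).
    """
    sims = src_vecs[src_ids] @ trg_vecs.T                       # cos(x, y)
    r_fwd = np.sort(sims, axis=1)[:, -k:].mean(axis=1)          # r_T(x)
    back = trg_vecs @ src_vecs.T                                # cos(y, x)
    r_bwd = np.sort(back, axis=1)[:, -k:].mean(axis=1)          # r_S(y)
    csls = 2 * sims - r_fwd[:, None] - r_bwd[None, :]
    return np.argmax(csls, axis=1)
```

Note that a wrong retrieval direction (source and target swapped, as in this issue) degrades results under any retrieval criterion; CSLS only corrects for hub target words, not for a reversed dictionary.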