
low accuracy #16

Open · 15091444119 opened this issue Jul 30, 2018 · 6 comments

Comments

@15091444119

I get only 10% accuracy on EN-DE when using WMT16 as the training data.
The identical and unsupervised methods do not differ much.

@15091444119 (Author)

How can I improve it?

@zhangxiangnick

What do you mean by "using WMT16 as training data"?

I tried the unsupervised command in this repo on FastText embeddings for EN and DE a while ago, and it worked well: at least somewhat over 50% accuracy on the MUSE EN-DE bilingual dictionary.
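
For reference, the unsupervised mapping command from the README looks roughly like this (EN.EMB and DE.EMB are placeholder file names for the monolingual embeddings, not the exact paths I used):

python3 map_embeddings.py --unsupervised EN.EMB DE.EMB EN_MAPPED.EMB DE_MAPPED.EMB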

@15091444119 (Author)

I mean I used the WMT16 corpus to train word2vec.

I have found my bug and now get 40% accuracy on the MUSE EN-DE test dictionary. What's your training corpus? Is FastText better than word2vec?
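
In case it helps others, here is a minimal sketch of how one could train word2vec with gensim on a tokenized monolingual corpus and save it in the plain-text word2vec format that map_embeddings.py reads (the corpus path and hyperparameters below are placeholders, not my exact settings):

# Minimal word2vec training sketch (gensim); paths and hyperparameters are placeholders.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# One sentence per line, whitespace-tokenized.
sentences = LineSentence('wmt16.tok.en')

model = Word2Vec(
    sentences,
    size=300,      # embedding dimension ("vector_size" in newer gensim versions)
    window=5,
    min_count=5,
    sg=1,          # skip-gram
    workers=4,
)

# Plain-text word2vec format: first line is "<vocab_size> <dim>", then one word per line.
model.wv.save_word2vec_format('en.emb.txt', binary=False)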

@zhangxiangnick

I didn't train my own embeddings. I used FastText pre-trained embeddings.

@hassyGo

hassyGo commented Aug 16, 2018

Word2vec embeddings are purely co-occurrence-based, whereas fastText embeddings additionally take character-level information into account.
Therefore it is hard to compare them directly in a general setting.
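
As a rough illustration (just a sketch, not code from this repo): in gensim the difference shows up in the model class and the character n-gram range, e.g.

# fastText builds vectors from character n-grams (min_n..max_n) in addition to whole words,
# so it can produce vectors for out-of-vocabulary words, which plain word2vec cannot.
from gensim.models import FastText
from gensim.models.word2vec import LineSentence

sentences = LineSentence('wmt16.tok.en')   # placeholder corpus path
model = FastText(sentences, size=300, sg=1, min_count=5, min_n=3, max_n=6)
model.wv.save_word2vec_format('en.fasttext.emb.txt', binary=False)   # saves full-word vectors only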

@yaserkl

yaserkl commented Aug 31, 2018

@artetxem I've tried a similar approach using ELMo word embeddings. I have two almost identical vocab files in English, for which I extracted embeddings using ELMo. I just wanted to try out this library and see how it finds matches between these two almost identical files, as follows:

python3 map_embeddings.py --identical SRC.EMB SEMI-SRC.EMB SRC_MAPPED.EMB TRG_MAPPED.EMB

I then tried to find the similarities of a few simple English words (was, she, is, the) using the shared embeddings with this command:
python3 eval_translation.py SRC_MAPPED.EMB TRG_MAPPED.EMB -d TEST.DICT

but the accuracy was 0.0% for me!!

Also, another question: why do the resulting shared embeddings for the target have the same words as the SRC.EMB embedding file? I'm not sure how we can use the TRG_MAPPED.EMB file, for instance for a Dutch text, if it contains the same words as SRC.EMB (in English). I think I'm missing something here.
