Hi Mikel!
I applied all of the steps that your toolkit requires, as described in the paper, on an Urdu-English corpus, but I get a very poor BLEU score, around 0.5 to 0.9.

Data preprocessing:

Step 1) On the monolingual data I apply tokenization, truecasing, and cleaning to a sentence length of 1-50 with Moses (see the preprocessing sketch after step 2).
Step 2) I train word embeddings with word2vec (epochs=5, window_size=5, dimension=300) and then align them into a shared space (I used MUSE and VecMap for the mapping; a sketch of this step also follows below).
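For reference, this is roughly what I do for step 1. It is only a minimal sketch using the sacremoses Python port instead of the original Moses perl scripts, and the file names (mono.en, mono.clean.en) are placeholders for my own data:

```python
# Sketch of step 1: tokenization, truecasing and length cleaning (1-50 tokens).
# sacremoses is used here in place of the Moses perl scripts; for Urdu I fall
# back to the default English tokenization rules, which may not be ideal.
from sacremoses import MosesTokenizer, MosesTruecaser

mtok = MosesTokenizer(lang="en")
mtr = MosesTruecaser()

# Tokenize every monolingual line.
with open("mono.en", encoding="utf-8") as f:
    tokenized = [mtok.tokenize(line.strip()) for line in f]

# Train a truecasing model on the tokenized corpus and apply it.
mtr.train(tokenized, save_to="truecase.model")
truecased = [mtr.truecase(" ".join(toks), return_str=True) for toks in tokenized]

# Keep only sentences with 1-50 tokens (analogous to Moses' clean-corpus-n length filter).
with open("mono.clean.en", "w", encoding="utf-8") as out:
    for line in truecased:
        if 1 <= len(line.split()) <= 50:
            out.write(line + "\n")
```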
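And this is roughly step 2: training the embeddings with gensim's word2vec and then mapping them with VecMap's map_embeddings.py. Again a sketch only; the file paths are placeholders, and the --unsupervised call reflects my reading of the VecMap README:

```python
# Sketch of step 2: word2vec embeddings (epochs=5, window=5, dim=300) with gensim,
# then cross-lingual mapping into a shared space with VecMap's map_embeddings.py.
import subprocess
from gensim.models import Word2Vec

def train_embeddings(corpus_path, out_path):
    # Expects one tokenized sentence per line.
    with open(corpus_path, encoding="utf-8") as f:
        sentences = [line.split() for line in f]
    model = Word2Vec(sentences, vector_size=300, window=5, epochs=5,
                     min_count=1, workers=4)
    model.wv.save_word2vec_format(out_path, binary=False)

train_embeddings("mono.clean.ur", "emb.ur.vec")
train_embeddings("mono.clean.en", "emb.en.vec")

# Unsupervised mapping with VecMap (arguments: src, trg, mapped src, mapped trg).
subprocess.run([
    "python3", "map_embeddings.py", "--unsupervised",
    "emb.ur.vec", "emb.en.vec",
    "emb.ur.mapped.vec", "emb.en.mapped.vec",
], check=True)
```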
The size of my corpus is 13k sentences. Is that enough?

My first question is whether this toolkit supports the Urdu language.
Second, I used the toolkit's default parameters. If the parameters affect model training, could you please share the values you recommend?