I followed the steps in this article, https://huggingface.co/blog/how-to-train, to train a model for the Greek language. All the files I used are UTF-8 encoded. When I use ByteLevelBPETokenizer I get weird symbols. I read in other issues here that this is normal, but there is not a single normal character in my merges.txt file. Also, when I try to print the output to see whether it tokenizes a word correctly, it prints this:
Is this normal, or is ByteLevelBPETokenizer not suitable for Greek characters? Also, is it possible to transform this output into a readable string to check whether it is correct?
Example of merges.txt:
ĠÏĦ ο
ÏĦ η
ĠÎ ½
ĠÏĦ οÏħ
Thank you
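For reference, here is a minimal sketch of how symbols like the ones in merges.txt can be mapped back to readable Greek. It assumes the tokenizer uses the GPT-2 style byte-to-character table, in which every UTF-8 byte is shown as one printable character (for example, a space becomes `Ġ`). The helper names `bytes_to_unicode`, `decode_token`, and `decode_merge` are illustrative and are not part of the tokenizers library.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table (assumed GPT-2 style)."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Turn one byte-level token back into text.

    A token may end in the middle of a multi-byte character, so
    undecodable bytes are replaced instead of raising an error.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


def decode_merge(line):
    """Decode the result of one merges.txt rule of the form 'left right'."""
    left, right = line.split(" ")
    return decode_token(left + right)


print(repr(decode_token("ĠÏĦ")))   # ' τ'
print(repr(decode_merge("ĠÎ ½")))  # ' ν'
```

Under that assumption, the symbols are just the UTF-8 bytes of the Greek text written one character per byte, so Greek letters, which take two bytes each, show up as pairs of symbols. For checking a whole sentence, encoding the text and then passing the resulting `ids` to the tokenizer's `decode` method should return the readable string directly.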
gdet changed the title from "ByteLevelBPETokenizer with Greek problem." to "ByteLevelBPETokenizer with Greek give weird symbols." on Apr 8, 2020
gdet changed the title from "ByteLevelBPETokenizer with Greek give weird symbols." to "ByteLevelBPETokenizer with Greek gives weird symbols." on Apr 8, 2020