Using GPT2Tokenizer converts Arabic characters to symbols #63
Asked by AbeerAbuZayed in Q&A · Answered by WissamAntoun
I am trying to use aragpt2-base for an Arabic text classification task.
When I use the tokenizer, it converts the Arabic characters to symbols.
Answered by
WissamAntoun
Jan 26, 2021
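This is normal, don't worry about it. The GPT-2 tokenizer works at the byte level: it splits the text into its UTF-8 bytes (an Arabic letter takes two bytes in UTF-8) and represents each byte with a printable stand-in character. So printing the tokens as they are shows symbols instead of Arabic. Decoding the token ids with the tokenizer gives the original text back.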
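To see where the symbols come from, here is a minimal sketch of the byte-to-character table that GPT-2 style byte-level tokenizers use, written in plain Python so it runs without downloading any model. The helper names `to_byte_symbols` and `from_byte_symbols` are made up for this illustration and are not part of any library. The sample word is just an example.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table used by GPT-2 style BPE."""
    # Bytes that are already printable keep their own character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte gets a stand-in character from code point 256 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


byte_encoder = bytes_to_unicode()
byte_decoder = {ch: b for b, ch in byte_encoder.items()}


def to_byte_symbols(text):
    # Hypothetical helper: show the text the way byte-level tokens display it.
    return "".join(byte_encoder[b] for b in text.encode("utf-8"))


def from_byte_symbols(symbols):
    # Hypothetical helper: map the stand-in characters back to bytes, then to text.
    return bytes(byte_decoder[ch] for ch in symbols).decode("utf-8")


text = "مرحبا"
symbols = to_byte_symbols(text)
restored = from_byte_symbols(symbols)

print(len(text), len(symbols))  # 5 letters become 10 byte symbols
print(symbols)                  # unreadable symbols, like the printed tokens
print(restored)                 # the original Arabic text
```

Nothing is lost in the process: the symbols are only a display form of the bytes. With the Hugging Face tokenizer itself, calling `tokenizer.decode(...)` on the token ids does the reverse mapping for you.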
Answer selected by
AbeerAbuZayed