
Using GPT2Tokenizer converts Arabic characters to symbols #63

Answered by WissamAntoun
AbeerAbuZayed asked this question in Q&A

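This is normal, don't worry about it. The GPT-2 tokenizer works at the byte level: each character is first encoded as its UTF-8 bytes (most Arabic letters take 2 bytes), and each byte is then mapped to a printable stand-in character. Printing the tokens as they are therefore shows those stand-in symbols instead of Arabic; decoding the token IDs back to text restores the original characters.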
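To see where the symbols come from, here is a minimal Python sketch of the byte-to-character table used by GPT-2's byte-level BPE. `bytes_to_unicode` follows the function of the same name in OpenAI's GPT-2 `encoder.py`; the helpers `to_byte_symbols` and `from_byte_symbols` are hypothetical names for illustration. The sketch assumes UTF-8 input and skips the BPE merge step entirely, so it only shows the byte mapping, not real token boundaries.

```python
def bytes_to_unicode():
    """Map every byte value 0-255 to a printable Unicode character (GPT-2 style)."""
    # Bytes that are already printable keep their own character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    # Remaining bytes (space, control chars, etc.) are shifted to code points >= 256.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {v: k for k, v in BYTE_ENCODER.items()}


def to_byte_symbols(text):
    """Hypothetical helper: UTF-8 encode text, then replace each byte with its stand-in."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))


def from_byte_symbols(symbols):
    """Hypothetical helper: stand-in characters -> bytes -> UTF-8 text."""
    return bytes(BYTE_DECODER[c] for c in symbols).decode("utf-8")


if __name__ == "__main__":
    word = "مرحبا"
    symbols = to_byte_symbols(word)
    print(symbols)                      # the kind of "symbols" a byte-level tokenizer shows
    print(len(word), len(symbols))      # 5 characters -> 10 byte symbols
    print(from_byte_symbols(symbols))   # back to the original Arabic
```

For example, `to_byte_symbols("ا")` gives `Ø§`, and a space becomes `Ġ`, which is why those characters show up in the token list. With Hugging Face `transformers`, `tokenizer.decode(ids)` or `tokenizer.convert_tokens_to_string(tokens)` applies the reverse mapping, so you get readable Arabic back.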

Replies: 1 comment 5 replies

@AbeerAbuZayed
@WissamAntoun
@AbeerAbuZayed
@WissamAntoun
@AbeerAbuZayed

Answer selected by AbeerAbuZayed
Category: Q&A
Labels: None yet
2 participants