support additional special tokens #29

Open
Derekglk opened this issue Nov 16, 2023 · 1 comment

Comments

@Derekglk

Derekglk commented Nov 16, 2023

Hello @wangkuiyi ,

It seems this tokenizer supports only one special token, "<|endoftext|>".
Does it support additional special tokens, for instance the ones we added in special_tokens_map.json, such as "<|user|>", "<|assistant|>", "<s>", "</s>", and "<unk>"?

Thanks!
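
To make the request concrete, here is a minimal Python sketch of how the special tokens could be collected from a special_tokens_map.json-style file. The helper name `collect_special_tokens` and the example JSON are assumptions made for illustration; they are not part of this repo. The sketch assumes each entry is either a plain string, a dict with a "content" field, or a list of those.

```python
import json


def collect_special_tokens(token_map):
    """Return the distinct special-token strings found in a
    special_tokens_map-style dict (hypothetical helper)."""
    found = []

    def add(entry):
        # An entry is assumed to be a plain string or a dict with "content".
        content = entry.get("content") if isinstance(entry, dict) else entry
        if isinstance(content, str) and content not in found:
            found.append(content)

    for value in token_map.values():
        if isinstance(value, list):
            for item in value:
                add(item)
        else:
            add(value)
    return found


# Example data shaped like a special_tokens_map.json file (made up here).
example = json.loads("""
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "additional_special_tokens": ["<|user|>", "<|assistant|>"]
}
""")

special_tokens = collect_special_tokens(example)
print(special_tokens)
```

A tokenizer that accepted such a list at construction time would cover the case described above.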

@PengWenChen

Hi there~
I would also like to ask about adding special tokens.
For some models, such as Qwen1.5, the special tokens are not initially in vocab.json or merges.txt.
They seem to be added later in the Hugging Face Rust tokenizer implementation.
Does this repo also support adding special tokens? Thank you.
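
For reference, one common way to handle tokens that are not in vocab.json or merges.txt is to split the input on the special-token strings first, map each special token directly to its ID, and run ordinary BPE only on the text in between. The Python sketch below illustrates that idea under stated assumptions: the function names, the token IDs, and the stand-in `bpe_encode` callable are all hypothetical, and this is not a description of how this repo or the Hugging Face implementation actually works.

```python
import re


def split_on_special_tokens(text, special_tokens):
    """Split text into (piece, is_special) pairs, keeping the special tokens."""
    if not special_tokens:
        return [(text, False)] if text else []
    # Try longer tokens first so a token that is a prefix of another
    # does not shadow the longer one.
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(t) for t in ordered) + ")")
    specials = set(special_tokens)
    pieces = []
    # re.split with a capturing group keeps the matched separators.
    for piece in pattern.split(text):
        if piece:  # drop empty strings between adjacent special tokens
            pieces.append((piece, piece in specials))
    return pieces


def encode(text, special_ids, bpe_encode):
    """Encode text, bypassing BPE for special tokens.

    special_ids: dict from special-token string to ID (assumed IDs).
    bpe_encode:  callable that encodes ordinary text into a list of IDs.
    """
    ids = []
    for piece, is_special in split_on_special_tokens(text, list(special_ids)):
        if is_special:
            ids.append(special_ids[piece])
        else:
            ids.extend(bpe_encode(piece))
    return ids


# Hypothetical IDs, chosen only for this example.
special_ids = {"<|endoftext|>": 50256, "<|user|>": 50257, "<|assistant|>": 50258}


# Stand-in for a real BPE encoder: one ID per character.
def fake_bpe(s):
    return [ord(c) for c in s]


print(split_on_special_tokens("<|user|>hi<|assistant|>", list(special_ids)))
print(encode("<|user|>hi<|assistant|>", special_ids, fake_bpe))
```

With this approach the added tokens never need to appear in the merge table; they only need IDs that do not collide with the base vocabulary.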
