Fix tokenizer mismatch bug between model and tokenizer for THUDM/glm-4-9b example #2672
Did you try out the change? There doesn't seem to be a
Sorry, I misread that. I also found this branch, which is used natively within Transformers and also provides a tokenizer.json, but loading the model requires some changes. Are you interested in using this branch to modify the glm4 example? https://huggingface.co/THUDM/glm-4-9b/tree/refs%2Fpr%2F15
Yes, they are the same. Also, while running inference I ran into some problems: it seemed that inference never finished. I'm still investigating and will give you feedback if I find out more.
I tried the glm4 example with the tokenizer.json from THUDM/codegeex4-all-9b, but it never produced eos_token: generation did not stop until sample_len was reached, and the output tokens were repeated. The code is the same as the glm4 example; the only difference is that I loaded the safetensors weights and tokenizer.json from a local directory downloaded with huggingface-cli.
In addition, I saw that GLM officially provides an HF version of glm4, https://huggingface.co/THUDM/glm-4-9b-chat-hf. Can candle run inference with it at present? Could you offer some help or troubleshooting suggestions? Thanks @LaurentMazare
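Since the tokenizer.json here comes from a different repository (THUDM/codegeex4-all-9b) than the weights, one quick sanity check is whether the eos token string maps to the id the model expects. The sketch below is a minimal, self-contained illustration: the vocabulary is a plain `HashMap` standing in for the tokenizer (with the Hugging Face `tokenizers` crate the lookup would be `Tokenizer::token_to_id`), and the token string `<|endoftext|>` and all ids are assumptions chosen for illustration, not values taken from the glm4 example.

```rust
use std::collections::HashMap;

/// Looks up the id of `eos_token` in a tokenizer vocabulary.
/// `vocab` is a stand-in for the tokenizer's token -> id map (assumption;
/// with the `tokenizers` crate this would be `Tokenizer::token_to_id`).
fn resolve_eos_id(vocab: &HashMap<String, u32>, eos_token: &str) -> Result<u32, String> {
    vocab
        .get(eos_token)
        .copied()
        .ok_or_else(|| format!("eos token {eos_token:?} not found in tokenizer vocabulary"))
}

/// Fails loudly if the tokenizer's eos id disagrees with the id the model
/// config expects, instead of silently generating until the length limit.
fn check_eos_matches(
    vocab: &HashMap<String, u32>,
    eos_token: &str,
    model_eos_id: u32,
) -> Result<u32, String> {
    let tok_id = resolve_eos_id(vocab, eos_token)?;
    if tok_id != model_eos_id {
        return Err(format!(
            "tokenizer maps {eos_token:?} to {tok_id}, but the model config expects {model_eos_id}"
        ));
    }
    Ok(tok_id)
}

fn main() {
    // Toy vocabulary with made-up ids, for illustration only.
    let mut vocab = HashMap::new();
    vocab.insert("<|endoftext|>".to_string(), 2);
    vocab.insert("hello".to_string(), 7);

    match check_eos_matches(&vocab, "<|endoftext|>", 2) {
        Ok(id) => println!("eos id = {id}"),
        Err(e) => println!("mismatch: {e}"),
    }
}
```

Looking the id up by its string from the same tokenizer file that is used for decoding avoids relying on a hardcoded id that may belong to a different vocabulary.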
I'm not sure I understand exactly what your problem is. I've refactored the glm4 example a bit so that it is closer to the other examples, and from what I see most generations end properly with an eos token being produced, e.g.
Maybe you can provide more details, ideally with a simple way to reproduce the issue?
Here is the output when I run it. Look at the end of the last line: the same token keeps being output until sample_len (2048) is reached, so I stopped it early:
This is the code copied from the glm4 example with almost no changes; only one line was added to print token information:
This is the model-loading code, with some changes:
Could you try running the same code that I ran (i.e. the glm4 example from the current GitHub version) and see whether it behaves differently from what I got?
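Independently of the eos question, the repeated-token output described above (one token emitted over and over until sample_len) is a pattern that a repetition penalty is commonly used to damp. The sketch below is a standalone, CTRL-style penalty written for illustration; it is not code from candle or from the glm4 example, and `penalty` and `repeat_last_n` are illustrative names and values.

```rust
use std::collections::HashSet;

/// CTRL-style repetition penalty: for every distinct token id in the recent
/// context, move its logit toward "less likely" (assuming `penalty > 1.0`).
/// Non-negative logits are divided by `penalty`, negative ones multiplied.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, recent: &[u32]) {
    let seen: HashSet<u32> = recent.iter().copied().collect();
    for &id in &seen {
        if let Some(logit) = logits.get_mut(id as usize) {
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

/// Index of the largest logit (greedy decoding).
fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .fold((0, f32::NEG_INFINITY), |best, (i, &v)| if v > best.1 { (i, v) } else { best })
        .0
}

fn main() {
    let mut logits = vec![0.5_f32, 2.0, 1.8, -1.0];
    let generated: Vec<u32> = vec![1, 1, 1]; // token 1 keeps repeating
    let repeat_last_n = 64; // only penalize tokens from this recent window
    let start = generated.len().saturating_sub(repeat_last_n);
    apply_repeat_penalty(&mut logits, 1.3, &generated[start..]);
    println!("next token = {}", argmax(&logits));
}
```

With a penalty of 1.3, the logit of the repeated token 1 drops from 2.0 to about 1.54, so greedy decoding picks token 2 instead. A penalty only makes loops less likely; it does not fix a stop token that is never recognized.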
I pulled the version you just submitted (#2694) and ran it, but eos_token was still never produced, so tokens kept being generated.
My only workaround is to lower sample_len to 512 so that generation stops once that length is reached.
Sometimes it does end on its own.
Also, these answers are terrible and completely different from those of the glm4 I deployed with Python transformers. Here is the output from my Python transformers setup:
OK, so it seems that it does produce the eos token from time to time.
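Stopping only works if the id the model emits at the end of a turn is one the decoding loop checks for. Some chat models can end a turn with more than one special token, so if the loop checks only one of them, generation runs on to sample_len; that would be consistent with runs that end only sometimes, though this thread does not establish the cause. A minimal sketch, assuming a hypothetical `next_token` closure in place of the model's forward pass and sampling, with made-up token ids:

```rust
use std::collections::HashSet;

/// Why generation ended: a stop token was produced, or the length budget ran out.
#[derive(Debug, PartialEq)]
enum StopReason {
    StopToken(u32),
    MaxLen,
}

/// Decoding loop sketch. `next_token` is a hypothetical stand-in for one
/// forward pass plus sampling; `stop_ids` should hold every id the model
/// may use to end a turn, not just one of them.
fn generate<F>(mut next_token: F, stop_ids: &HashSet<u32>, sample_len: usize) -> (Vec<u32>, StopReason)
where
    F: FnMut(&[u32]) -> u32,
{
    let mut out = Vec::new();
    for _ in 0..sample_len {
        let tok = next_token(out.as_slice());
        if stop_ids.contains(&tok) {
            return (out, StopReason::StopToken(tok));
        }
        out.push(tok);
    }
    (out, StopReason::MaxLen)
}

fn main() {
    // Toy "model" with made-up ids: emits 10, 11, 12, then ends its turn with 99,
    // then (if not stopped) repeats token 5 forever.
    let script = [10u32, 11, 12, 99, 5, 5, 5];
    let model = |ctx: &[u32]| *script.get(ctx.len()).unwrap_or(&5);

    let single: HashSet<u32> = [2].into_iter().collect(); // misses id 99
    let multi: HashSet<u32> = [2, 99].into_iter().collect();

    let (a, why_a) = generate(model, &single, 8);
    let (b, why_b) = generate(model, &multi, 8);
    println!("{} tokens, {:?}", a.len(), why_a);
    println!("{} tokens, {:?}", b.len(), why_b);
}
```

Here the toy model ends its turn with id 99; the loop that checks only id 2 runs to the full sample_len of 8, while the one that also checks 99 stops after three tokens.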
OK, thanks. I'll check these first, and if I find that the problem is in the candle example, I'll open an issue or submit a PR with a fix.