Question about Reproducing Figure 4 - Inference Time vs Vocabulary Size #1
Comments
Hi, thanks for the questions!
To obtain the optimal trade-off, as in Figure 4, also make sure that you normalise both time usage and NSL first, at the same vocabulary size.
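A minimal sketch of this normalisation step (not the project's actual evaluation code; `times`, `nsls`, and `ref_vocab` are hypothetical names for per-vocabulary-size measurements):

```python
# Sketch: normalise inference time and NSL at a common reference vocabulary
# size, then multiply them to get a single trade-off curve over vocab sizes.
def tradeoff_curve(times, nsls, ref_vocab):
    """times / nsls: dicts mapping vocabulary size -> raw measurement."""
    t_ref, n_ref = times[ref_vocab], nsls[ref_vocab]
    return {v: (times[v] / t_ref) * (nsls[v] / n_ref) for v in sorted(times)}

# Usage (hypothetical): the optimal vocabulary size minimises the product.
# curve = tradeoff_curve(times, nsls, ref_vocab=32_000)
# best_vocab = min(curve, key=curve.get)
```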
Thanks for your reply. Both time usage and NSL have been normalized in my test. However, when I multiply these two metrics together, the result remains monotonic for vocabulary sizes up to 290k.
"to avoid possible caching optimisations, make sure to use random tokens in each batch" -- I should try this suggestion, thanks :) |
I am currently trying to reproduce the results shown in Figure 4 - Inference Time vs Vocabulary Size from your project. I have a couple of questions regarding the methodology used for this figure:
What inference framework was utilized to measure the inference time?
Was the embedding layer modified to a specific vocab size before testing the inference speed? (See the sketch below for one possible way to do this.)
Thanks
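For reference, a hedged illustration of what the embedding-resizing question above refers to, using Hugging Face transformers as an assumed framework; the model name and target size are placeholders, not the project's actual setup:

```python
from transformers import AutoModelForCausalLM

# Placeholder model and target size (not the project's setup).
model = AutoModelForCausalLM.from_pretrained("gpt2")
target_vocab_size = 290_000  # e.g. one of the vocabulary sizes being swept
# Resizes the input embedding matrix (and the tied output head, if any)
# before the inference-speed measurement.
model.resize_token_embeddings(target_vocab_size)
```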