
Question about Reproducing Figure 4 - Inference Time vs Vocabulary Size #1

Open
wowfingerlicker opened this issue Nov 28, 2024 · 4 comments

Comments

@wowfingerlicker

I am currently trying to reproduce the results shown in Figure 4 - Inference Time vs Vocabulary Size from your project. I have a couple of questions regarding the methodology used for this figure:

  1. What inference framework was utilized to measure the inference time?

  2. Was the embedding layer resized to the specific vocabulary size before testing the inference speed?

Thanks

@wowfingerlicker
Author

In my experiments, the NSL × inference time curve is monotonically decreasing and does not exhibit an inflection point like the one shown in your Figure 4 - Time Optimal Vocabulary.

[image: NSL × inference time curve from the reporter's experiments]

@gautierdag
Owner

gautierdag commented Nov 28, 2024

Hi, thanks for the questions!

  1. We just used multiple runs on the same hardware and tracked time naively. As long as you pick a unit of time, like iterations/ms or batches/ms, and keep it constant across all experiments, you should find, not surprisingly, that time increases as vocab size increases. We used an internal Meta repo for the inference, but to avoid possible caching optimisations, make sure to use random tokens in each batch (see the sketch after this list).

  2. A few special tokens are negligible. What matters is that you adjust the number of rows in the embedding layer to match the size of the vocabulary being tested.
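For reference, here is a minimal timing sketch of what I mean, not the internal setup we actually used: it assumes a PyTorch model whose embedding (and output projection) was built with exactly `vocab_size` rows, and the batch shape, run count, and function name are all illustrative.

```python
import time
import torch


def time_inference(model, vocab_size, batch_size=8, seq_len=512, n_runs=20, device="cuda"):
    """Naively time forward passes; returns average seconds per batch."""
    model = model.to(device).eval()
    times = []
    with torch.no_grad():
        for _ in range(n_runs):
            # Random token IDs in every batch, so no caching optimisation
            # can kick in from reusing identical inputs.
            tokens = torch.randint(0, vocab_size, (batch_size, seq_len), device=device)
            if device == "cuda":
                torch.cuda.synchronize()
            start = time.perf_counter()
            model(tokens)
            if device == "cuda":
                torch.cuda.synchronize()
            times.append(time.perf_counter() - start)
    # Whatever unit you report (here seconds per batch), keep it constant
    # across all vocabulary sizes you compare.
    return sum(times) / len(times)
```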

To obtain the optimal trade-off, like in Figure 4, also make sure that you normalise both time usage and NSL first, at the same vocabulary size.
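To make that normalisation step concrete, here is a rough sketch of how the product curve could be computed from your own measurements (NumPy only; the reference vocabulary size and names are illustrative): both curves are divided by their value at the same reference vocabulary size, and the minimum of the product is the time-optimal vocabulary.

```python
import numpy as np


def time_optimal_vocab(vocab_sizes, nsl, time_per_batch, ref_vocab=32_000):
    """Normalise NSL and time at a common reference vocab size, multiply,
    and return the vocab size where the product is smallest."""
    vocab_sizes = np.asarray(vocab_sizes)
    nsl = np.asarray(nsl, dtype=float)
    time_per_batch = np.asarray(time_per_batch, dtype=float)

    # Index of the measurement closest to the chosen reference vocab size.
    ref = np.argmin(np.abs(vocab_sizes - ref_vocab))

    nsl_norm = nsl / nsl[ref]
    time_norm = time_per_batch / time_per_batch[ref]
    product = nsl_norm * time_norm

    return vocab_sizes[np.argmin(product)], product
```

If the product is still monotonically decreasing over your range, the minimum will simply sit at your largest measured vocabulary size, which is consistent with not seeing an inflection point yet.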

@wowfingerlicker
Author

Thanks for your reply. Both time usage and NSL have been normalized in my test. However, when I multiply these two metrics together, the result remains monotonic up to a vocabulary size of 290k.

@wowfingerlicker
Author

"to avoid possible caching optimisations, make sure to use random tokens in each batch"

-- I should try this suggestion, thanks :)
