Modify perplexity script #108

SunMarc · 2024-03-05T21:42:51Z

What does this PR do?

This PR changes the script that calculates perplexity. The new calculation is compatible with the one in llama.cpp, so we can compare the results with ggml models. See the following thread for more information. I have used it a lot to calculate the perplexity of quantized models such as AWQ and GPTQ.

With this script, we get the correct perplexity for Gemma or Mistral. cc @younesbelkada
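
For readers who want a feel for what a llama.cpp-style perplexity looks like, here is a minimal sketch. It is not the code from this PR. It assumes the convention that the token stream is cut into non-overlapping windows of `n_ctx` tokens and that only the second half of each window is scored, so every scored token has at least `n_ctx // 2` tokens of context. The names `chunked_perplexity`, `logprob_fn` and `uniform_logprobs` are hypothetical, and the toy uniform model stands in for a real language model.

```python
import math
from typing import Callable, List, Sequence


def chunked_perplexity(
    tokens: Sequence[int],
    logprob_fn: Callable[[Sequence[int]], List[float]],
    n_ctx: int = 512,
) -> float:
    """Perplexity over non-overlapping windows, scoring the second half of each.

    logprob_fn(chunk) is a hypothetical callback returning len(chunk) - 1
    values, where element i is log p(chunk[i + 1] | chunk[: i + 1]).
    """
    first = n_ctx // 2  # assumption: skip the first half of every window
    nll = 0.0
    count = 0
    # Only full windows are used; a trailing partial window is dropped.
    for start in range(0, len(tokens) - n_ctx + 1, n_ctx):
        chunk = tokens[start : start + n_ctx]
        logprobs = logprob_fn(chunk)
        for i in range(first, n_ctx - 1):
            nll -= logprobs[i]
            count += 1
    if count == 0:
        raise ValueError("need at least one full window of n_ctx tokens")
    # Perplexity is the exponential of the mean negative log-likelihood.
    return math.exp(nll / count)


def uniform_logprobs(vocab_size: int) -> Callable[[Sequence[int]], List[float]]:
    """Toy stand-in for a model: every token has probability 1 / vocab_size."""

    def fn(chunk: Sequence[int]) -> List[float]:
        return [-math.log(vocab_size)] * (len(chunk) - 1)

    return fn


if __name__ == "__main__":
    toy_tokens = [0, 1, 2, 3] * 8  # 32 tokens, 4 full windows of 8
    print(chunked_perplexity(toy_tokens, uniform_logprobs(4), n_ctx=8))
```

With a uniform model the perplexity equals the vocabulary size, which makes a handy sanity check before plugging in a real model's log-probabilities.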

SunMarc · 2024-03-05T21:44:55Z

bench/generation/perplexity.py

@@ -64,7 +275,7 @@ def perplexity(
    stride: int = 512,
 ):
    dtype = torch.float32 if device.type == "cpu" else torch.float16
-    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)


I set it to False since the slow tokenizer is better at processing long strings. It only takes a few seconds to process the dataset this way, compared to the fast tokenizer (the default).

dacorvo · 2024-03-06T09:21:35Z

Merged as #110, which also removes the old code.

change perplexity script

b4c19f4

SunMarc requested a review from dacorvo March 5, 2024 21:42

style

397b144

younesbelkada mentioned this pull request Mar 6, 2024

Benchmarks: Add missing latency and accuracy benchmarks #109

Merged

dacorvo mentioned this pull request Mar 6, 2024

Merge: change perplexity script #110

Merged

dacorvo closed this Mar 6, 2024
