Hi, I'm generating LLM sequences with some of the HF models, such as pythia-1.4b. Some of my generations consist solely of the form-feed token (ASCII character 12). The following code results in an error:
ZeroDivisionError Traceback (most recent call last)
[<ipython-input-1-8625f8bf1df7>](https://localhost:8080/#) in <cell line: 8>()
6 reference = chr(12)
7
----> 8 bleu_score = bleu.compute(
9 predictions=[prediction], references=[[reference]]
10 )["bleu"]
[/usr/local/lib/python3.10/dist-packages/evaluate/module.py](https://localhost:8080/#) in compute(self, predictions, references, **kwargs)
465 inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
466 with temp_seed(self.seed):
--> 467 output = self._compute(**inputs, **compute_kwargs)
468
469 if self.buf_writer is not None:
[~/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--bleu/9e0985c1200e367cce45605ce0ecb5ede079894e0f24f54613fca08eeb8aff76/bleu.py](https://localhost:8080/#) in _compute(self, predictions, references, tokenizer, max_order, smooth)
120 references = [[tokenizer(r) for r in ref] for ref in references]
121 predictions = [tokenizer(p) for p in predictions]
--> 122 score = compute_bleu(
123 reference_corpus=references, translation_corpus=predictions, max_order=max_order, smooth=smooth
124 )
[~/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--bleu/9e0985c1200e367cce45605ce0ecb5ede079894e0f24f54613fca08eeb8aff76/nmt_bleu.py](https://localhost:8080/#) in compute_bleu(reference_corpus, translation_corpus, max_order, smooth)
101 geo_mean = 0
102
--> 103 ratio = float(translation_length) / reference_length
104
105 if ratio > 1.0:
ZeroDivisionError: float division by zero
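The traceback bottoms out in `nmt_bleu.compute_bleu`, where `reference_length` is 0. A minimal stdlib-only sketch of what appears to be happening (assuming the metric's default tokenization ultimately splits on whitespace, which treats form feed as a separator):

```python
# Form feed (chr(12)) counts as whitespace in Python, so a form-feed-only
# string tokenizes to an empty token list.
reference = chr(12)
reference_tokens = reference.split()
print(reference_tokens)  # []

# Mirror of the failing line (nmt_bleu.py line 103): with zero reference
# tokens, the length-ratio computation divides by zero.
translation_length = 1
reference_length = len(reference_tokens)
try:
    ratio = float(translation_length) / reference_length
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)  # ZeroDivisionError: float division by zero
```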
The expected behaviour would be that a score is still computed for this character, even though it is non-printable. I believe the same error will occur with other non-printable characters. Is this intended behaviour?
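Until the metric guards against empty references, one possible caller-side workaround (a sketch; `has_tokens` is a hypothetical helper, not part of `evaluate`) is to filter out any pair whose prediction or reference yields no tokens before calling `compute`:

```python
def has_tokens(text: str) -> bool:
    """Hypothetical guard: True if whitespace splitting yields any tokens.

    Form-feed-only and other whitespace-only strings return False, since
    they would give BLEU a zero-length reference corpus.
    """
    return len(text.split()) > 0

pairs = [("the cat sat", "the cat sat"), (chr(12), chr(12))]
scorable = [(p, r) for p, r in pairs if has_tokens(p) and has_tokens(r)]
print(len(scorable))  # 1
```

Since `_compute` in the traceback also accepts a `tokenizer` argument, passing a character-level tokenizer (e.g. `tokenizer=list`) might be another option, though whether that produces a meaningful score for control characters is untested here.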