It is important to note that higher compression rates can also lead to deteriorated downstream performance, since shorter sequences give the model fewer effective FLOPs for reasoning (Goyal et al., 2023). This is a consequence of the modern Transformer decoder architecture, in which every token requires an additional forward pass to generate. Therefore, even seemingly low-information tokens can still provide gains on downstream tasks. This is evidenced by Goyal et al. (2023), who propose pause tokens: special empty tokens added to the context that let the model 'pause' its reasoning and spend additional FLOPs during inference.
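A minimal sketch of the inference-time side of this idea, assuming a tokenizer with a reserved `<pause>` token id and abstracting away the model call (the id, the toy token ids, and the helper names below are all illustrative placeholders, not the authors' implementation; Goyal et al. also train with pause tokens, which is not shown here):

```python
# Sketch: appending pause tokens so the model gets extra forward
# passes (i.e. extra FLOPs) before it must emit an answer token.

PAUSE_TOKEN_ID = 32000  # hypothetical id reserved for <pause>


def insert_pauses(input_ids: list[int], num_pauses: int = 10) -> list[int]:
    """Append num_pauses <pause> tokens to the prompt."""
    return input_ids + [PAUSE_TOKEN_ID] * num_pauses


def strip_pauses(output_ids: list[int]) -> list[int]:
    """Pause tokens carry no content; drop them from the output."""
    return [t for t in output_ids if t != PAUSE_TOKEN_ID]


prompt_ids = [101, 2054, 2003, 1996, 3007, 102]  # toy token ids
padded = insert_pauses(prompt_ids, num_pauses=5)
assert len(padded) == len(prompt_ids) + 5
```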
The tokenizer evaluation section only reports the tokenizer's intrinsic metrics, such as compression rate.
However, a tokenizer with a higher compression rate does not necessarily mean the model performs better. Could you also report results at the level of the final model?
For example, the BLEU scores in the SentencePiece experiments:
https://github.com/google/sentencepiece/blob/master/doc/experiments.md#english-to-japanese
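To make the distinction concrete, here is a sketch of how the intrinsic metric in question (compression rate, measured as bytes per token) could be computed with SentencePiece; the model file path and the toy corpus are assumptions for illustration, and the number this prints says nothing about model-level quality such as BLEU:

```python
# Sketch: measuring a tokenizer's compression rate with SentencePiece.
import sentencepiece as spm

# "tokenizer.model" is a placeholder path to a trained model file.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

corpus = [
    "Higher compression is not automatically better downstream.",
    "BLEU on a translation task is a model-level metric.",
]

total_bytes = sum(len(s.encode("utf-8")) for s in corpus)
total_tokens = sum(len(sp.encode(s, out_type=int)) for s in corpus)

# Higher bytes/token means stronger compression, but this intrinsic
# metric does not measure downstream task performance.
print(f"compression rate: {total_bytes / total_tokens:.2f} bytes/token")
```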