Hello, I'd like to use Qwen2.5-7B to distill Qwen2.5-1.5B, but the two models have different vocab sizes (152064 vs. 151936), which causes a shape mismatch when computing the cross-entropy between their logits. How should things be set up so that, when the models are loaded, their vocabularies and embedding layers are aligned?

Replies: 1 comment

The vocab_size in config.json is the size of the embedding layer; the actual vocabulary size is len(tokenizer). Everything beyond that can simply be truncated.
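Below is a minimal sketch of what that truncation can look like in practice, assuming both models share the same Qwen2.5 tokenizer (as the reply implies), so the teacher's extra logit columns are just padding. The function name, temperature, and the KL formulation (equivalent up to a constant to the soft cross-entropy mentioned in the question) are illustrative, not taken from this thread:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 temperature: float = 2.0) -> torch.Tensor:
    """Soft-target distillation loss that tolerates mismatched vocab sizes.

    Qwen2.5-7B pads its embedding/lm_head to vocab_size=152064, while
    Qwen2.5-1.5B uses 151936; the real token ids (len(tokenizer)) fit
    inside both, so the teacher's trailing columns are padding and can
    be sliced off before comparing the distributions.
    """
    # Truncate both to the smaller vocab (151936 in this case).
    vocab = min(student_logits.size(-1), teacher_logits.size(-1))
    s = student_logits[..., :vocab]
    t = teacher_logits[..., :vocab]
    # Standard KD: KL(teacher || student) at temperature T,
    # scaled by T^2 to keep gradient magnitudes comparable.
    return F.kl_div(
        F.log_softmax(s / temperature, dim=-1),
        F.softmax(t / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
```

Alternatively, the teacher can be shrunk once at load time with model.resize_token_embeddings(151936), which resizes both its embedding matrix and its lm_head to match the student, so no per-step slicing is needed.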