It seems like there is a mismatch between the actual state dict keys and the model's configuration in DeepSeek. Can you open an issue in the hub repo (https://huggingface.co/deepseek-ai/DeepSeek-V3/tree/main), since the error is related to the model weights?
System Info
I use AutoModelForCausalLM.from_pretrained to load DeepSeek-V3, and it raises the warning below:
When I then print the model's state dict keys, only 60 layers are present. However, the DeepSeek-V3 checkpoint actually has 61 layers; the last layer is missing.
How can I fix this? Thank you~
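One way to confirm the mismatch is to compare the layer count in the checkpoint's config against the layer indices actually present in the safetensors weight map. This is a diagnostic sketch, not an official transformers utility; the file names (config.json, model.safetensors.index.json) are the standard sharded-checkpoint layout, and the checkpoint directory path is a placeholder.

```python
import json
import re
from pathlib import Path


def layer_ids_in_index(index: dict) -> set[int]:
    """Extract the decoder-layer indices referenced by a safetensors weight map."""
    return {
        int(m.group(1))
        for key in index["weight_map"]
        if (m := re.match(r"model\.layers\.(\d+)\.", key))
    }


def check_checkpoint(ckpt_dir: str) -> None:
    # ckpt_dir is a placeholder for your local DeepSeek-V3 checkpoint directory.
    ckpt = Path(ckpt_dir)
    config = json.loads((ckpt / "config.json").read_text())
    index = json.loads((ckpt / "model.safetensors.index.json").read_text())
    ids = layer_ids_in_index(index)
    print("config num_hidden_layers:", config.get("num_hidden_layers"))
    print("checkpoint layers:", min(ids), "to", max(ids), f"({len(ids)} total)")
```

If the checkpoint's weight map covers layers 0 to 60 while the instantiated model only builds 60 layers, that confirms the config/weights mismatch described above.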
Who can help?
No response
Reproduction
model = AutoModelForCausalLM.from_pretrained(
"path/to/deepseek_v3_bf16",
device_map="cpu",
torch_dtype="auto",
trust_remote_code=True,
)
print(model.state_dict().keys())
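To see the reported symptom directly, the state dict keys can be reduced to a count of distinct decoder layers rather than printed in full. A small sketch, assuming the loaded model uses the standard model.layers.N.* key naming (the helper name is ours, not a transformers API):

```python
import re
from typing import Iterable


def count_decoder_layers(state_dict_keys: Iterable[str]) -> int:
    """Count distinct model.layers.N indices among state dict keys."""
    ids = {
        int(m.group(1))
        for key in state_dict_keys
        if (m := re.match(r"model\.layers\.(\d+)\.", key))
    }
    return len(ids)


# After the from_pretrained call above:
# print(count_decoder_layers(model.state_dict().keys()))
```

On this checkpoint the expected result is 61 (layers 0 through 60); the issue is that only 60 are present in the loaded model.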
Expected behavior
AutoModelForCausalLM.from_pretrained should load DeepSeek-V3 correctly, with all 61 layers present in the model's state dict.