Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM #35688

Open
4 tasks
Godlovecui opened this issue Jan 14, 2025 · 2 comments
@Godlovecui

System Info

I use AutoModelForCausalLM.from_pretrained to load DeepSeek-V3, and it raises the warning below:

Image
Then I printed the model's state dict keys: the loaded model has only 60 layers, whereas the DeepSeek-V3 checkpoint actually has 61 layers, so the last layer is missing.

Image

How to fix it? Thank you~

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",  # closing quote was missing in the original snippet
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model.state_dict().keys())
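
Counting layers by eye in the printed key list is error-prone, so here is a minimal sketch for extracting the distinct layer indices from parameter names. It assumes the keys follow the common `model.layers.<N>.` naming pattern, which is not confirmed for this checkpoint; `layer_indices` and `sample_keys` are hypothetical names used only for illustration.

```python
import re

# Assumption: decoder layers appear in parameter names as "...layers.<N>....".
LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_indices(keys):
    """Return the sorted list of distinct layer indices found in parameter names."""
    found = set()
    for key in keys:
        match = LAYER_RE.search(key)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# Hypothetical sample keys; in practice pass model.state_dict().keys().
sample_keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
    "lm_head.weight",
]

indices = layer_indices(sample_keys)
print(len(indices), indices)
```

Running `layer_indices(model.state_dict().keys())` on the loaded model would give the number of layers that were actually instantiated, along with the highest index.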

Expected behavior

AutoModelForCausalLM.from_pretrained should load DeepSeek-V3 correctly, without dropping any checkpoint weights.

@Godlovecui Godlovecui added the bug label Jan 14, 2025
@zucchini-nlp
Member

@Godlovecui hey!

Seems like there is a mismatch between the actual state dict keys and the model's configuration in DeepSeek. Can you open an issue in the hub repo (https://huggingface.co/deepseek-ai/DeepSeek-V3/tree/main), since the error is related to the model weights?
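
One way to look for such a mismatch without loading any weights, as a rough sketch: compare the layer indices listed in the checkpoint's shard index with `num_hidden_layers` from the config. This assumes the checkpoint directory contains the usual `config.json` and `model.safetensors.index.json` (with a `weight_map` entry) and that tensor names follow the `model.layers.<N>.` pattern; `layers_outside_config` and the toy data are hypothetical and do not reflect the real DeepSeek-V3 files.

```python
import json
import re

# Assumption: decoder layers appear in tensor names as "...layers.<N>....".
LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def load_json(path):
    """Load a JSON file such as config.json or model.safetensors.index.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def layers_outside_config(weight_map, num_hidden_layers):
    """Return layer indices present in the checkpoint but >= num_hidden_layers."""
    extra = set()
    for name in weight_map:
        match = LAYER_RE.search(name)
        if match and int(match.group(1)) >= num_hidden_layers:
            extra.add(int(match.group(1)))
    return sorted(extra)


# Hypothetical toy data standing in for the two JSON files.
toy_config = {"num_hidden_layers": 2}
toy_index = {
    "weight_map": {
        "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.2.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    }
}

extra = layers_outside_config(toy_index["weight_map"], toy_config["num_hidden_layers"])
print(extra)
```

On a real checkpoint, `load_json` would be pointed at the two files in the model directory. Any indices returned are layers that the config does not ask the model to build, so their weights would show up as unused.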

@Godlovecui
Author

Godlovecui commented Jan 15, 2025

@zucchini-nlp
OK, thank you for your reply. I have opened the issue here:
https://huggingface.co/deepseek-ai/DeepSeek-V3/discussions/62
