Some weights of the model checkpoint at /models/DeepSeek-V3_bf16 were not used when initializing DeepseekV3ForCausalLM #35688

Godlovecui · 2025-01-14T08:19:54Z

System Info

When I load DeepSeek-V3 with AutoModelForCausalLM.from_pretrained, it raises the warning quoted in the issue title.

When I then print the model's state dict keys, the model has only 60 layers. However, the DeepSeek-V3 checkpoint actually has 61 layers, so the last layer is missing.

How can I fix this? Thank you~

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

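from transformers import AutoModelForCausalLM

# Load the bf16 checkpoint on CPU, using the dtype stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/deepseek_v3_bf16",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
# List the parameter names that were actually instantiated.
print(model.state_dict().keys())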

Expected behavior

AutoModelForCausalLM.from_pretrained should load DeepSeek-V3 correctly.

zucchini-nlp · 2025-01-14T10:35:44Z

@Godlovecui hey!

It seems there is a mismatch between the actual state dict keys and the model's configuration in DeepSeek. Can you open an issue in the hub repo (https://huggingface.co/deepseek-ai/DeepSeek-V3/tree/main), since the error is related to the model weights?
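
To narrow down a mismatch like this before reporting it, one option is to compare the layer indices that appear in the checkpoint's parameter names with those in the instantiated model. The following is only a rough sketch, not an official transformers utility: the helper names (`layer_indices`, `unused_layers`, `checkpoint_keys_from_index`) are hypothetical, and it assumes a sharded safetensors checkpoint with the usual `model.safetensors.index.json` file containing a `weight_map`, and parameter names of the form `...layers.<N>....`.

```python
import json
import re
from pathlib import Path

# Matches the layer index in names such as "model.layers.12.mlp.gate.weight".
LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_indices(keys):
    """Return the sorted, de-duplicated layer indices found in parameter names."""
    found = set()
    for key in keys:
        match = LAYER_RE.search(key)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def unused_layers(checkpoint_keys, model_keys):
    """Layer indices present in the checkpoint but absent from the model."""
    return sorted(set(layer_indices(checkpoint_keys)) - set(layer_indices(model_keys)))


def checkpoint_keys_from_index(checkpoint_dir):
    """Read parameter names from a sharded checkpoint's index file.

    Assumes the directory contains model.safetensors.index.json
    with a "weight_map" dictionary (parameter name -> shard file).
    """
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with open(index_path) as f:
        return list(json.load(f)["weight_map"])
```

With the model from the reproduction above, this could be called as `unused_layers(checkpoint_keys_from_index("path/to/deepseek_v3_bf16"), model.state_dict().keys())`. Comparing the result against the layer count in the checkpoint's `config.json` may help show whether the extra weights are simply not described by the configuration.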

Godlovecui · 2025-01-15T02:22:44Z

@zucchini-nlp
OK, thank you for your reply. I have opened the issue here:
https://huggingface.co/deepseek-ai/DeepSeek-V3/discussions/62

Godlovecui added the bug label Jan 14, 2025
