llama_cpp_python-0.3.4-cp310-CU12 deepseekv3 /home/runner/work/llama-cpp-python/llama-cpp-python/vendor/llama.cpp/src/llama.cpp:5474: GGML_ASSERT(hparams.n_expert <= LLAMA_MAX_EXPERTS) failed #1896

Open
PeifengRen opened this issue Jan 15, 2025 · 0 comments


log:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 4: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 5: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 6: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 7: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
llama_load_model_from_file: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_load_model_from_file: using device CUDA1 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_load_model_from_file: using device CUDA2 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_load_model_from_file: using device CUDA3 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_load_model_from_file: using device CUDA4 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_load_model_from_file: using device CUDA5 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_load_model_from_file: using device CUDA6 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_load_model_from_file: using device CUDA7 (NVIDIA GeForce RTX 3090) - 23997 MiB free
llama_model_loader: additional 8 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 46 key-value pairs and 1025 tensors from /raid/deepSeek-gguf/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = deepseek2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek V3 BF16
llama_model_loader: - kv 3: general.size_label str = 256x20B
llama_model_loader: - kv 4: deepseek2.block_count u32 = 61
llama_model_loader: - kv 5: deepseek2.context_length u32 = 163840
llama_model_loader: - kv 6: deepseek2.embedding_length u32 = 7168
llama_model_loader: - kv 7: deepseek2.feed_forward_length u32 = 18432
llama_model_loader: - kv 8: deepseek2.attention.head_count u32 = 128
llama_model_loader: - kv 9: deepseek2.attention.head_count_kv u32 = 128
llama_model_loader: - kv 10: deepseek2.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 12: deepseek2.expert_used_count u32 = 8
llama_model_loader: - kv 13: general.file_type u32 = 15
llama_model_loader: - kv 14: deepseek2.leading_dense_block_count u32 = 3
llama_model_loader: - kv 15: deepseek2.vocab_size u32 = 129280
llama_model_loader: - kv 16: deepseek2.attention.q_lora_rank u32 = 1536
llama_model_loader: - kv 17: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 18: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 19: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 20: deepseek2.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 21: deepseek2.expert_count u32 = 256
llama_model_loader: - kv 22: deepseek2.expert_shared_count u32 = 1
llama_model_loader: - kv 23: deepseek2.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 24: deepseek2.expert_weights_norm bool = true
llama_model_loader: - kv 25: deepseek2.expert_gating_func u32 = 2
llama_model_loader: - kv 26: deepseek2.rope.dimension_count u32 = 64
llama_model_loader: - kv 27: deepseek2.rope.scaling.type str = yarn
llama_model_loader: - kv 28: deepseek2.rope.scaling.factor f32 = 40.000000
llama_model_loader: - kv 29: deepseek2.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 30: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
llama_model_loader: - kv 31: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 32: tokenizer.ggml.pre str = deepseek-v3
Exception ignored on calling ctypes callback function: <function llama_log_callback at 0x7f8826b84280>
Traceback (most recent call last):
File "/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/site-packages/llama_cpp/_logger.py", line 39, in llama_log_callback
print(text.decode("utf-8"), end="", flush=True, file=sys.stderr)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 128: invalid continuation byte
llama_model_loader: - kv 34: tokenizer.ggml.token_type arr[i32,129280] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 35: tokenizer.ggml.merges arr[str,127741] = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
llama_model_loader: - kv 36: tokenizer.ggml.bos_token_id u32 = 0
llama_model_loader: - kv 37: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 38: tokenizer.ggml.padding_token_id u32 = 1
llama_model_loader: - kv 39: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 40: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 41: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 42: general.quantization_version u32 = 2
llama_model_loader: - kv 43: split.no u16 = 0
llama_model_loader: - kv 44: split.count u16 = 9
llama_model_loader: - kv 45: split.tensors.count i32 = 1025
llama_model_loader: - type f32: 361 tensors
llama_model_loader: - type q4_K: 606 tensors
llama_model_loader: - type q6_K: 58 tensors
/home/runner/work/llama-cpp-python/llama-cpp-python/vendor/llama.cpp/src/llama.cpp:5474: GGML_ASSERT(hparams.n_expert <= LLAMA_MAX_EXPERTS) failed
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/site-packages/llama_cpp/lib/libggml-base.so(+0xffeb)[0x7f891f457feb]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/site-packages/llama_cpp/lib/libggml-base.so(ggml_abort+0x156)[0x7f891f458566]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/site-packages/llama_cpp/lib/libllama.so(+0xa0232)[0x7f891f5ad232]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/site-packages/llama_cpp/lib/libllama.so(llama_load_model_from_file+0x660)[0x7f891f5b1700]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/lib-dynload/../../libffi.so.8(+0xa052)[0x7f891f698052]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/lib-dynload/../../libffi.so.8(+0x8925)[0x7f891f696925]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/lib-dynload/../../libffi.so.8(ffi_call+0xde)[0x7f891f69706e]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x91e7)[0x7f891f6a81e7]
/home/appuser/miniconda3/envs/deepseekv3_gpu/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x1223e)[0x7f891f6b123e]
python(_PyObject_MakeTpCall+0x25b)[0x4f747b]
python(_PyEval_EvalFrameDefault+0x53e6)[0x4f3516]
python(_PyFunction_Vectorcall+0x6f)[0x4fe13f]
python(_PyObject_FastCallDictTstate+0x17d)[0x4f687d]
python[0x5075b8]
python(_PyObject_MakeTpCall+0x2ab)[0x4f74cb]
python(_PyEval_EvalFrameDefault+0x56d2)[0x4f3802]
python(_PyFunction_Vectorcall+0x6f)[0x4fe13f]
python(_PyObject_FastCallDictTstate+0x17d)[0x4f687d]
python[0x5075b8]
python(_PyObject_MakeTpCall+0x2ab)[0x4f74cb]
python(_PyEval_EvalFrameDefault+0x56d2)[0x4f3802]
python[0x5953a2]
python(PyEval_EvalCode+0x87)[0x5952e7]
python[0x5c6737]
python[0x5c1870]
python[0x459839]
python(_PyRun_SimpleFileObject+0x19f)[0x5bbdff]
python(_PyRun_AnyFileObject+0x43)[0x5bbb63]
python(Py_RunMain+0x38d)[0x5b891d]
python(Py_BytesMain+0x39)[0x5885d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f8921703d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f8921703e40]
python[0x58848e]
Aborted (core dumped)
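
A side note on the `UnicodeDecodeError` in the log above: it is raised while the logging callback decodes a byte string strictly as UTF-8, and it appears to be separate from the assert that aborts the process. Below is a minimal sketch of a tolerant decode, assuming the bytes handed to the callback can end in the middle of a multi-byte character. The helper name `decode_log_text` is hypothetical and not part of llama-cpp-python.

```python
def decode_log_text(raw: bytes) -> str:
    """Decode log bytes without raising on malformed or truncated UTF-8."""
    # errors="replace" substitutes U+FFFD for any bytes that cannot be decoded,
    # so a cut-off multi-byte sequence no longer raises UnicodeDecodeError.
    return raw.decode("utf-8", errors="replace")


# Hypothetical input: a message that ends with a lone lead byte (0xef),
# the same byte value reported in the traceback.
truncated = b"token \xef"
print(decode_log_text(truncated))
```

This only keeps the log readable; it does not affect the `GGML_ASSERT` failure.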

Main error: /home/runner/work/llama-cpp-python/llama-cpp-python/vendor/llama.cpp/src/llama.cpp:5474: GGML_ASSERT(hparams.n_expert <= LLAMA_MAX_EXPERTS) failed. What should I do?
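
For reference, the assert compares the model's expert count with a compile-time constant in the bundled llama.cpp. The metadata dump above reports `deepseek2.expert_count u32 = 256`, so the failure suggests that the constant in this build is lower than 256. Here is a small sketch that pulls the value out of a saved log so it can be compared with a limit. The function names and the `limit` argument are hypothetical, and the actual value of `LLAMA_MAX_EXPERTS` has to be read from the llama.cpp source vendored with the installed version.

```python
import re
from typing import Optional

# Matches e.g. "deepseek2.expert_count u32 = 256", but not
# "expert_used_count" or "expert_shared_count".
_EXPERT_COUNT = re.compile(r"\.expert_count\s+u32\s*=\s*(\d+)")


def expert_count_from_log(log_text: str) -> Optional[int]:
    """Return the expert count printed by llama_model_loader, if present."""
    match = _EXPERT_COUNT.search(log_text)
    return int(match.group(1)) if match else None


def exceeds_limit(log_text: str, limit: int) -> bool:
    """True when the logged expert count is larger than the given limit."""
    count = expert_count_from_log(log_text)
    return count is not None and count > limit


sample = "llama_model_loader: - kv 21: deepseek2.expert_count u32 = 256"
placeholder_limit = 128  # placeholder only, not the real LLAMA_MAX_EXPERTS
print(expert_count_from_log(sample), exceeds_limit(sample, placeholder_limit))
```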
