
[Bug] load llama model with tp > 1 #2829

Open · 5 tasks done
Xu-Chen opened this issue Jan 10, 2025 · 1 comment

Labels
bug Something isn't working

Comments

Xu-Chen (Contributor) commented Jan 10, 2025

Checklist

  • 1. I have searched related issues but could not get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English; otherwise the issue will be closed.

Describe the bug

When deploying Llama 3.1 70B AWQ INT4 with tp=4, the following error occurs:

RuntimeError: The size of tensor a (7168) must match the size of tensor b (28672) at non-singleton dimension 0
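
For reference, 7168 is exactly 28672 / 4, and 28672 is the MLP intermediate size of Llama 3.1 70B, so the error looks like a tensor-parallel weight shard being combined with a full-size tensor (my guess: some quantization parameter is not sharded the same way as the weight). Below is a minimal standalone sketch, not sglang's actual code and with purely illustrative tensor names, that reproduces the same class of mismatch:

import torch

# Assumed shapes for Llama 3.1 70B with tp=4 (illustrative only)
intermediate_size = 28672                   # full MLP intermediate size
tp_size = 4
shard_size = intermediate_size // tp_size   # 7168, the "tensor a" size

weight_shard = torch.empty(shard_size, 8192)      # per-rank weight shard
full_scales = torch.empty(intermediate_size, 1)   # hypothetical unsharded scales

# Broadcasting is rejected before any arithmetic happens, raising:
# RuntimeError: The size of tensor a (7168) must match the size of
# tensor b (28672) at non-singleton dimension 0
weight_shard * full_scales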

Reproduction

On 4 × A100 GPUs:

python -m sglang.launch_server --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --tp 4
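
As a sanity check (assuming a single A100-80GB can hold the roughly 40 GB of INT4 weights), the same command without tensor parallelism can be used to confirm the failure is specific to tp > 1:

python -m sglang.launch_server --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --tp 1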

Environment

Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A100-SXM4-80GB
GPU 0,1,2,3 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.1, V12.1.105
CUDA Driver Version: 525.147.05
PyTorch: 2.5.1+cu124
sglang: 0.4.1.post4
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.47.1
torchao: 0.7.0
numpy: 1.26.0
aiohttp: 3.9.3
fastapi: 0.115.6
hf_transfer: 0.1.9
huggingface_hub: 0.27.1
interegular: 0.3.3
modelscope: 1.22.0
orjson: 3.10.14
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.4
multipart: 0.0.20
zmq: 25.1.2
uvicorn: 0.27.0
uvloop: 0.19.0
vllm: 0.6.4.post1
openai: 1.59.4
anthropic: 0.42.0
decord: 0.6.0
NVIDIA Topology:
	GPU0	GPU1	GPU2	GPU3	NIC0	NIC1	NIC2	NIC3	NIC4	NIC5	NIC6	NIC7	NIC8	CPU Affinity	NUMA Affinity
GPU0	 X 	NV12	NV12	NV12	NODE	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	0-31,64-95	0
GPU1	NV12	 X 	NV12	NV12	NODE	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	0-31,64-95	0
GPU2	NV12	NV12	 X 	NV12	SYS	SYS	SYS	SYS	SYS	PXB	PXB	NODE	NODE	32-63,96-127	1
GPU3	NV12	NV12	NV12	 X 	SYS	SYS	SYS	SYS	SYS	PXB	PXB	NODE	NODE	32-63,96-127	1
NIC0	NODE	NODE	SYS	SYS	 X 	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS
NIC1	NODE	NODE	SYS	SYS	NODE	 X 	PIX	NODE	NODE	SYS	SYS	SYS	SYS
NIC2	NODE	NODE	SYS	SYS	NODE	PIX	 X 	NODE	NODE	SYS	SYS	SYS	SYS
NIC3	PXB	PXB	SYS	SYS	NODE	NODE	NODE	 X 	PIX	SYS	SYS	SYS	SYS
NIC4	PXB	PXB	SYS	SYS	NODE	NODE	NODE	PIX	 X 	SYS	SYS	SYS	SYS
NIC5	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	 X 	PIX	NODE	NODE
NIC6	SYS	SYS	PXB	PXB	SYS	SYS	SYS	SYS	SYS	PIX	 X 	NODE	NODE
NIC7	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	NODE	NODE	 X 	PIX
NIC8	SYS	SYS	NODE	NODE	SYS	SYS	SYS	SYS	SYS	NODE	NODE	PIX	 X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8


ulimit soft: 1024
Xu-Chen changed the title from "[Bug] load llama model with tp" to "[Bug] load llama model with tp > 1" on Jan 10, 2025
Xu-Chen closed this as completed on Jan 10, 2025
Xu-Chen reopened this on Jan 10, 2025
Xu-Chen (Contributor, Author) commented Jan 10, 2025

This PR may be the cause of this issue.
Sorry to bother you @merrymercy, but could you please take a look?
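
One way to confirm this (I have not verified it yet) would be to reinstall a release from before the suspected change and retry the same launch command, for example:

pip install "sglang<0.4.1.post4"
python -m sglang.launch_server --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --tp 4

If an older release loads the model with tp=4, that would narrow the regression down to changes shipped in 0.4.1.post4.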

zhyncs added the bug label on Jan 10, 2025