Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] embedding model failed with --enable-metrics #2800

Open
5 tasks done
CatherineSue opened this issue Jan 8, 2025 · 1 comment
Open
5 tasks done

[Bug] embedding model failed with --enable-metrics #2800

CatherineSue opened this issue Jan 8, 2025 · 1 comment

Comments

@CatherineSue
Copy link
Contributor

CatherineSue commented Jan 8, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

When I launch the latest SGLang server with e5-mistral-7b-instruct with --enable-metrics, the server crashes.

Docker command:

docker run -itd   --gpus \"device=0\"  \
 --shm-size 10g   \
 -v /raid/models:/models   \
 --ulimit nofile=65535:65535   \
 --network host   \
 --name sglang-latest-e5-metrics-7   \
 lmsysorg/sglang:latest \
 python3 -m sglang.launch_server  \
 --model-path=intfloat/e5-mistral-7b-instruct   \
 --tp 1   --port=8080   --enable-metrics \
 --is-embedding

After the server starts, it can't receive any requests. For instance, curl http://localhost:8080/health_generate will hang forever. During debug, I discovered that the Scheduler never receives any request (this line never executes). This doesn't happen with generation models.

I also discovered that this line might crash as BatchEmbeddingOut doesn't have completion_tokens field.

Reproduction

Docker command:

docker run -itd   --gpus \"device=0\"  \
 --shm-size 10g   \
 -v /raid/models:/models   \
 --ulimit nofile=65535:65535   \
 --network host   \
 --name sglang-latest-e5-metrics-7   \
 lmsysorg/sglang:latest \
 python3 -m sglang.launch_server  \
 --model-path=intfloat/e5-mistral-7b-instruct  \
 --tp 1   --port=8080   --enable-metrics \
 --is-embedding

curl commands:
curl http://localhost:8080/health_generate

curl http://localhost:8080/v1/embeddings \      
    -H "Content-Type: application/json" \
    -d '{
        "model": "intfloat/e5-mistral-7b-instruct",
        "input": "Write a short story set in a dystopian future where artificial intelligence governs the world, and humans are trying to regain their independence. The story should explore themes of freedom, control, and the nature of consciousness."
    }'                 

Environment

Python: 3.10.16 (main, Dec  4 2024, 08:53:37) [GCC 9.4.0]
CUDA available: True
GPU 0: NVIDIA H100 80GB HBM3
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 560.35.03
PyTorch: 2.5.1+cu124
sglang: 0.4.1.post4
flashinfer: 0.1.6+cu124torch2.4
triton: 3.1.0
transformers: 4.47.1
torchao: 0.7.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.8
huggingface_hub: 0.27.0
interegular: 0.3.3
modelscope: 1.21.1
orjson: 3.10.13
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.4
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.59.3
anthropic: 0.42.0
decord: 0.6.0
NVIDIA Topology: 
        GPU0    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    NIC9    NIC10   NIC11   NIC12   NIC13   NIC14   NIC15   NIC16   NIC17   CPU Affinity    NUMA Affinity     GPU NUMA ID
GPU0     X      PXB     PXB     NODE    NODE    NODE    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     0-55,112-167    0N/A
NIC0    PXB      X      PIX     NODE    NODE    NODE    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC1    PXB     PIX      X      NODE    NODE    NODE    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC2    NODE    NODE    NODE     X      NODE    NODE    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC3    NODE    NODE    NODE    NODE     X      PIX     NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC4    NODE    NODE    NODE    NODE    PIX      X      NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC5    NODE    NODE    NODE    NODE    NODE    NODE     X      PIX     NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC6    NODE    NODE    NODE    NODE    NODE    NODE    PIX      X      NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC7    NODE    NODE    NODE    NODE    NODE    NODE    NODE    NODE     X      PIX     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC8    NODE    NODE    NODE    NODE    NODE    NODE    NODE    NODE    PIX      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC9    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX     NODE    NODE    NODE    NODE    NODE    NODE    NODE
NIC10   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X      NODE    NODE    NODE    NODE    NODE    NODE    NODE
NIC11   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE     X      NODE    NODE    NODE    NODE    NODE    NODE
NIC12   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE    NODE     X      PIX     NODE    NODE    NODE    NODE
NIC13   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE    NODE    PIX      X      NODE    NODE    NODE    NODE
NIC14   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    NODE     X      PIX     NODE    NODE
NIC15   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    NODE    PIX      X      NODE    NODE
NIC16   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    NODE    NODE    NODE     X      PIX
NIC17   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    NODE    NODE    NODE    PIX      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8
  NIC9: mlx5_9
  NIC10: mlx5_10
  NIC11: mlx5_11
  NIC12: mlx5_12
  NIC13: mlx5_13
  NIC14: mlx5_14
  NIC15: mlx5_15
  NIC16: mlx5_16
  NIC17: mlx5_17


ulimit soft: 65535
@CatherineSue
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant