[Benchmark] sglang successful requests issue (may be related to env) #2805
python3 -m sglang.check_env
(check_env output truncated; recoverable lines: NIC Legend: NIC0: mlx5_0; ulimit soft: 65536)
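The soft ulimit of 65536 in the environment output is one plausible env-side suspect: at request rate inf the client opens thousands of concurrent connections, and hitting the file-descriptor limit fails requests on the client side. A minimal sketch to check and raise the soft limit before benchmarking (a diagnostic suggestion, not part of bench_serving):

import resource

# Inspect the current open-file limits (check_env above reports soft=65536).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")

# Raise the soft limit to the hard ceiling before firing 3000 concurrent requests.
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))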
sglang 0.4.0.post2
python -m sglang.launch_server --model-path /mnt/home/Llama-3.1-8B-Instruct --enable-torch-compile --disable-radix-cache
python3 -m sglang.bench_serving --backend sglang --dataset-name sharegpt --dataset-path /mnt/home/ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 3000 --output-file /mnt/home/offline_sglang.jsonl
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: not set
Successful requests: 1648
Benchmark duration (s): 170.41
Total input tokens: 369103
Total generated tokens: 326408
Total generated tokens (retokenized): 326356
Request throughput (req/s): 9.67
Input token throughput (tok/s): 2165.94
Output token throughput (tok/s): 1915.40
Total token throughput (tok/s): 4081.34
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 80126.93
Median E2E Latency (ms): 81160.44
---------------Time to First Token----------------
Mean TTFT (ms): 44294.80
Median TTFT (ms): 31463.00
P99 TTFT (ms): 106154.19
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 289.75
Median TPOT (ms): 208.83
P99 TPOT (ms): 1533.27
---------------Inter-token Latency----------------
Mean ITL (ms): 182.46
Median ITL (ms): 145.49
P99 ITL (ms): 562.80
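For context, the sglang throughput figures follow directly from the counts printed above, and all four rates are computed over the 1648 successful requests only, so the two runs are not processing the same workload. A quick sanity check (small deviations come from the duration being rounded to two decimals):

# Recompute the printed sglang rates from the counts above.
completed, duration_s = 1648, 170.41
total_input, total_output = 369103, 326408

print(completed / duration_s)                     # ~9.67 req/s
print(total_input / duration_s)                   # ~2165.97 tok/s (printed: 2165.94)
print(total_output / duration_s)                  # ~1915.43 tok/s (printed: 1915.40)
print((total_input + total_output) / duration_s)  # ~4081.40 tok/s (printed: 4081.34)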
vllm 0.6.3.post1
python -m vllm.entrypoints.openai.api_server --model /mnt/home/Llama-3.1-8B-Instruct --disable-log-requests
python3 bench_serving.py --backend vllm --dataset-name sharegpt --dataset-path /mnt/home/ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 3000 --output-file /mnt/home/offline_vllm.jsonl
============ Serving Benchmark Result ============
Backend: vllm
Traffic request rate: inf
Max request concurrency: not set
Successful requests: 2947
Benchmark duration (s): 334.35
Total input tokens: 660878
Total generated tokens: 572708
Total generated tokens (retokenized): 572537
Request throughput (req/s): 8.81
Input token throughput (tok/s): 1976.62
Output token throughput (tok/s): 1712.91
Total token throughput (tok/s): 3689.54
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 152238.67
Median E2E Latency (ms): 151892.38
---------------Time to First Token----------------
Mean TTFT (ms): 130851.32
Median TTFT (ms): 126929.85
P99 TTFT (ms): 270278.77
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 112.41
Median TPOT (ms): 115.50
P99 TPOT (ms): 145.59
---------------Inter-token Latency----------------
Mean ITL (ms): 110.68
Median ITL (ms): 112.92
P99 ITL (ms): 493.36
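To narrow down where the 1352 missing sglang requests go (1648 of 3000 succeeded, versus 2947 of 3000 for vllm), comparing the two saved result files is a first step. A minimal sketch, assuming each line of the jsonl output is a JSON object with "completed" and "duration" keys (the key names are an assumption about the bench_serving schema; adjust to the actual fields):

import json

# Hedged sketch: print the completed-request count recorded in each saved run.
# "completed" and "duration" are assumed key names, not confirmed schema.
for path in ("/mnt/home/offline_sglang.jsonl", "/mnt/home/offline_vllm.jsonl"):
    with open(path) as f:
        for line in f:
            run = json.loads(line)
            print(path, "completed:", run.get("completed"), "duration:", run.get("duration"))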