
TGI drops requests when 150 requests are sent continuously at a rate of 5 requests per second on 8 x AMD MI300X with Llama 3.1 405B #2635

Bihan opened this issue Oct 11, 2024 · 0 comments


System Info

TGI Docker Image: ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm
MODEL: meta-llama/Llama-3.1-405B-Instruct

Hardware used:
Intel® Xeon® Platinum 8470 2G, 52C/104T, 16GT/s, 105M Cache, Turbo, HT (350W) [x2]
AMD MI300X GPU OAM 192GB 750W GPUs [x8]
64GB RDIMM, 4800MT/s Dual Rank [x32]

Hardware provided by: hotaisle

Deployed using: dstack

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Steps to reproduce (a consolidated command sketch follows these steps)

  1. Provision the machine with the Docker image mentioned above.
  2. Run: text-generation-launcher --port 8000 --num-shard 8 --sharded true --max-concurrent-requests 8192 --max-total-tokens 130000 --max-input-tokens 125000
  3. Clone the benchmarking repo to obtain the benchmark_serving.py script.
  4. Run: pip install aiohttp
  5. Run: python benchmark_serving.py --backend tgi --model meta-llama/Llama-3.1-405B-Instruct --dataset-name sonnet --sonnet-input-len 1000 --endpoint /generate_stream --dataset-path="sonnet.txt" --num-prompt=150 --request-rate=5
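
For reference, a minimal consolidated sketch of steps 1-5 as shell commands. This is an assumption-laden sketch, not the exact dstack deployment: the docker run device flags follow the standard TGI ROCm invocation, and the HF_TOKEN value, benchmark repository URL, and --model-id flag are placeholders (in the original deployment the model may be selected through the dstack configuration instead).

  # Start TGI on all 8 MI300X GPUs; the image's entrypoint is
  # text-generation-launcher, so the launcher flags are appended to docker run.
  docker run --rm -it \
    --device /dev/kfd --device /dev/dri \
    --group-add video --ipc=host --shm-size 1g \
    -e HF_TOKEN=<your_hf_token> \
    -p 8000:8000 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm \
    --model-id meta-llama/Llama-3.1-405B-Instruct \
    --port 8000 --num-shard 8 --sharded true \
    --max-concurrent-requests 8192 \
    --max-total-tokens 130000 --max-input-tokens 125000

  # In a second shell: fetch the benchmarking script and its dependency.
  git clone <benchmark-repo-url>   # placeholder for the repo containing benchmark_serving.py
  cd <benchmark-repo>
  pip install aiohttp

  # Send 150 prompts at ~5 requests per second against the streaming endpoint.
  python benchmark_serving.py \
    --backend tgi \
    --model meta-llama/Llama-3.1-405B-Instruct \
    --dataset-name sonnet --sonnet-input-len 1000 \
    --dataset-path "sonnet.txt" \
    --endpoint /generate_stream \
    --num-prompt 150 --request-rate 5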

Expected behavior

All 150 requests should succeed.
