Commit

Merge branch 'master' into fix/archiver_create_folder
agunapal authored Oct 3, 2024
2 parents 69301fb + 9d10087 commit e79e763
Showing 7 changed files with 12 additions and 11 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -65,10 +65,10 @@ Refer to [torchserve docker](docker/README.md) for details.
#### VLLM Engine
```bash
# Make sure to install torchserve with pip or conda as described above and login with `huggingface-cli login`
-python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct --disable_token_auth
+python -m ts.llm_launcher --model_id meta-llama/Llama-3.2-3B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Meta-Llama-3.1-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
curl -X POST -d '{"model":"meta-llama/Llama-3.2-3B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
```

#### TRT-LLM Engine
@@ -85,9 +85,9 @@ curl -X POST -d '{"prompt":"count from 1 to 9 in french ", "max_tokens": 100}' -

```bash
#export token=<HUGGINGFACE_HUB_TOKEN>
-docker build --pull . -f docker/Dockerfile.llm -t ts/llm
+docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm

-docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
+docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
File renamed without changes.
4 changes: 2 additions & 2 deletions docs/llm_deployment.md
@@ -11,7 +11,7 @@ The launcher can either be used standalone or in combination with our provided T

To launch the docker we first need to build it:
```bash
-docker build . -f docker/Dockerfile.llm -t ts/llm
+docker build . -f docker/Dockerfile.vllm -t ts/vllm
```

Models are usually loaded from the HuggingFace hub and are cached in a [docker volume](https://docs.docker.com/storage/volumes/) for faster reload.
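
The `-v data:/data` mount in the run command below is what provides that cache, so a model is only downloaded on its first run. As a quick sanity check with the standard Docker CLI (illustrative; not part of the documented workflow itself):

```bash
# List the named volume and show where Docker keeps it on the host
docker volume ls --filter name=data
docker volume inspect data
```
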
@@ -22,7 +22,7 @@ export token=<HUGGINGFACE_HUB_TOKEN>

You can then go ahead and launch a TorchServe instance serving your selected model:
```bash
-docker run --rm -ti --shm-size 1g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
+docker run --rm -ti --shm-size 1g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
```

To change the model you just need to exchange the identifier given to the `--model_id` parameter.
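
For instance, reusing the same run command with a different identifier (Mistral here is purely an illustration; any HuggingFace model you have access to works):

```bash
docker run --rm -ti --shm-size 1g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id mistralai/Mistral-7B-Instruct-v0.2 --disable_token_auth
```
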
2 changes: 1 addition & 1 deletion examples/large_models/vllm/llama3/Readme.md
@@ -9,7 +9,7 @@ To leverage the power of vLLM we fist need to install it using pip in out develo
```bash
python -m pip install -r ../requirements.txt
```
-For later deployments we can make vLLM part of the deployment environment by adding the requirements.txt while building the model archive in step 2 (see [here](../../../../model-archiver/README.md#model-specific-custom-python-requirements) for details) or we can make it part of a docker image like [here](../../../../docker/Dockerfile.llm).
+For later deployments we can make vLLM part of the deployment environment by adding the requirements.txt while building the model archive in step 2 (see [here](../../../../model-archiver/README.md#model-specific-custom-python-requirements) for details) or we can make it part of a docker image like [here](../../../../docker/Dockerfile.vllm).
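
As a sketch of how step 2 can pick up that requirements file, assuming the `--requirements-file` flag described in the linked archiver README (model name and handler below are placeholders):

```bash
# Bundle the vLLM requirements into the model archive so TorchServe
# installs them per model (requires install_py_dep_per_model=true)
torch-model-archiver --model-name llama3-8b --version 1.0 \
  --handler my_handler.py \
  --requirements-file ../requirements.txt
```
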

### Step 1: Download Model from HuggingFace

2 changes: 1 addition & 1 deletion examples/large_models/vllm/lora/Readme.md
@@ -9,7 +9,7 @@ To leverage the power of vLLM we fist need to install it using pip in out develo
```bash
python -m pip install -r ../requirements.txt
```
-For later deployments we can make vLLM part of the deployment environment by adding the requirements.txt while building the model archive in step 2 (see [here](../../../../model-archiver/README.md#model-specific-custom-python-requirements) for details) or we can make it part of a docker image like [here](../../../../docker/Dockerfile.llm).
+For later deployments we can make vLLM part of the deployment environment by adding the requirements.txt while building the model archive in step 2 (see [here](../../../../model-archiver/README.md#model-specific-custom-python-requirements) for details) or we can make it part of a docker image like [here](../../../../docker/Dockerfile.vllm).

### Step 1: Download Model from HuggingFace

2 changes: 1 addition & 1 deletion examples/large_models/vllm/mistral/Readme.md
@@ -9,7 +9,7 @@ To leverage the power of vLLM we fist need to install it using pip in out develo
```bash
python -m pip install -r ../requirements.txt
```
-For later deployments we can make vLLM part of the deployment environment by adding the requirements.txt while building the model archive in step 2 (see [here](../../../../model-archiver/README.md#model-specific-custom-python-requirements) for details) or we can make it part of a docker image like [here](../../../../docker/Dockerfile.llm).
+For later deployments we can make vLLM part of the deployment environment by adding the requirements.txt while building the model archive in step 2 (see [here](../../../../model-archiver/README.md#model-specific-custom-python-requirements) for details) or we can make it part of a docker image like [here](../../../../docker/Dockerfile.vllm).

### Step 1: Download Model from HuggingFace

5 changes: 3 additions & 2 deletions ts/llm_launcher.py
@@ -168,8 +168,9 @@ def main(args):

    model_store_path = Path(args.model_store)
    model_store_path.mkdir(parents=True, exist_ok=True)
-    if args.engine == "trt_llm":
-        model_snapshot_path = download_model(args.model_id)
+    model_snapshot_path = (
+        download_model(args.model_id) if args.engine == "trt_llm" else None
+    )

    with create_mar_file(args, model_snapshot_path):
        if args.engine == "trt_llm":
