BLS Model Quick Start

The steps below will guide you through using Model Analyzer in Docker mode to profile and analyze a simple BLS model: bls.

Step 1: Download the BLS model bls and composing model add


1. Create a new directory and enter it

mkdir <new_dir> && cd <new_dir>

2. Start a git repository

git init && git remote add -f origin https://github.com/triton-inference-server/model_analyzer.git

3. Enable sparse checkout, and download the examples directory, which contains the bls and add models

git config core.sparseCheckout true && \
echo 'examples' >> .git/info/sparse-checkout && \
git pull origin main
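
Before moving on, it can help to confirm that the sparse checkout produced both model directories. The following is a minimal sketch, not part of Model Analyzer: `check_models` is a hypothetical helper, and it assumes each model lives in a directory of the same name under `examples/quick-start`.

```shell
# Hypothetical helper: report whether each expected model directory exists.
#   $1      = path to the model repository
#   $2...   = model names to look for
# Returns non-zero if any model directory is missing.
check_models() {
    repo_dir="$1"
    shift
    missing=0
    for model in "$@"; do
        if [ -d "$repo_dir/$model" ]; then
            echo "found: $model"
        else
            echo "missing: $model"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, assuming the checkout above ran in the current directory:
# check_models examples/quick-start bls add
```

If either model is reported as missing, re-check the sparse-checkout configuration before starting the container.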

Step 2: Pull and Run the SDK Container


1. Pull the SDK container:

docker pull nvcr.io/nvidia/tritonserver:24.12-py3-sdk

2. Run the SDK container

docker run -it --gpus 1 \
      --shm-size 2G \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
      --net=host nvcr.io/nvidia/tritonserver:24.12-py3-sdk

Important: The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly.

Step 3: Profile the bls model


The examples/quick-start directory is an example Triton Model Repository that contains the BLS model bls, which calculates the sum of two inputs using the add model.

The following is an example Model Analyzer YAML config that performs a BLS model search:

model_repository: <path-to-examples-quick-start>
profile_models:
  - bls
bls_composing_models: add
perf_analyzer_flags:
  input-data: <path-to-examples-quick-start>/bls_input_data.json
triton_launch_mode: docker
triton_docker_shm_size: 2G
output_model_repository_path: <path-to-output-model-repo>/<output_dir>
export_path: profile_results

Important: You must specify an <output_dir> subdirectory. output_model_repository_path cannot point directly to <path-to-output-model-repo>.

Important: If you already ran this earlier in the container, you can overwrite earlier results by adding the override_output_model_repository: true field to the YAML file.

Important: All models must be in the same repository.

Important: The bls model takes "MODEL_NAME" as one of its inputs. For this example to work, the input data JSON file must set "MODEL_NAME" to "add". Otherwise, Perf Analyzer will generate random data for "MODEL_NAME", and the inferences will fail.
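
To make the "MODEL_NAME" requirement concrete, here is a hedged sketch of what such an input data file could look like, written out with a shell helper. The `"data"` list layout follows Perf Analyzer's JSON input format. The helper name `write_bls_input`, the tensor names `INPUT0` and `INPUT1`, and the four-element values are assumptions for illustration. The bls_input_data.json file shipped in examples/quick-start remains the authoritative version.

```shell
# Hypothetical helper: write a minimal Perf Analyzer input-data file to $1.
# ASSUMPTION: the tensor names INPUT0/INPUT1 and the four-element values are
# illustrative only; match them to the inputs declared by the bls model.
write_bls_input() {
    cat > "$1" <<'EOF'
{
  "data": [
    {
      "MODEL_NAME": ["add"],
      "INPUT0": [0.1, 0.2, 0.3, 0.4],
      "INPUT1": [0.5, 0.6, 0.7, 0.8]
    }
  ]
}
EOF
}

# Usage: write to a scratch file and confirm that MODEL_NAME is set to "add".
# write_bls_input /tmp/bls_input_sketch.json
# grep -F '"MODEL_NAME"' /tmp/bls_input_sketch.json
```

The key point is that "MODEL_NAME" carries a fixed string value rather than being left for Perf Analyzer to randomize.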

Run the Model Analyzer profile subcommand inside the container with:

model-analyzer profile -f /path/to/config.yml
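
If you prefer to generate the config file rather than edit it by hand, the sketch below fills the placeholders of the example config from arguments. `write_bls_config` is a hypothetical helper, not a Model Analyzer command. It only reproduces the fields shown in the example YAML above.

```shell
# Hypothetical helper: write a Model Analyzer config for the bls example.
#   $1 = path to the quick-start model repository
#   $2 = output model repository path (must end in an <output_dir> subdirectory)
#   $3 = where to write the YAML config
write_bls_config() {
    cat > "$3" <<EOF
model_repository: $1
profile_models:
  - bls
bls_composing_models: add
perf_analyzer_flags:
  input-data: $1/bls_input_data.json
triton_launch_mode: docker
triton_docker_shm_size: 2G
output_model_repository_path: $2
export_path: profile_results
EOF
}

# Usage inside the container (replace the placeholders with your own paths):
# write_bls_config "$(pwd)/examples/quick-start" "<path-to-output-model-repo>/<output_dir>" config.yml
# model-analyzer profile -f config.yml
```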

Model Analyzer uses the Quick Search algorithm to profile the BLS model. After the quick search completes, Model Analyzer sweeps concurrencies for the top three configurations, then creates a summary report and CSV outputs.

Here is an example result summary, run on a Tesla V100 GPU:

[Figure: Result Summary Top]

[Figure: Result Summary Table]

You will note that the top model configuration has a higher throughput than the other configurations.


The measured data and summary report will be placed inside the ./profile_results directory. The directory will be structured as follows.

$HOME
|-- model_analyzer
    |-- profile_results
        |-- perf_analyzer_error.log
        |-- plots
        |   |-- detailed
        |   |   |-- bls_config_7
        |   |   |   `-- latency_breakdown.png
        |   |   |-- bls_config_8
        |   |   |   `-- latency_breakdown.png
        |   |   `-- bls_config_9
        |   |       `-- latency_breakdown.png
        |   `-- simple
        |       |-- bls
        |       |   |-- gpu_mem_v_latency.png
        |       |   `-- throughput_v_latency.png
        |       |-- bls_config_7
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       |-- bls_config_8
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       `-- bls_config_9
        |           |-- cpu_mem_v_latency.png
        |           |-- gpu_mem_v_latency.png
        |           |-- gpu_power_v_latency.png
        |           `-- gpu_util_v_latency.png
        |-- reports
        |   |-- detailed
        |   |   |-- bls_config_7
        |   |   |   `-- detailed_report.pdf
        |   |   |-- bls_config_8
        |   |   |   `-- detailed_report.pdf
        |   |   `-- bls_config_9
        |   |       `-- detailed_report.pdf
        |   `-- summaries
        |       `-- bls
        |           `-- result_summary.pdf
        `-- results
            |-- metrics-model-gpu.csv
            |-- metrics-model-inference.csv
            `-- metrics-server-only.csv

Note: The configurations above (bls_config_7, bls_config_8, and bls_config_9) are the top configurations generated when profiling on a single Tesla V100 GPU. Running on multiple GPUs or on a different GPU model may produce different top configurations.
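
Because the top configuration names vary by hardware, a quick way to see what your own run produced is to list the detailed report directories and inspect the CSV headers. This is a minimal sketch based on the directory layout shown above. `list_top_configs` and `csv_columns` are hypothetical helpers, and `csv_columns` assumes the header contains no quoted commas.

```shell
# Hypothetical helper: print the name of each per-configuration report
# directory under <export_path>/reports/detailed, one per line.
#   $1 = export path (for example ./profile_results)
list_top_configs() {
    for dir in "$1"/reports/detailed/*/; do
        # Skip the unexpanded glob when no directories match.
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

# Hypothetical helper: print each column name of a CSV file on its own line.
# ASSUMPTION: the header row has no quoted fields containing commas.
csv_columns() {
    head -n 1 "$1" | tr ',' '\n'
}

# Usage, assuming the export path from the example config:
# list_top_configs ./profile_results
# csv_columns ./profile_results/results/metrics-model-inference.csv
```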