Model Analyzer GPU Memory Usage Differences #847
Version: nvcr.io/nvidia/tritonserver:24.01-py3-sdk

For a profiled model, the GPU Memory Usage (MB) shown in results/metrics-model-gpu.csv differs from the value shown in the model's result_summary.pdf. In my case, metrics-model-gpu.csv shows 1592.8 while the pdf report shows 1031. It could be my misunderstanding, but do these two metrics represent the same thing? I am looking for the maximum GPU memory usage for a given model, so which would be the more accurate result?

Comments

Additional Context: I am using an instance with two GPUs, though the model is limited to a single instance. I have noticed that if I add up the GPU memory of both GPUs from the csv and then divide by 2, (470.8 + 1592.8) / 2 = 1031.8, I get a value close to the pdf result. Could that be a coincidence?

Hi @KimiJL, sorry for the slow response; I just returned from vacation. I suspect that your observation is not a coincidence and that there is a bug. We will have to investigate further. May I ask, were you running in local mode? Or docker or remote?

Hi @tgerdesnv, thanks for the response. I was running in

@KimiJL I have confirmed that the values in the pdfs are in fact the averages across the GPUs. The values in metrics-model-gpu.csv are the raw per-GPU values. So, in your case, the total maximum memory usage by the model on your machine would be 470.8 + 1592.8 = 2063.6 MB. I will fix Model Analyzer to show total memory usage, or clarify the labels to indicate that it is average memory usage.

@tgerdesnv great, thank you for the clarification, that makes sense!
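To make the relationship between the two reports concrete, the aggregation described above can be reproduced directly from the csv. The sketch below is a minimal illustration, not Model Analyzer code: it assumes metrics-model-gpu.csv has one row per model/GPU pair and that the relevant columns are named "Model" and "GPU Memory Usage (MB)"; check the header row of your own export and adjust the names if they differ.

```python
# Minimal sketch: aggregate the per-GPU memory rows from metrics-model-gpu.csv
# into a per-model total and average.
# Assumed column names: "Model" and "GPU Memory Usage (MB)"; adjust to match
# the header row of your own csv.
import csv
from collections import defaultdict

per_model = defaultdict(list)

with open("results/metrics-model-gpu.csv", newline="") as f:
    for row in csv.DictReader(f):
        per_model[row["Model"]].append(float(row["GPU Memory Usage (MB)"]))

for model, values in per_model.items():
    total = sum(values)            # footprint summed across GPUs (e.g. 470.8 + 1592.8 = 2063.6)
    average = total / len(values)  # per-GPU average, which the pdf summary reportedly shows (e.g. 1031.8)
    print(f"{model}: total={total:.1f} MB, average={average:.1f} MB over {len(values)} GPUs")
```

With the numbers from this thread, it would report a total of 2063.6 MB (the per-GPU values summed) and an average of 1031.8 MB, which matches the figure shown in the pdf summary.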