DCGM initialization error #910

Open
minhhoai1001 opened this issue Jul 9, 2024 · 1 comment

@minhhoai1001

I start the Docker container on an A100 server:
docker run -it --rm --gpus all --net=host \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ${PWD}:/workspace/ --shm-size 8G \
nvcr.io/nvidia/tritonserver:22.12-py3-sdk

Then, inside the container, I run:
model-analyzer profile --model-repository /workspace/model_repository --profile-models feature_extract --triton-launch-mode=docker --triton-docker-shm-size=8G --output-model-repository-path /workspace/model_optimizer/feature_extract --export-path ./report

I get this error:
[Model Analyzer] Initializing GPUDevice handles
CacheManager Init Failed. Error: -17
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 251, in main
    gpus = GPUDeviceFactory().verify_requested_gpus(config.gpus)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/device/gpu_device_factory.py", line 36, in __init__
    self.init_all_devices()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/device/gpu_device_factory.py", line 55, in init_all_devices
    dcgm_handle = dcgm_agent.dcgmStartEmbedded(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/monitor/dcgm/dcgm_agent.py", line 41, in dcgmStartEmbedded
    dcgm_structs._dcgmCheckReturn(ret)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/monitor/dcgm/dcgm_structs.py", line 646, in _dcgmCheckReturn
    raise DCGMError(ret)
model_analyzer.monitor.dcgm.dcgm_structs.DCGMError_InitError: DCGM initialization error

@nv-braf
Contributor

nv-braf commented Jul 19, 2024

I see that you are running an older version of tritonserver (22.12). Can you please update to a more recent version and see if that resolves your issue?
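
To make it easier to retry with a newer release, the container command from the report could be parameterized on the release tag. This is only a sketch: the function names `triton_sdk_image` and `run_triton_sdk` are hypothetical helpers introduced here, not part of Triton or Model Analyzer, and the release tag to use is left for you to pick from the NGC catalog.

```shell
#!/usr/bin/env bash
# Sketch only: triton_sdk_image and run_triton_sdk are hypothetical helper
# names introduced for illustration; they are not part of any NVIDIA tool.

# Build the SDK image reference for a given release tag (e.g. 22.12).
triton_sdk_image() {
  printf 'nvcr.io/nvidia/tritonserver:%s-py3-sdk\n' "$1"
}

# Start the SDK container with the same options used in the original report,
# changing only the release tag passed as the first argument.
run_triton_sdk() {
  docker run -it --rm --gpus all --net=host \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v "${PWD}:/workspace/" --shm-size 8G \
    "$(triton_sdk_image "$1")"
}
```

After sourcing the snippet, calling `run_triton_sdk` with a release tag newer than 22.12 starts the same container from that release, and the `model-analyzer profile` command can then be rerun unchanged to see whether the DCGM error persists.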
