Commit

fix: pynvml is pinned when using TRTLLM v13 due to breaking change in 12.0.0 (#485)

Signed-off-by: Terry Kong <[email protected]>
terrykong authored Jan 21, 2025
1 parent 9512ee8 commit 5f4f6d6
Showing 2 changed files with 8 additions and 1 deletion.
6 changes: 5 additions & 1 deletion Dockerfile
@@ -38,7 +38,7 @@ git pull --rebase || true
pip install --no-cache-dir --no-deps -e .
EOF

-FROM ${BASE_IMAGE} as final
+FROM ${BASE_IMAGE} AS final
LABEL "nemo.library"="nemo-aligner"
WORKDIR /opt
# needed in case git complains that it can't detect a valid email, this email is fake but works
@@ -70,6 +70,10 @@ RUN git clone https://github.com/NVIDIA/TensorRT-LLM.git && \
pip install -e .
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12/compat/lib.real/

# TODO: This pinning of pynvml is only needed while on TRTLLM v13, since it requires pynvml>=11.5.0 but pynvml==12.0.0
# contains a breaking change. The last known working version is 11.5.3
RUN pip install pynvml==11.5.3

# install TransformerEngine
ARG MAX_JOBS
ARG TE_TAG
3 changes: 3 additions & 0 deletions setup/requirements.txt
@@ -3,3 +3,6 @@ jsonlines
megatron_core>=0.8
nemo_toolkit[nlp]
nvidia-pytriton
# pynvml pin is needed for TRTLLM v0.13.0 since pynvml 12.0.0 contains a breaking change.
pynvml==11.5.3
tensorrt-llm==0.13.0
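
The pin above can be checked at runtime. The following is a minimal sketch, not part of this commit: the helper names (`parse_version`, `is_safe_for_trtllm_v13`, `installed_pynvml_version`) are hypothetical, and the accepted range `>=11.5.0, <12.0.0` is an assumption inferred from the comments in the diff. The commit itself only confirms 11.5.3 as working. Version strings are assumed to be plain dotted integers.

```python
from importlib import metadata
from typing import Optional, Tuple

# Bounds inferred from the commit comments (assumption, not verified here):
# TRTLLM v13 is said to need pynvml>=11.5.0, and 12.0.0 is said to break it.
MIN_VERSION = (11, 5, 0)
FIRST_BROKEN_VERSION = (12, 0, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '11.5.3' into (11, 5, 3)."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_safe_for_trtllm_v13(version: str) -> bool:
    """Return True if the version falls in the assumed safe range."""
    return MIN_VERSION <= parse_version(version) < FIRST_BROKEN_VERSION


def installed_pynvml_version() -> Optional[str]:
    """Return the installed pynvml version, or None if it is not installed."""
    try:
        return metadata.version("pynvml")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pynvml_version()
    if found is None:
        print("pynvml is not installed")
    elif is_safe_for_trtllm_v13(found):
        print(f"pynvml {found} is within the assumed safe range")
    else:
        print(f"pynvml {found} is outside the assumed safe range; the commit pins 11.5.3")
```

A check like this could run at container start-up to catch a later `pip install` that silently upgrades pynvml past the pin.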
