Add vLLM support to DocSum Helm chart #649
Conversation
Setting as draft because the required image is still missing from DockerHub, and this needs retesting after currently pending DocSum changes for Comps & Examples repos have completed.
While CI "docsum, gaudi, ci-gaudi-vllm-values" test fails as expected, due to OPEA missing There seems to be a bug in component unrelated to this PR, as also run "llm-uservice, xeon, ci-faqgen-values, common" CI test fails to a package missing from image:
=> |
Rebased.
CI still fails. PR #659 fixes DocSum issues with updates in other repos, includes the same model ID workaround as this one, and passed CI => better to merge that first & rebase this?
"docsum, gaudi, ci-gaudi-vllm-values" fails because the CI registry is out of date. Although the required image has been at DockerHub for 4 days [1], fetching it still fails.
"docsum, xeon, ci-values" fails due to a connection failure.
"docsum, gaudi, ci-gaudi-tgi-values" fails due to a similar CI issue.
Rebased.
@daisy-ycguo The CI "docsum, gaudi, ci-gaudi-vllm-values" test still fails due to the CI registry not being up to date with DockerHub: https://hub.docker.com/r/opea/llm-docsum-vllm/tags
@lianhao, does the CI "docsum, gaudi, ci-gaudi-tgi-values" test now fail due to a test bug?
@eero-t I found that the pending PR opea-project/GenAIComps#1101 will make a big change: there will be no llm-docsum-llm image any more, a single llm-docsum image will be able to talk to both TGI and vLLM, so maybe we should wait until that PR is merged first.
Good to know. If the CI registry continues to miss the DockerHub image, that indeed should finally fix it.
Rebased.
Lots of CI failures, some of them due to an incomplete update for the DocSum refactor, some due to issues outside of this PR...
The doc building CI test fails due to bugs in the doc scripts.
"docsum, gaudi, ci-gaudi-tgi-values" fails.
The "Xeon / go-e2e" CI test fails due to an image pull failure.
"docsum, xeon, ci-values" fails.
"docsum, gaudi, ci-gaudi-vllm-values" probably fails due to a missing image.
"llm-uservice, xeon, ci-docsum-values, common" fails.
etc.
Looking at the related refactor tickets:
Potentially needed:
And possibly also increase max tokens from 1024/2048 to 2048/4096.
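As a rough illustration of where such a bump could go, a values override along these lines might work; the key names below are assumptions about how the TGI/vLLM subcharts expose their token limits, not verified chart keys:

```yaml
# Hypothetical values excerpt -- key names are assumptions for illustration only.
tgi:
  MAX_INPUT_LENGTH: "2048"    # was 1024
  MAX_TOTAL_TOKENS: "4096"    # was 2048
vllm:
  extraCmdArgs: ["--max-model-len", "4096"]   # assumed key; --max-model-len is the vLLM flag
```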
The llm-uservice CI issue should be fixed by #696.
# To use Gaudi device
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-values.yaml
# To use Gaudi device with TGI
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-tgi-values.yaml ...
Why have the command in comments? Remove the # for the actual commands.
This is consistent with how all the application READMEs indicate the Helm invocation alternatives. I guess it's to avoid users accidentally copy-pasting them.
Another reason why it's commented here is that it's not a complete command (notice the ... at the end).
# To use Gaudi device with TGI
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-tgi-values.yaml ...
# To use Gaudi device with vLLM
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-vllm-values.yaml ..
ditto
For consistency, your change should be done for all READMEs in this repo, not just this particular one => IMHO it's out of scope for this PR.
Agree that READMEs need to be updated together.
Signed-off-by: Eero Tamminen <[email protected]>
Rebased.
"docsum, gaudi, ci-gaudi-vllm-values" CI test is failing to unspecified error:
I suspect that the test timeout (5 min?) is not long enough, because Gaudi vLLM startup / warmup takes much longer (7 min) than Gaudi TGI. I think that by mapping the vLLM cache to persistent storage, shared with other vLLM instances on the same node, its startup time could be reduced significantly. However, that does not help with the first/cold start; for those the test timeout just needs to be increased. @lianhao ? EDIT: or maybe the failure is due to a vLLM crash (opea-project/GenAIComps#1038)?
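For reference, a minimal sketch of the shared-cache idea above using plain Kubernetes objects; the PVC name, storage size, access mode and mount path are illustrative assumptions, not existing chart values:

```yaml
# Hypothetical PVC that vLLM pods on the same node could share for model downloads.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-model-cache
spec:
  accessModes: ["ReadWriteMany"]   # must be shareable across pods
  resources:
    requests:
      storage: 50Gi
---
# Minimal illustrative Pod showing how that PVC could be mounted as the model
# cache; in practice this would be wired into the chart's vLLM deployment.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-cache-example
spec:
  volumes:
    - name: model-cache
      persistentVolumeClaim:
        claimName: vllm-model-cache
  containers:
    - name: vllm
      image: opea/vllm-gaudi:latest   # image name taken from this thread
      volumeMounts:
        - name: model-cache
          mountPath: /data            # assumed HF/vLLM cache location inside the image
```

With the weights cached on such a shared volume, only the first (cold) start would pay the full download/warmup cost.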
I also suspect this failure is due to a probe timeout, because the vllm-gaudi CI in the llm-uservice chart run against this configuration file is fine.
It's not a timeout issue; the vllm pod restarted several times because of a runtime error.
The latest opea/vllm-gaudi:latest image has a problem; this issue has been reported to the CI team.
Confirmed the vllm-gaudi issue is related to latest opea/vllm-gaudi:latest image itself.
Description
This was split from the Helm vLLM support added in #610. It adds vLLM support to the DocSum Helm chart.
(Similarly to how it's already done for the ChatQnA app + Agent component, there are `tgi.enabled` & `vllm.enabled` flags for selecting which LLM will be used.)
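For illustration, selecting vLLM instead of TGI could look roughly like the following values override; only the `tgi.enabled` / `vllm.enabled` flag names come from this PR, while the file layout and any additional settings that might be required are assumptions:

```yaml
# Hypothetical values override -- only the enabled flags are from this PR,
# everything else here is assumed for illustration.
tgi:
  enabled: false   # disable the TGI serving backend
vllm:
  enabled: true    # serve the LLM with vLLM instead
```

The bundled docsum/gaudi-vllm-values.yaml file mentioned in the review comments presumably toggles the same flags for Gaudi deployments.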
Type of change

Dependencies
The `opea/llm-docsum-vllm:latest` image is currently missing from the CI & DockerHub registries: opea-project/GenAIComps#961
(Although the corresponding `opea/llm-docsum-tgi:latest` image for TGI and the `opea/llm-vllm:latest` vLLM text-generation image already exist.)

Tests
Manual testing with an `opea/llm-docsum-vllm:latest` image built locally.