
Add vLLM support to DocSum Helm chart #649

Merged: 1 commit into opea-project:main on Jan 16, 2025

Conversation

eero-t
Contributor

@eero-t eero-t commented Dec 18, 2024

Description

This was split out from the Helm vLLM support added in #610; it adds vLLM support to the DocSum Helm chart.

(Similarly to how it is already done for the ChatQnA app and the Agent component, there are tgi.enabled & vllm.enabled flags for selecting which LLM serving backend is used, as sketched below.)
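
For reference, backend selection would then look roughly like this (a sketch; only the tgi.enabled / vllm.enabled flag names come from this PR, the rest mirrors the existing README install commands):

# Select vLLM rather than TGI as the LLM serving backend
helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set tgi.enabled=false --set vllm.enabled=true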

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

The opea/llm-docsum-vllm:latest image is currently missing from the CI & DockerHub registries:
opea-project/GenAIComps#961

(The corresponding opea/llm-docsum-tgi:latest image for TGI and the opea/llm-vllm:latest vLLM text-generation image already exist, though.)

Tests

Manual testing with opea/llm-docsum-vllm:latest image built locally.
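
Roughly the flow used, as a sketch; the Dockerfile location inside a GenAIComps checkout and the image override value names are assumptions, not verified details from this PR:

# Build the not-yet-published image locally (hypothetical Dockerfile path)
docker build -t opea/llm-docsum-vllm:latest comps/llms/summarization/vllm/langchain/
# Install the chart using the locally built image instead of pulling from DockerHub
# (llm-uservice.image.* value names are an assumption about the subchart)
helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set llm-uservice.image.repository=opea/llm-docsum-vllm \
  --set llm-uservice.image.pullPolicy=IfNotPresent \
  --values docsum/gaudi-vllm-values.yaml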

@eero-t eero-t marked this pull request as draft December 18, 2024 18:58
@eero-t
Contributor Author

eero-t commented Dec 18, 2024

Setting as draft because the required image is still missing from DockerHub, and this needs retesting once the currently pending DocSum changes in the Comps & Examples repos have landed.

@eero-t
Contributor Author

eero-t commented Dec 20, 2024

While the "docsum, gaudi, ci-gaudi-vllm-values" CI test fails as expected, due to the OPEA llm-docsum-vllm image still being missing...

There also seems to be a bug in a component unrelated to this PR, as the "llm-uservice, xeon, ci-faqgen-values, common" CI test fails due to a package missing from the image:

[pod/llm-uservice20241218190439-5b9b7b79fd-r65l9/llm-uservice20241218190439]
...
   File "/home/user/comps/llms/faq-generation/tgi/langchain/llm.py", line 77, in stream_generator
     from langserve.serialization import WellKnownLCSerializer
   File "/home/user/.local/lib/python3.11/site-packages/langserve/__init__.py", line 8, in <module>
     from langserve.client import RemoteRunnable
   File "/home/user/.local/lib/python3.11/site-packages/langserve/client.py", line 24, in <module>
     from httpx._types import AuthTypes, CertTypes, CookieTypes, HeaderTypes, VerifyTypes
 ImportError: cannot import name 'VerifyTypes' from 'httpx._types' (/home/user/.local/lib/python3.11/site-packages/httpx/_types.py)

=> Is requirements.txt for the llm-faqgen-tgi:latest image generation out of date in the Comps repo?
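
If the root cause is indeed the httpx 0.28 release dropping VerifyTypes (an assumption based on the traceback above, not something verified in Comps), a quick local check could be:

# Sketch: confirm the import works once httpx is constrained below 0.28 (the bound is an assumption)
pip install langserve "httpx<0.28"
python -c "from httpx._types import VerifyTypes; print('import ok')"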

@lianhao?

@eero-t
Contributor Author

eero-t commented Dec 30, 2024

Rebased on main + dropped the "draft" status, as the required OPEA image is now available on DockerHub!

@eero-t
Contributor Author

eero-t commented Dec 30, 2024

CI still fails.

PR #659 fixes the DocSum issues with updates in other repos, includes the same model ID workaround as this one, and passed CI => better to merge that first & rebase this?


"docsum, gaudi, ci-gaudi-vllm-values" fails because CI registry is out of date. Although required image has been at DockerHub for 4 days [1], fetching it still fails:
Normal BackOff 4m47s (x18 over 9m42s) kubelet Back-off pulling image "100.83.111.229:5000/opea/llm-docsum-vllm:latest"
[1] https://hub.docker.com/r/opea/llm-docsum-vllm/tags

"docsum, xeon, ci-values" fails to connection failure:

[pod/docsum20241230125518-5599c984c6-dt5fc/docsum20241230125518] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

"docsum, gaudi, ci-gaudi-tgi-values" fails to similar CI issue:

[pod/docsum20241230130614-d598c6674-qqhjf/docsum20241230130614] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

@eero-t
Contributor Author

eero-t commented Jan 2, 2025

Rebased on main to get CI tests passing, and dropped the already-merged fix. However, CI is still broken.

@daisy-ycguo The CI "docsum, gaudi, ci-gaudi-vllm-values" test still fails because the CI registry is not up to date with DockerHub: https://hub.docker.com/r/opea/llm-docsum-vllm/tags
Failed to pull image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": rpc error: code = NotFound desc = failed to pull and unpack image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": failed to resolve reference "100.83.111.229:5000/opea/llm-docsum-vllm:latest": 100.83.111.229:5000/opea/llm-docsum-vllm:latest: not found
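
A quick way to confirm what the CI-local registry actually serves is the standard Docker registry v2 tag-listing endpoint (assuming the registry allows anonymous plain-HTTP access, as the port 5000 address suggests):

# List the tags the CI registry has for this image, and compare with DockerHub
curl -s http://100.83.111.229:5000/v2/opea/llm-docsum-vllm/tags/list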

@lianhao, does the CI "docsum, gaudi, ci-gaudi-tgi-values" test now fail due to a test bug?
[pod/docsum20250102125737-llm-uservice-7d6f8d968f-v9b4w/docsum20250102125737] | huggingface_hub.errors.ValidationError: Input validation error: 'inputs' tokens + 'max_new_tokens' must be <= 4096. Given: 4095 'inputs' tokens and 17 'max_new_tokens'

Collaborator

@lianhao lianhao left a comment


@eero-t I found that the pending PR opea-project/GenAIComps#1101 will make a big change: there will be no more separate llm-docsum-* images, a single llm-docsum image will be able to talk to both TGI and vLLM, so maybe we should wait until that PR is merged first.

Review comment on helm-charts/docsum/values.yaml (outdated, resolved)
@eero-t
Contributor Author

eero-t commented Jan 7, 2025

@eero-t I found that the pending PR opea-project/GenAIComps#1101 will make a big change: there will be no more separate llm-docsum-* images, a single llm-docsum image will be able to talk to both TGI and vLLM, so maybe we should wait until that PR is merged first.

Good to know. If the CI registry keeps missing the DockerHub image, that should indeed finally fix it.

@eero-t
Contributor Author

eero-t commented Jan 14, 2025

Rebased on main, squashed the "rebase fix" commit, and added a commit updating the llm-docsum-* image names according to opea-project/GenAIComps#1101

@eero-t
Contributor Author

eero-t commented Jan 14, 2025

Lots of CI failures, some of them due to an incomplete update for the DocSum refactor, some due to issues outside of this PR...

The doc-building CI test fails due to bugs in the doc scripts:

/home/runner/work/GenAIInfra/GenAIInfra/docs/scripts/filter-known-issues.py:47: SyntaxWarning: invalid escape sequence '\s'
  b"(?P<comment>(^\s*#.*\n)+)" \
/home/runner/work/GenAIInfra/GenAIInfra/docs/scripts/filter-known-issues.py:91: SyntaxWarning: invalid escape sequence '\.'
  file_regex = re.compile(".*\.conf$")
WARNING: The config value `myst_enable_extensions' has type `list'; expected `Any'.
WARNING: The config value `myst_fence_as_directive' has type `list'; expected `Any'.
make: *** [Makefile:65: html] Error 2

"docsum, gaudi, ci-gaudi-tgi-values":

Response check failed, please check the logs in artifacts!
Error: Process completed with exit code 1.

"Xeon / go-e2e" CI test fails to image pull failure:

   Normal   BackOff    3m42s (x7 over 8m55s)  kubelet            Back-off pulling image "100.80.243.74:5000/opea/llm-tgi:latest"
  Warning  Failed     3m42s (x7 over 8m55s)  kubelet            Error: ImagePullBackOff
Pod llm-svc-deployment-677bf47dfb-qtt22 logs:
Error from server (BadRequest): container "llm-uservice" in pod "llm-svc-deployment-677bf47dfb-qtt22" is waiting to start: trying and failing to pull image

"docsum, xeon, ci-values":

Response check failed, please check the logs in artifacts!
Error: Process completed with exit code 1.

"docsum, gaudi, ci-gaudi-vllm-values" probably fails due to missing DocSum_COMPONENT_NAME env var.

"llm-uservice, xeon, ci-docsum-values, common":

   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 302, in _request_wrapper
     hf_raise_for_status(response)
   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 454, in hf_raise_for_status
     raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6786554f-67e5da2940e1acc410ccee05;0abe054b-411e-4108-bb1c-f1d3dba03521)
 
 Repository Not Found for url: https://huggingface.co/None/resolve/main/tokenizer_config.json.
...
Traceback (most recent call last):
   File "/home/user/comps/llms/src/doc-summarization/opea_docsum_microservice.py", line 26, in <module>
     loader = OpeaComponentLoader(llm_component_name, description=f"OPEA LLM DocSum Component: {llm_component_name}")
...
   File "/home/user/.local/lib/python3.11/site-packages/transformers/utils/hub.py", line 426, in cached_file
     raise EnvironmentError(
 OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

etc.
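
For the RepositoryNotFoundError above, the tokenizer is being resolved for model "None", which suggests the model ID env var never reaches the container; a quick check could be (the deployment name is a placeholder):

# Dump the LLM-related environment of the failing llm-uservice pod
kubectl exec deploy/<llm-uservice-deployment> -- env | grep -iE 'model|llm|docsum'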

@eero-t
Contributor Author

eero-t commented Jan 14, 2025

Looking at the related refactor tickets:

Potentially needed:

export LLM_ENDPOINT_PORT=8008
export DOCSUM_PORT=9000
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export DocSum_COMPONENT_NAME="OPEADocSum_TGI"  # or _vLLM

And possibly also increase max tokens from 1024/2048 to 2048/4096.
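
For quick experimentation on an already deployed chart, before wiring these into the values files, something like this could be used (the deployment name and vLLM service URL are placeholders, not actual chart names):

# Temporarily switch the deployed LLM microservice to the vLLM DocSum component
kubectl set env deploy/<docsum-llm-uservice> \
  DocSum_COMPONENT_NAME="OPEADocSum_vLLM" \
  LLM_ENDPOINT="http://<vllm-service>:80"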

@lianhao
Collaborator

lianhao commented Jan 14, 2025

@eero-t you may need to wait until PR #696 gets landed first

@yongfengdu
Collaborator

The llm-uservice CI issue should be fixed by #696.
Everything should be fine after you rebase onto the latest code.

# To use Gaudi device
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-values.yaml
# To use Gaudi device with TGI
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-tgi-values.yaml ...
Collaborator


Why have the commands in comments? Remove the # so that these are actual commands.

Contributor Author

@eero-t eero-t Jan 15, 2025


This is consistent with how all the application READMEs indicate the alternative Helm invocations. I guess it's to avoid users accidentally copy-pasting them.

Another reason it's commented out here is that it's not a complete command (notice the ... at the end).

# To use Gaudi device with TGI
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-tgi-values.yaml ...
# To use Gaudi device with vLLM
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-vllm-values.yaml ..
Collaborator


ditto

Contributor Author


For consistency, your change should be done for all READMEs in this repo, not just this particular one => IMHO it's out of scope for this PR.

Collaborator


Agree that READMEs need to be updated together.

Signed-off-by: Eero Tamminen <[email protected]>
@eero-t
Contributor Author

eero-t commented Jan 15, 2025

Rebased on main, fixed the llm-uservice image name (missed in #696), added the new DOCSUM_BACKEND values for llm-uservice, and dropped the redundant MAX_* items from the vllm subchart.

@eero-t
Contributor Author

eero-t commented Jan 15, 2025

"docsum, gaudi, ci-gaudi-vllm-values" CI test is failing to unspecified error:

 [pod/docsum20250115170104-vllm-74fd965586-mnfvc/vllm] INFO 01-15 17:11:26 hpu_model_runner.py:699] Pre-loading model weights on hpu:0 took 13.52 GiB of device memory (13.52 GiB/94.62 GiB used) and 864.7 MiB of host memory (81.43 GiB/1007 GiB used)
-----------------------------------
+ exit 1
Error: Process completed with exit code 1.

I suspect that the test timeout (5 min?) is not long enough, because Gaudi vLLM startup / warmup takes much longer (7 min) than Gaudi TGI.

By mapping the vLLM cache to persistent storage shared with other vLLM instances on the same node, its startup time could be reduced significantly. However, that does not help with a first / cold start; for those, the test timeout just needs to be increased. @lianhao?

EDIT: or maybe the failure is due to a vLLM crash (opea-project/GenAIComps#1038)?
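
If the timeout theory holds, two knobs that might be worth trying; the value names below are guesses at the subchart keys, not verified ones:

# Give the vLLM pod more time to warm up before its startup probe gives up
helm upgrade docsum docsum --reuse-values --set vllm.startupProbe.failureThreshold=120
# Or share a model cache between vLLM instances on the node via an existing PVC
helm upgrade docsum docsum --reuse-values --set global.modelUsePVC=model-volume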

@lianhao
Collaborator

lianhao commented Jan 16, 2025

"docsum, gaudi, ci-gaudi-vllm-values" CI test is failing to unspecified error:

 [pod/docsum20250115170104-vllm-74fd965586-mnfvc/vllm] INFO 01-15 17:11:26 hpu_model_runner.py:699] Pre-loading model weights on hpu:0 took 13.52 GiB of device memory (13.52 GiB/94.62 GiB used) and 864.7 MiB of host memory (81.43 GiB/1007 GiB used)
-----------------------------------
+ exit 1
Error: Process completed with exit code 1.

I suspect that test timeout (5min?) is not long enough, because Gaudi vLLM startup / warmup takes much longer (7min) than Gaudi TGI.

I think mapping vLLM cache to persistent storage, shared with other vLLM instances on same node, its startup time could be significantly reduced. However, that does not help with first/cold start, for those the test timeout just needs to increased. @lianhao ?

EDIT: or maybe failure is due to vLLM crash (opea-project/GenAIComps#1038)?

I also suspect this failure is due to a probe timeout, because the vllm-gaudi CI in the llm-uservice chart, run against this configuration file, is fine.

@yongfengdu
Collaborator

It's not a timeout issue; the vllm pod restarted several times because of a runtime error.
It looks like the vllm-gaudi:latest image is having trouble running intel-neural-chat; not sure whether it's an environment issue (the CI server's Gaudi SW version is 1.19) or a configuration issue.

@yongfengdu
Collaborator

The latest opea/vllm-gaudi:latest image has a problem, and the issue has been reported to the CI team.
We tried an image built before yesterday and it passed.
Let's merge this first without waiting for the opea image fix. Other PRs will need to rebase and continue.

Collaborator

@lianhao lianhao left a comment


Confirmed that the vllm-gaudi issue is related to the latest opea/vllm-gaudi:latest image itself.

@yongfengdu yongfengdu merged commit 0943764 into opea-project:main Jan 16, 2025
11 of 12 checks passed