
Add vLLM support to DocSum Helm chart #649

Merged: 1 commit into opea-project:main on Jan 16, 2025

Conversation

eero-t
Contributor

@eero-t eero-t commented Dec 18, 2024

Description

This was split out from the Helm vLLM support added in #610; it adds vLLM support to the DocSum Helm chart.

(Similarly to how it is already done for the ChatQnA app and the Agent component, there are tgi.enabled & vllm.enabled flags for selecting which LLM serving backend is used, as sketched below.)
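
For reference, backend selection would then look roughly like this (a sketch; only the tgi.enabled / vllm.enabled flag names come from this PR, the rest mirrors the existing README install commands):

# Select vLLM rather than TGI as the LLM serving backend
helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set tgi.enabled=false --set vllm.enabled=true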

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

The opea/llm-docsum-vllm:latest image is currently missing from the CI & DockerHub registries:
opea-project/GenAIComps#961

(The corresponding opea/llm-docsum-tgi:latest image for TGI and the opea/llm-vllm:latest vLLM text-generation image already exist, though.)

Tests

Manual testing with opea/llm-docsum-vllm:latest image built locally.
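
Roughly the flow used, as a sketch; the Dockerfile location inside a GenAIComps checkout and the image override value names are assumptions, not verified details from this PR:

# Build the not-yet-published image locally (hypothetical Dockerfile path)
docker build -t opea/llm-docsum-vllm:latest comps/llms/summarization/vllm/langchain/
# Install the chart using the locally built image instead of pulling from DockerHub
# (llm-uservice.image.* value names are an assumption about the subchart)
helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set llm-uservice.image.repository=opea/llm-docsum-vllm \
  --set llm-uservice.image.pullPolicy=IfNotPresent \
  --values docsum/gaudi-vllm-values.yaml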

@eero-t eero-t marked this pull request as draft December 18, 2024 18:58
@eero-t
Contributor Author

eero-t commented Dec 18, 2024

Setting as draft because the required image is still missing from DockerHub, and this needs retesting once the currently pending DocSum changes in the Comps & Examples repos have landed.

@eero-t
Contributor Author

eero-t commented Dec 20, 2024

While the "docsum, gaudi, ci-gaudi-vllm-values" CI test fails as expected, due to the OPEA llm-docsum-vllm image still being missing...

There also seems to be a bug in a component unrelated to this PR, as the "llm-uservice, xeon, ci-faqgen-values, common" CI test fails due to a package missing from the image:

[pod/llm-uservice20241218190439-5b9b7b79fd-r65l9/llm-uservice20241218190439]
...
   File "/home/user/comps/llms/faq-generation/tgi/langchain/llm.py", line 77, in stream_generator
     from langserve.serialization import WellKnownLCSerializer
   File "/home/user/.local/lib/python3.11/site-packages/langserve/__init__.py", line 8, in <module>
     from langserve.client import RemoteRunnable
   File "/home/user/.local/lib/python3.11/site-packages/langserve/client.py", line 24, in <module>
     from httpx._types import AuthTypes, CertTypes, CookieTypes, HeaderTypes, VerifyTypes
 ImportError: cannot import name 'VerifyTypes' from 'httpx._types' (/home/user/.local/lib/python3.11/site-packages/httpx/_types.py)

=> Is requirements.txt for the llm-faqgen-tgi:latest image generation out of date in the Comps repo?
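
If the root cause is indeed the httpx 0.28 release dropping VerifyTypes (an assumption based on the traceback above, not something verified in Comps), a quick local check could be:

# Sketch: confirm the import works once httpx is constrained below 0.28 (the bound is an assumption)
pip install langserve "httpx<0.28"
python -c "from httpx._types import VerifyTypes; print('import ok')"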

@lianhao?

@eero-t
Contributor Author

eero-t commented Dec 30, 2024

Rebased on main + dropped the "draft" status, as the required OPEA image is now available on DockerHub!

@eero-t
Contributor Author

eero-t commented Dec 30, 2024

CI still fails.

PR #659 fixes the DocSum issues with updates in other repos, includes the same model ID workaround as this one, and passed CI => better to merge that first & rebase this?


"docsum, gaudi, ci-gaudi-vllm-values" fails because CI registry is out of date. Although required image has been at DockerHub for 4 days [1], fetching it still fails:
Normal BackOff 4m47s (x18 over 9m42s) kubelet Back-off pulling image "100.83.111.229:5000/opea/llm-docsum-vllm:latest"
[1] https://hub.docker.com/r/opea/llm-docsum-vllm/tags

"docsum, xeon, ci-values" fails to connection failure:

[pod/docsum20241230125518-5599c984c6-dt5fc/docsum20241230125518] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

"docsum, gaudi, ci-gaudi-tgi-values" fails to similar CI issue:

[pod/docsum20241230130614-d598c6674-qqhjf/docsum20241230130614] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

@eero-t
Contributor Author

eero-t commented Jan 2, 2025

Rebased on main to get CI tests passing, and dropped the already-merged fix. However, CI is still broken.

@daisy-ycguo The CI "docsum, gaudi, ci-gaudi-vllm-values" test still fails because the CI registry is not up to date with DockerHub: https://hub.docker.com/r/opea/llm-docsum-vllm/tags
Failed to pull image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": rpc error: code = NotFound desc = failed to pull and unpack image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": failed to resolve reference "100.83.111.229:5000/opea/llm-docsum-vllm:latest": 100.83.111.229:5000/opea/llm-docsum-vllm:latest: not found
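
A quick way to confirm what the CI-local registry actually serves is the standard Docker registry v2 tag-listing endpoint (assuming the registry allows anonymous plain-HTTP access, as the port 5000 address suggests):

# List the tags the CI registry has for this image, and compare with DockerHub
curl -s http://100.83.111.229:5000/v2/opea/llm-docsum-vllm/tags/list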

@lianhao, does the CI "docsum, gaudi, ci-gaudi-tgi-values" test now fail due to a test bug?
[pod/docsum20250102125737-llm-uservice-7d6f8d968f-v9b4w/docsum20250102125737] | huggingface_hub.errors.ValidationError: Input validation error: 'inputs' tokens + 'max_new_tokens' must be <= 4096. Given: 4095 'inputs' tokens and 17 'max_new_tokens'

Collaborator

@lianhao lianhao left a comment


@eero-t I found that the pending PR opea-project/GenAIComps#1101 will make a big change: there will be no more separate llm-docsum-* images, a single llm-docsum image will be able to talk to both TGI and vLLM, so maybe we should wait until that PR is merged first.

Review comment on helm-charts/docsum/values.yaml (outdated, resolved)
@eero-t
Contributor Author

eero-t commented Jan 7, 2025

@eero-t I found that the pending PR opea-project/GenAIComps#1101 will make a big change: there will be no more separate llm-docsum-* images, a single llm-docsum image will be able to talk to both TGI and vLLM, so maybe we should wait until that PR is merged first.

Good to know. If the CI registry keeps missing the DockerHub image, that should indeed finally fix it.

@eero-t
Contributor Author

eero-t commented Jan 14, 2025

Rebased on main, squashed the "rebase fix" commit, and added a commit updating the llm-docsum-* image names according to opea-project/GenAIComps#1101

@eero-t
Contributor Author

eero-t commented Jan 14, 2025

Lots of CI failures, some of them due to an incomplete update for the DocSum refactor, some due to issues outside of this PR...

The doc-building CI test fails due to bugs in the doc scripts:

/home/runner/work/GenAIInfra/GenAIInfra/docs/scripts/filter-known-issues.py:47: SyntaxWarning: invalid escape sequence '\s'
  b"(?P<comment>(^\s*#.*\n)+)" \
/home/runner/work/GenAIInfra/GenAIInfra/docs/scripts/filter-known-issues.py:91: SyntaxWarning: invalid escape sequence '\.'
  file_regex = re.compile(".*\.conf$")
WARNING: The config value `myst_enable_extensions' has type `list'; expected `Any'.
WARNING: The config value `myst_fence_as_directive' has type `list'; expected `Any'.
make: *** [Makefile:65: html] Error 2

"docsum, gaudi, ci-gaudi-tgi-values":

Response check failed, please check the logs in artifacts!
Error: Process completed with exit code 1.

"Xeon / go-e2e" CI test fails to image pull failure:

   Normal   BackOff    3m42s (x7 over 8m55s)  kubelet            Back-off pulling image "100.80.243.74:5000/opea/llm-tgi:latest"
  Warning  Failed     3m42s (x7 over 8m55s)  kubelet            Error: ImagePullBackOff
Pod llm-svc-deployment-677bf47dfb-qtt22 logs:
Error from server (BadRequest): container "llm-uservice" in pod "llm-svc-deployment-677bf47dfb-qtt22" is waiting to start: trying and failing to pull image

"docsum, xeon, ci-values":

Response check failed, please check the logs in artifacts!
Error: Process completed with exit code 1.

"docsum, gaudi, ci-gaudi-vllm-values" probably fails due to missing DocSum_COMPONENT_NAME env var.

"llm-uservice, xeon, ci-docsum-values, common":

   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 302, in _request_wrapper
     hf_raise_for_status(response)
   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 454, in hf_raise_for_status
     raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6786554f-67e5da2940e1acc410ccee05;0abe054b-411e-4108-bb1c-f1d3dba03521)
 
 Repository Not Found for url: https://huggingface.co/None/resolve/main/tokenizer_config.json.
...
Traceback (most recent call last):
   File "/home/user/comps/llms/src/doc-summarization/opea_docsum_microservice.py", line 26, in <module>
     loader = OpeaComponentLoader(llm_component_name, description=f"OPEA LLM DocSum Component: {llm_component_name}")
...
   File "/home/user/.local/lib/python3.11/site-packages/transformers/utils/hub.py", line 426, in cached_file
     raise EnvironmentError(
 OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

etc.
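
For the RepositoryNotFoundError above, the tokenizer is being resolved for model "None", which suggests the model ID env var never reaches the container; a quick check could be (the deployment name is a placeholder):

# Dump the LLM-related environment of the failing llm-uservice pod
kubectl exec deploy/<llm-uservice-deployment> -- env | grep -iE 'model|llm|docsum'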

@eero-t
Contributor Author

eero-t commented Jan 14, 2025

Looking at the related refactor tickets:

Potentially needed:

export LLM_ENDPOINT_PORT=8008
export DOCSUM_PORT=9000
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export DocSum_COMPONENT_NAME="OPEADocSum_TGI"  # or _vLLM

And possibly also increase max tokens from 1024/2048 to 2048/4096.
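
For quick experimentation on an already deployed chart, before wiring these into the values files, something like this could be used (the deployment name and vLLM service URL are placeholders, not actual chart names):

# Temporarily switch the deployed LLM microservice to the vLLM DocSum component
kubectl set env deploy/<docsum-llm-uservice> \
  DocSum_COMPONENT_NAME="OPEADocSum_vLLM" \
  LLM_ENDPOINT="http://<vllm-service>:80"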

@lianhao
Collaborator

lianhao commented Jan 14, 2025

@eero-t you may need to wait until PR #696 gets landed first

@yongfengdu
Collaborator

The llm-uservice CI issue should be fixed by #696.
Everything should be fine after you rebase onto the latest code.

# To use Gaudi device
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-values.yaml
# To use Gaudi device with TGI
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-tgi-values.yaml ...
Collaborator


Why have the commands in comments? Remove the # so that these are actual commands.

Contributor Author

@eero-t eero-t Jan 15, 2025


This is consistent with how all the application READMEs indicate the alternative Helm invocations. I guess it's to avoid users accidentally copy-pasting them.

Another reason it's commented out here is that it's not a complete command (notice the ... at the end).

# To use Gaudi device with TGI
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-tgi-values.yaml ...
# To use Gaudi device with vLLM
# helm install docsum docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values docsum/gaudi-vllm-values.yaml ..
Collaborator


ditto

Contributor Author


For consistency, your change should be done for all READMEs in this repo, not just this particular one => IMHO it's out of scope for this PR.

Collaborator


Agree that READMEs need to be updated together.

Signed-off-by: Eero Tamminen <[email protected]>
@eero-t
Contributor Author

eero-t commented Jan 15, 2025

Rebased on main, fixed the llm-uservice image name (missed in #696), added the new DOCSUM_BACKEND values for llm-uservice, and dropped the redundant MAX_* items from the vllm subchart.

@eero-t
Contributor Author

eero-t commented Jan 15, 2025

"docsum, gaudi, ci-gaudi-vllm-values" CI test is failing to unspecified error:

 [pod/docsum20250115170104-vllm-74fd965586-mnfvc/vllm] INFO 01-15 17:11:26 hpu_model_runner.py:699] Pre-loading model weights on hpu:0 took 13.52 GiB of device memory (13.52 GiB/94.62 GiB used) and 864.7 MiB of host memory (81.43 GiB/1007 GiB used)
-----------------------------------
+ exit 1
Error: Process completed with exit code 1.

I suspect that the test timeout (5 min?) is not long enough, because Gaudi vLLM startup / warmup takes much longer (7 min) than Gaudi TGI.

By mapping the vLLM cache to persistent storage shared with other vLLM instances on the same node, its startup time could be reduced significantly. However, that does not help with a first / cold start; for those, the test timeout just needs to be increased. @lianhao?

EDIT: or maybe the failure is due to a vLLM crash (opea-project/GenAIComps#1038)?
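
If the timeout theory holds, two knobs that might be worth trying; the value names below are guesses at the subchart keys, not verified ones:

# Give the vLLM pod more time to warm up before its startup probe gives up
helm upgrade docsum docsum --reuse-values --set vllm.startupProbe.failureThreshold=120
# Or share a model cache between vLLM instances on the node via an existing PVC
helm upgrade docsum docsum --reuse-values --set global.modelUsePVC=model-volume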

@lianhao
Collaborator

lianhao commented Jan 16, 2025

"docsum, gaudi, ci-gaudi-vllm-values" CI test is failing to unspecified error:

 [pod/docsum20250115170104-vllm-74fd965586-mnfvc/vllm] INFO 01-15 17:11:26 hpu_model_runner.py:699] Pre-loading model weights on hpu:0 took 13.52 GiB of device memory (13.52 GiB/94.62 GiB used) and 864.7 MiB of host memory (81.43 GiB/1007 GiB used)
-----------------------------------
+ exit 1
Error: Process completed with exit code 1.

I suspect that test timeout (5min?) is not long enough, because Gaudi vLLM startup / warmup takes much longer (7min) than Gaudi TGI.

I think mapping vLLM cache to persistent storage, shared with other vLLM instances on same node, its startup time could be significantly reduced. However, that does not help with first/cold start, for those the test timeout just needs to increased. @lianhao ?

EDIT: or maybe failure is due to vLLM crash (opea-project/GenAIComps#1038)?

I also suspect this failure is due to a probe timeout, because the vllm-gaudi CI in the llm-uservice chart, run against this configuration file, is fine.

@yongfengdu
Collaborator

It's not a timeout issue; the vllm pod restarted several times because of a runtime error.
It looks like the vllm-gaudi:latest image is having trouble running intel-neural-chat; not sure whether it's an environment issue (the CI server's Gaudi SW version is 1.19) or a configuration issue.

@yongfengdu
Collaborator

The latest opea/vllm-gaudi:latest image has a problem, and the issue has been reported to the CI team.
We tried an image built before yesterday and it passed.
Let's merge this first without waiting for the opea image fix. Other PRs will need to rebase and continue.

Collaborator

@lianhao lianhao left a comment


Confirmed that the vllm-gaudi issue is related to the latest opea/vllm-gaudi:latest image itself.

@yongfengdu yongfengdu merged commit 0943764 into opea-project:main Jan 16, 2025
11 of 12 checks passed