
Add kubernetes deployment for GenAIComps #1104

Open · wants to merge 2 commits into base: main

Conversation

@yongfengdu (Collaborator)

Description

A summary of the proposed changes, as well as the relevant motivation and context.

Issues

List the issue or RFC link this PR is working on. If there is no such link, please mark it as n/a.

Type of change

Select the type of change from the list below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

List any newly introduced third-party dependencies, if they exist.

Tests

Describe the tests that you ran to verify your changes.

comps/3rd_parties/tgi/deployment/kubernetes/README.md (review thread, resolved)
comps/llms/deployment/kubernetes/README.md (review thread, resolved)
@mkbhanda (Collaborator) left a comment

Some questions ..

Any reason why no failureThreshold on the readinessProbe for CPU and gaudi values?

Why do we not mention any image for CPU .. is it in some common file. Likewise all the min/max token lengths.

What about model id with Gaudi?

Thank you @yongfengdu for this PR.

comps/llms/deployment/kubernetes/README.md (review thread, resolved)
@yongfengdu (Collaborator, Author)

Some questions ..

Any reason why no failureThreshold on the readinessProbe for CPU and gaudi values?
These values were tuned by @eero-t and I just borrowed them.
The default failureThreshold is 3.
On a livenessProbe failure, the pod is killed and restarted by the kubelet; I think that's why that failureThreshold was set to a larger value.

Why do we not mention any image for CPU .. is it in some common file. Likewise all the min/max token lengths.
There are default values in values.yaml, which is included in the helm chart. But it might be a good idea to mention them in cpu-values.yaml and gaudi-values.yaml, even if they are the same as the defaults in values.yaml.
I'll refine the *-values.yaml files.
What about model id with Gaudi?

Thank you @yongfengdu for this PR.

@yongfengdu
Copy link
Collaborator Author

Here is the link for default values.
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/tgi/values.yaml
It's better not to set duplicate variables, to reduce maintenance effort.
For example, the TGI image's tag:
tag: "2.4.0-intel-cpu"
It's a value that needs to change frequently, so we don't want it in multiple places.
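To illustrate the single-source-of-truth argument, here is a minimal sketch of how the shared chart defaults and a per-variant overlay could relate. The field layout and repository name are assumptions following common Helm chart conventions; only the tag value comes from this discussion.

```yaml
# helm-charts/common/tgi/values.yaml (shared defaults; sketch, field names assumed)
image:
  repository: ghcr.io/huggingface/text-generation-inference  # assumed repository
  tag: "2.4.0-intel-cpu"  # changes frequently, so it lives only here

# cpu-values.yaml (overlay; sketch) overrides only what differs and
# inherits image.tag from the shared values.yaml above, so a tag bump
# touches exactly one file.
```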


@eero-t (Contributor)

eero-t commented Jan 7, 2025

Any reason why no failureThreshold on the readinessProbe for CPU and gaudi values?

These values were tuned by @eero-t and I just borrowed them.

I was concentrating more on when things actually start working, rather than when they start failing, so it's a rather rough value.

The default failureThreshold is 3.

The defaults could also be mentioned explicitly for readinessProbes.

For Gaudi, I think the readiness probe failureThreshold could be smaller, but I haven't tested it.

For CPU it's definitely better to keep it >1, because performance on CPU is so unpredictable (the underlying HW can differ, and the pods do not have fine-tuned resource requests/limits).

For livenessProbe failure, the pod will be killed/restarted by kubelet, I think that's why the failureThreshold was set to a bigger value.

I've never seen OPEA pods deadlock, especially in a way that would be solved by restarting the pod. Liveness probe restarts just make things worse, as the service may then never reach the ready/live state. IMHO liveness probes are just harmful and could be removed.
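As a concrete illustration of the probe discussion, a hypothetical values fragment is sketched below. The endpoint path, port, and timing values are illustrative, not taken from the chart.

```yaml
# Hypothetical readinessProbe for a TGI pod (all values illustrative).
readinessProbe:
  httpGet:
    path: /health
    port: 2080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3   # the Kubernetes default, stated explicitly here
# No livenessProbe: per the comment above, restarting a slow or failing
# model server rarely helps and can keep it from ever becoming ready.
```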

@poussa

poussa commented Jan 7, 2025

Why do we have kubernetes/deployment in the pathname? It should be either kubernetes or deployment, not both. Or what is the logic here?

@yongfengdu (Collaborator, Author)

Why do we have kubernetes/deployment in the pathname? It should be either kubernetes or deployment, not both. Or what is the logic here?

It's "deployment/kubernetes".
There is also a docker_compose directory under deployment (deployment/docker_compose), providing instructions for Docker environments.
See
https://github.com/opea-project/GenAIComps/tree/main/comps/3rd_parties/tgi/deployment

@yongfengdu yongfengdu force-pushed the newhelm branch 2 times, most recently from e19ecf2 to 353bc46 Compare January 10, 2025 09:23
@yongfengdu yongfengdu changed the title Add kubernetes deployment for tgi and llm Add kubernetes deployment for GenAIComps Jan 10, 2025
@lianhao (Collaborator) left a comment

For dataprep and retriever, the current values files will be obsolete in a couple of days (a new PR in the infra repo will be submitted next week). Should we remove them from this PR?

5 participants