feat: add demoui for openai api #777

Merged 1 commit on Dec 12, 2024
4 changes: 2 additions & 2 deletions charts/DemoUI/inference/README.md
@@ -5,12 +5,12 @@ Before deploying the Demo front-end, you must set the `workspaceServiceURL` environment variable
 To set this value, modify the `values.override.yaml` file or use the `--set` flag during Helm install/upgrade:

 ```bash
-helm install inference-frontend ./charts/DemoUI/inference/values.yaml --set env.workspaceServiceURL="http://<CLUSTER_IP>:80/chat"
+helm install inference-frontend ./charts/DemoUI/inference --set env.workspaceServiceURL="http://<CLUSTER_IP>:80"
 ```

 Or through a custom `values` file (`values.override.yaml`):
 ```bash
-helm install inference-frontend ./charts/DemoUI/inference/values.yaml -f values.override.yaml
+helm install inference-frontend ./charts/DemoUI/inference -f values.override.yaml
 ```

 ## Values
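Putting the two README changes together, a minimal sketch of the override-file flow might look like the following; `<SERVICE_NAME>` and `<NAMESPACE>` are placeholders, and the `runtime` key comes from the values.yaml change later in this PR:

```bash
# Sketch: drive the corrected chart path and URL shape via an override file.
# <SERVICE_NAME> and <NAMESPACE> are placeholders for your Workspace Service.
cat <<'EOF' > values.override.yaml
env:
  workspaceServiceURL: "http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80"
  runtime: "vllm"  # or "transformers"; see values.yaml below
EOF
helm install inference-frontend ./charts/DemoUI/inference -f values.override.yaml
```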
21 changes: 17 additions & 4 deletions charts/DemoUI/inference/templates/deployment.yaml
@@ -37,13 +37,26 @@ spec:
           args:
             - -c
             - |
-              mkdir -p /app/frontend && \
-              pip install chainlit requests && \
-              wget -O /app/frontend/inference.py https://raw.githubusercontent.com/kaito-project/kaito/main/demo/inferenceUI/chainlit.py && \
-              chainlit run frontend/inference.py -w
+              mkdir -p /app/frontend
+              pip install chainlit pydantic==2.10.1 requests openai --quiet
+              case "$RUNTIME" in
+                vllm)
+                  wget -O /app/frontend/inference.py https://raw.githubusercontent.com/kaito-project/kaito/refs/heads/main/demo/inferenceUI/chainlit_openai.py
+                  ;;
+                transformers)
+                  wget -O /app/frontend/inference.py https://raw.githubusercontent.com/kaito-project/kaito/refs/heads/main/demo/inferenceUI/chainlit_transformers.py
+                  ;;
+                *)
+                  echo "Error: Unsupported RUNTIME value" >&2
+                  exit 1
+                  ;;
+              esac
+              chainlit run --host 0.0.0.0 /app/frontend/inference.py -w
           env:
             - name: WORKSPACE_SERVICE_URL
               value: "{{ .Values.env.workspaceServiceURL }}"
+            - name: RUNTIME
+              value: "{{ .Values.env.runtime }}"
           workingDir: /app
           ports:
             - name: http
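To confirm which branch of the `case` statement ran, one hedged check is to read the pod's startup logs. A sketch, assuming the `inference-frontend` release name from the README examples (the actual Deployment name may differ depending on the chart's fullname template):

```bash
# Sketch: the startup output should show the wget fetch and the chainlit banner.
kubectl logs deployment/inference-frontend --tail=50
```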
10 changes: 6 additions & 4 deletions charts/DemoUI/inference/values.yaml
@@ -4,7 +4,7 @@ replicaCount: 1
 image:
   repository: python
   pullPolicy: IfNotPresent
-  tag: "3.8"
+  tag: "3.12"
 imagePullSecrets: []
 podAnnotations: {}
 serviceAccount:
@@ -18,9 +18,9 @@ service:
 # Specify the URL for the Workspace Service inference endpoint. Use the DNS name within the cluster for reliability.
 #
 # Examples:
-#   Cluster IP: "http://<CLUSTER_IP>:80/chat"
-#   DNS name: "http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80/chat"
-#   e.g., "http://workspace-falcon-7b.default.svc.cluster.local:80/chat"
+#   Cluster IP: "http://<CLUSTER_IP>:80"
+#   DNS name: "http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80"
+#   e.g., "http://workspace-falcon-7b.default.svc.cluster.local:80"
 #
 # workspaceServiceURL: "<YOUR_SERVICE_URL>"
 resources:
@@ -44,6 +44,8 @@ readinessProbe:
   periodSeconds: 10
   successThreshold: 1
   timeoutSeconds: 1
+env:
+  runtime: "vllm" # "vllm" or "transformers"
 nodeSelector: {}
 tolerations: []
 affinity: {}
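The new `env.runtime` key can also be set at install or upgrade time instead of via an override file. A sketch, reusing the release name and placeholder URL from the README:

```bash
# Sketch: flip the new env.runtime knob to exercise the transformers branch.
helm upgrade --install inference-frontend ./charts/DemoUI/inference \
  --set env.workspaceServiceURL="http://<CLUSTER_IP>:80" \
  --set env.runtime="transformers"
```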
4 changes: 2 additions & 2 deletions demo/inferenceUI/README.md
@@ -20,12 +20,12 @@ Workspace Service endpoint.
 - Using the --set flag:

   ```
-  helm install inference-frontend ./charts/DemoUI/inference --set env.workspaceServiceURL="http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80/chat"
+  helm install inference-frontend ./charts/DemoUI/inference --set env.workspaceServiceURL="http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80"
   ```
 - Using a custom `values.override.yaml` file:
   ```
   env:
-    workspaceServiceURL: "http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80/chat"
+    workspaceServiceURL: "http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80"
   ```
 Then deploy with custom values file:
 ```
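Since the URL no longer embeds the `/chat` path, a quick hedged sanity check for the vllm runtime is to list models on the OpenAI-compatible endpoint that `chainlit_openai.py` (below) builds from this base URL:

```bash
# Sketch: with the vllm runtime, the base URL should serve the OpenAI-compatible API.
curl http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local:80/v1/models
```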
54 changes: 54 additions & 0 deletions demo/inferenceUI/chainlit_openai.py
@@ -0,0 +1,54 @@
import os
from urllib.parse import urljoin

from openai import AsyncOpenAI
import chainlit as cl

URL = os.environ.get('WORKSPACE_SERVICE_URL')

client = AsyncOpenAI(base_url=urljoin(URL, "v1"), api_key="YOUR_OPENAI_API_KEY")
cl.instrument_openai()

settings = {
    "temperature": 0.7,
    "max_tokens": 500,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
}

@cl.on_chat_start
async def start_chat():
    # Discover the served model from the OpenAI-compatible /v1/models endpoint.
    models = await client.models.list()
    print(f"Available models: {models}")
    if len(models.data) == 0:
        raise ValueError("No models found")

    global model
    model = models.data[0].id
    print(f"Using model: {model}")

@cl.on_message
async def main(message: cl.Message):
    messages = [
        {
            "content": "You are a helpful assistant.",
            "role": "system"
        },
        {
            "content": message.content,
            "role": "user"
        }
    ]
    msg = cl.Message(content="")

    stream = await client.chat.completions.create(
        messages=messages, model=model,
        stream=True,
        **settings
    )

    # Stream tokens into the UI as they arrive.
    async for part in stream:
        if token := part.choices[0].delta.content or "":
            await msg.stream_token(token)
    await msg.update()
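To try the new script outside the cluster, a sketch mirroring the container's startup commands; the port-forward target and local URL are assumptions:

```bash
# Sketch: run the OpenAI-flavored UI locally against a port-forwarded Workspace Service.
pip install chainlit pydantic==2.10.1 requests openai --quiet
kubectl port-forward svc/<SERVICE_NAME> 8080:80 &    # placeholder service name
WORKSPACE_SERVICE_URL="http://localhost:8080" chainlit run chainlit_openai.py -w
```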
demo/inferenceUI/chainlit_transformers.py
@@ -1,4 +1,5 @@
 import os
+from urllib.parse import urljoin

 import chainlit as cl
 import requests
@@ -25,7 +26,7 @@ def inference(prompt):
         }
     }

-    response = requests.post(URL, json=data)
+    response = requests.post(urljoin(URL, "chat"), json=data)

     if response.status_code == 200:
         response_data = response.json()