forked from alibaba/higress
Commit
add model-mapper plugin & optimize model-router plugin
Showing 11 changed files with 1,061 additions and 67 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# Copyright (c) 2022 Alibaba Group Holding Ltd. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
load("@proxy_wasm_cpp_sdk//bazel:defs.bzl", "proxy_wasm_cc_binary") | ||
load("//bazel:wasm.bzl", "declare_wasm_image_targets") | ||
|
||
proxy_wasm_cc_binary( | ||
name = "model_mapper.wasm", | ||
srcs = [ | ||
"plugin.cc", | ||
"plugin.h", | ||
], | ||
deps = [ | ||
"@proxy_wasm_cpp_sdk//:proxy_wasm_intrinsics_higress", | ||
"@com_google_absl//absl/strings", | ||
"@com_google_absl//absl/time", | ||
"//common:json_util", | ||
"//common:http_util", | ||
"//common:rule_util", | ||
], | ||
) | ||
|
||
cc_library( | ||
name = "model_mapper_lib", | ||
srcs = [ | ||
"plugin.cc", | ||
], | ||
hdrs = [ | ||
"plugin.h", | ||
], | ||
copts = ["-DNULL_PLUGIN"], | ||
deps = [ | ||
"@com_google_absl//absl/strings", | ||
"@com_google_absl//absl/time", | ||
"//common:json_util", | ||
"@proxy_wasm_cpp_host//:lib", | ||
"//common:http_util_nullvm", | ||
"//common:rule_util_nullvm", | ||
], | ||
) | ||
|
||
cc_test( | ||
name = "model_mapper_test", | ||
srcs = [ | ||
"plugin_test.cc", | ||
], | ||
copts = ["-DNULL_PLUGIN"], | ||
deps = [ | ||
":model_mapper_lib", | ||
"@com_google_googletest//:gtest", | ||
"@com_google_googletest//:gtest_main", | ||
"@proxy_wasm_cpp_host//:lib", | ||
], | ||
) | ||
|
||
declare_wasm_image_targets( | ||
name = "model_mapper", | ||
wasm_file = ":model_mapper.wasm", | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
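## Function Description | ||
The `model-mapper` plugin maps the model parameter in LLM-protocol requests to the model names supported by the service provider. | ||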
|
||
## Configuration Fields | ||
|
||
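| Name | Data Type | Requirement | Default Value | Description | ||
| ----------- | --------------- | ----------------------- | ------ | ------------------------------------------- | ||
| `modelKey` | string | Optional | model | The location of the model parameter in the request body. | ||
| `modelMapping` | map of string | Optional | - | AI model mapping table, used to map the model name in the request to a model name supported by the service provider.<br/>1. Supports prefix matching. For example, use "gpt-3-*" to match all models whose names start with "gpt-3-";<br/>2. Supports using "*" as the key to configure a catch-all fallback mapping;<br/>3. If the target name in a mapping is the empty string "", the original model name is kept. | ||
| `enableOnPathSuffix` | array of string | Optional | ["/v1/chat/completions"] | The plugin only applies to requests whose path ends with one of these suffixes. | ||
|
||
## Runtime Properties | ||
|
||
Plugin execution phase: authentication phase | ||
Plugin execution priority: 800 | ||
|
||
## Effect Description | ||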
|
||
With the following configuration: | ||
|
||
```yaml | ||
modelMapping: | ||
'gpt-4-*': "qwen-max" | ||
'gpt-4o': "qwen-vl-plus" | ||
'*': "qwen-turbo" | ||
``` | ||
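With this configuration enabled, model names starting with `gpt-4-` are rewritten to `qwen-max`, `gpt-4o` is rewritten to `qwen-vl-plus`, and all other model names are rewritten to `qwen-turbo`. | ||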
|
||
For example, if the original request is: | ||
|
||
```json | ||
{ | ||
"model": "gpt-4o", | ||
"frequency_penalty": 0, | ||
"max_tokens": 800, | ||
"stream": false, | ||
"messages": [{ | ||
"role": "user", | ||
"content": "What is the GitHub address of the main repository for the higress project?" | ||
}], | ||
"presence_penalty": 0, | ||
"temperature": 0.7, | ||
"top_p": 0.95 | ||
} | ||
``` | ||
|
||
|
||
After the plugin processes it, the original LLM request body becomes: | ||
|
||
```json | ||
{ | ||
"model": "qwen-vl-plus", | ||
"frequency_penalty": 0, | ||
"max_tokens": 800, | ||
"stream": false, | ||
"messages": [{ | ||
"role": "user", | ||
"content": "What is the GitHub address of the main repository for the higress project?" | ||
}], | ||
"presence_penalty": 0, | ||
"temperature": 0.7, | ||
"top_p": 0.95 | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
## Function Description | ||
The `model-mapper` plugin maps the model parameter in LLM-protocol requests to the model names supported by the service provider. | ||
|
||
## Configuration Fields | ||
|
||
| Name | Data Type | Requirement | Default Value | Description | ||
| ----------- | --------------- | ----------------------- | ------ | ------------------------------------------- | ||
| `modelKey` | string | Optional | model | The location of the model parameter in the request body. | ||
| `modelMapping` | map of string | Optional | - | AI model mapping table, used to map the model name in the request to a model name supported by the service provider.<br/>1. Supports prefix matching. For example, use "gpt-3-*" to match all models whose names start with "gpt-3-";<br/>2. Supports using "*" as the key to configure a catch-all fallback mapping;<br/>3. If the target name in a mapping is the empty string "", the original model name is kept. | ||
| `enableOnPathSuffix` | array of string | Optional | ["/v1/chat/completions"] | The plugin only applies to requests whose path ends with one of these suffixes. | ||
|
||
## Runtime Properties | ||
|
||
Plugin execution phase: Authentication phase | ||
Plugin execution priority: 800 | ||
|
||
## Effect Description | ||
|
||
With the following configuration: | ||
|
||
```yaml | ||
modelMapping: | ||
'gpt-4-*': "qwen-max" | ||
'gpt-4o': "qwen-vl-plus" | ||
'*': "qwen-turbo" | ||
``` | ||
With this configuration enabled, model names starting with `gpt-4-` are rewritten to `qwen-max`, `gpt-4o` is rewritten to `qwen-vl-plus`, and all other model names are rewritten to `qwen-turbo`. | ||
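
The lookup rules described for `modelMapping` can be sketched in a few lines of Python. This is only an illustration, not the plugin's C++ implementation: the function name `resolve_model` is hypothetical, and the precedence used here (exact key first, then the longest matching prefix, then the `*` fallback) is an assumption, since the README does not state how overlapping rules are ordered.

```python
def resolve_model(model: str, mapping: dict[str, str]) -> str:
    """Map a requested model name to a provider model name (illustrative only)."""
    if model in mapping:
        # Assumed precedence: an exact key wins over any pattern.
        target = mapping[model]
    else:
        # Collect the prefixes of all "prefix*" keys that match the model name.
        prefixes = [
            key[:-1]
            for key in mapping
            if key != "*" and key.endswith("*") and model.startswith(key[:-1])
        ]
        if prefixes:
            # Assumed tie-break: the longest (most specific) prefix wins.
            target = mapping[max(prefixes, key=len) + "*"]
        else:
            # Fall back to the "*" rule when one is configured.
            target = mapping.get("*", "")
    # An empty target (or no applicable rule) keeps the original name.
    return target or model


mapping = {"gpt-4-*": "qwen-max", "gpt-4o": "qwen-vl-plus", "*": "qwen-turbo"}
for name in ("gpt-4-turbo", "gpt-4o", "claude-3"):
    print(name, "->", resolve_model(name, mapping))
```

Note that `gpt-4o` does not start with `gpt-4-`, so in this configuration it is only reachable through its exact key or the `*` fallback.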
|
||
For example, if the original request was: | ||
|
||
```json | ||
{ | ||
"model": "gpt-4o", | ||
"frequency_penalty": 0, | ||
"max_tokens": 800, | ||
"stream": false, | ||
"messages": [{ | ||
"role": "user", | ||
"content": "What is the GitHub address of the main repository for the higress project?" | ||
}], | ||
"presence_penalty": 0, | ||
"temperature": 0.7, | ||
"top_p": 0.95 | ||
} | ||
``` | ||
|
||
|
||
After the plugin processes it, the original LLM request body becomes: | ||
|
||
```json | ||
{ | ||
"model": "qwen-vl-plus", | ||
"frequency_penalty": 0, | ||
"max_tokens": 800, | ||
"stream": false, | ||
"messages": [{ | ||
"role": "user", | ||
"content": "What is the GitHub address of the main repository for the higress project?" | ||
}], | ||
"presence_penalty": 0, | ||
"temperature": 0.7, | ||
"top_p": 0.95 | ||
} | ||
``` |
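
The request rewrite shown above can be approximated as follows. This is a hedged sketch in Python rather than the plugin's actual code: the function `rewrite_body` and its `resolve` callback are hypothetical names, `modelKey` is assumed to name a top-level key of the JSON body, and the query string is assumed to be ignored when checking `enableOnPathSuffix`.

```python
import json
from typing import Callable, Sequence


def rewrite_body(
    path: str,
    body: str,
    resolve: Callable[[str], str],
    model_key: str = "model",
    enable_on_path_suffix: Sequence[str] = ("/v1/chat/completions",),
) -> str:
    """Return the body with its model field rewritten, or unchanged if not applicable."""
    # Assumption: the suffix check looks at the path without its query string.
    request_path = path.split("?", 1)[0]
    if not any(request_path.endswith(suffix) for suffix in enable_on_path_suffix):
        return body
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return body  # Not JSON: pass the body through untouched.
    # Assumption: model_key is a top-level key holding a string.
    if not isinstance(payload, dict) or not isinstance(payload.get(model_key), str):
        return body
    payload[model_key] = resolve(payload[model_key])
    # ensure_ascii=False keeps non-ASCII message content readable.
    return json.dumps(payload, ensure_ascii=False)


rules = {"gpt-4o": "qwen-vl-plus"}
original = '{"model": "gpt-4o", "stream": false}'
print(rewrite_body("/v1/chat/completions", original, lambda m: rules.get(m, m)))
```

Passing the lookup in as a callback keeps the path and body handling separate from whatever matching rules are configured.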