Update README.md
sr-remsha authored Nov 13, 2024
1 parent 7dcf05e commit b64b1ab
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -220,12 +220,12 @@ Dynamic settings can include the following parameters:
| applications.<application_name> | `endpoint`: AI DIAL Application API for chat completions.<br />`iconUrl`: Icon path for the AI DIAL Application in the UI.<br />`description`: Brief AI DIAL Application description.<br />`displayName`: AI DIAL Application name in the UI.<br />`inputAttachmentTypes`: A list of allowed MIME types for the input attachments.<br />`maxInputAttachments`: Maximum number of input attachments (default is zero when `inputAttachmentTypes` is unset; otherwise, unlimited).<br />`forwardAuthToken`: If set to `true`, the HTTP header with the authorization token is forwarded to the application's chat completion endpoint.<br />`userRoles`: a specific claim value provided by a specific IDP. Refer to [IDP Configuration](https://github.com/epam/ai-dial/blob/main/docs/Auth/2.%20Web/1.overview.md) to view examples.<br />`descriptionKeywords`: a list of keywords that describe the application, e.g. `code-gen`, `text2image`. |
| applications.<application_name>.defaults | Default parameters that are applied if an OpenAI `chat/completions` API call doesn't contain them. |
| applications.<application_name>.interceptors | A list of interceptors to be triggered for the given application. Refer to [Interceptors](https://docs.epam-rail.com/tutorials/interceptors) to learn more. |
-|applications.<application_name>.features| `rateEndpoint`: endpoint for rate requests *(exposed by DIAL Core as `<deployment name>/rate`)*.<br />`tokenizeEndpoint`: endpoint for requests to the model tokenizer *(exposed by DIAL Core as `<deployment name>/tokenize`)*.<br />`truncatePromptEndpoint`: endpoint for truncating prompt requests *(exposed by DIAL Core as `<deployment name>/truncate_prompt`)*.<br />`systemPromptSupported`: does the application support system prompt (default is `true`).<br />`toolsSupported`: does the application support tools (default is `false`).<br />`seedSupported`: does the application support `seed` request parameter (default is `false`).<br />`urlAttachmentsSupported`: does the application support attachments with URLs (default is `false`).<br />`folderAttachmentsSupported`: does the application support folder attachments (default is `false`)<br />`configurationEndpoint`: the endpoint to request application configuration parameters as JSON schema *(exposed by DIAL Core as `<deployment name>/configuration`)*. |
+|applications.<application_name>.features| `rateEndpoint`: endpoint for rate requests *(exposed by DIAL Core as `<deployment name>/rate`)*.<br />`tokenizeEndpoint`: endpoint for requests to the model tokenizer *(exposed by DIAL Core as `<deployment name>/tokenize`)*.<br />`truncatePromptEndpoint`: endpoint for truncating prompt requests *(exposed by DIAL Core as `<deployment name>/truncate_prompt`)*.<br />`systemPromptSupported`: does the application support system prompt (default is `true`).<br />`toolsSupported`: does the application support tools (default is `false`).<br />`seedSupported`: does the application support `seed` request parameter (default is `false`).<br />`urlAttachmentsSupported`: does the application support attachments with URLs (default is `false`).<br />`folderAttachmentsSupported`: does the application support folder attachments (default is `false`).<br />`configurationEndpoint`: the endpoint to request application configuration parameters as JSON schema *(exposed by DIAL Core as `<deployment name>/configuration`)*.<br />`accessibleByPerRequestKey`: indicates whether the deployment is accessible using a per-request API key (default is `true`).<br />`contentPartsSupported`: indicates whether the deployment supports requests with content parts (default is `false`). |
| models | A list of deployed models and their parameters:<br />`<model_name>`: Unique model name. |
| models.<model_name> | `type`: Model type, either `chat` or `embedding`.<br />`iconUrl`: Icon path for the model in the UI.<br />`description`: Brief model description.<br />`displayName`: Model name in the UI.<br />`displayVersion`: Model version in the UI.<br />`endpoint`: Model API for chat completions or embeddings.<br />`tokenizerModel`: Identifies the specific model whose tokenization algorithm exactly matches that of the referenced model. This is typically the name of the earliest-released model in a series of models sharing an identical tokenization algorithm (e.g. `gpt-3.5-turbo-0301`, `gpt-4-0314`, or `gpt-4-1106-vision-preview`). This parameter is essential for DIAL clients that reimplement tokenization algorithms on their side instead of using the `tokenizeEndpoint` provided by the model.<br />`features`: Model features.<br />`limits`: Model token limits.<br />`pricing`: Model pricing.<br />`upstreams`: Used for [load balancing](https://docs.epam-rail.com/tutorials/load-balancer): the request sent to the model endpoint contains the `X-UPSTREAM-ENDPOINT` and `X-UPSTREAM-KEY` headers.<br />`userRoles`: a specific claim value provided by a specific IDP. Refer to [IDP Configuration](https://github.com/epam/ai-dial/blob/main/docs/Auth/2.%20Web/1.overview.md) to view examples.<br />`descriptionKeywords`: a list of keywords that describe the model, e.g. `code-gen`, `text2image`. |
| models.<model_name>.limits | `maxPromptTokens`: maximum number of tokens in a completion request.<br />`maxCompletionTokens`: maximum number of tokens in a completion response.<br />`maxTotalTokens`: maximum number of tokens in the completion request and response combined.<br />Typically, either `maxTotalTokens` alone or both `maxPromptTokens` and `maxCompletionTokens` are specified. |
| models.<model_name>.pricing | `unit`: the pricing units (currently `token` and `char_without_whitespace` are supported).<br />`prompt`: per-unit price for the completion request in USD.<br />`completion`: per-unit price for the completion response in USD. |
-| models.<model_name>.features | `rateEndpoint`: endpoint for rate requests *(exposed by core as `<deployment name>/rate`)*.<br />`tokenizeEndpoint`: endpoint for requests to the model tokenizer *(exposed by DIAL Core as `<deployment name>/tokenize`)*.<br />`truncatePromptEndpoint`: endpoint for truncating prompt requests *(exposed by DIAL Core as `<deployment name>/truncate_prompt`)*.<br />`systemPromptSupported`: does the model support system prompt (default is `true`).<br />`toolsSupported`: does the model support tools (default is `false`).<br />`seedSupported`: does the model support `seed` request parameter (default is `false`).<br />`urlAttachmentsSupported`: does the model/application support attachments with URLs (default is `false`).<br />`folderAttachmentsSupported`: does the model/application support folder attachments (default is `false`) |
+| models.<model_name>.features | `rateEndpoint`: endpoint for rate requests *(exposed by DIAL Core as `<deployment name>/rate`)*.<br />`tokenizeEndpoint`: endpoint for requests to the model tokenizer *(exposed by DIAL Core as `<deployment name>/tokenize`)*.<br />`truncatePromptEndpoint`: endpoint for truncating prompt requests *(exposed by DIAL Core as `<deployment name>/truncate_prompt`)*.<br />`systemPromptSupported`: does the model support system prompt (default is `true`).<br />`toolsSupported`: does the model support tools (default is `false`).<br />`seedSupported`: does the model support `seed` request parameter (default is `false`).<br />`urlAttachmentsSupported`: does the model/application support attachments with URLs (default is `false`).<br />`folderAttachmentsSupported`: does the model/application support folder attachments (default is `false`).<br />`accessibleByPerRequestKey`: indicates whether the deployment is accessible using a per-request API key (default is `true`).<br />`contentPartsSupported`: indicates whether the deployment supports requests with content parts (default is `false`). |
| models.<model_name>.upstreams | `endpoint`: Model endpoint.<br />`key`: Your API key.<br />`weight`: Weight for the upstream endpoint; a positive number represents the endpoint's capacity, while zero or a negative number excludes the endpoint from routing. Default value: 1.<br />`tier`: Specifies the tier group for the endpoint. Only positive numbers are allowed. All requests are routed to the endpoints with the highest tier (the lowest tier value); other endpoints (a lower tier, i.e. a higher tier value) are used only if the highest-tier endpoints are unavailable. Default value: 0 (the highest tier). Refer to [Load Balancer](https://docs.epam-rail.com/tutorials/load-balancer) to learn more.<br/>`extraData`: Additional metadata with any information to be passed to the upstream endpoint. It can be JSON or a string. |
| models.<model_name>.defaults | Default parameters that are applied if an OpenAI `chat/completions` API call doesn't contain them. |
| models.<model_name>.interceptors | A list of interceptors to be triggered for the given model. Refer to [Interceptors](https://docs.epam-rail.com/tutorials/interceptors) to learn more. |
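The `maxInputAttachments` default in the `applications.<application_name>` row depends on whether `inputAttachmentTypes` is set. The following is a minimal Python sketch of that rule, not DIAL Core code. It assumes a plain dict for the application config, treats a missing or `null` `inputAttachmentTypes` as "unset", and uses `math.inf` to stand for "unlimited". The helper names `effective_max_attachments` and `attachments_allowed` are hypothetical.

```python
from __future__ import annotations

import math


def effective_max_attachments(app: dict) -> int | float:
    """Hypothetical helper: explicit maxInputAttachments, else the documented default."""
    if "maxInputAttachments" in app:
        return app["maxInputAttachments"]
    # Assumption: a missing or None inputAttachmentTypes counts as "unset".
    # Default: zero when unset, otherwise unlimited (math.inf).
    return math.inf if app.get("inputAttachmentTypes") is not None else 0


def attachments_allowed(app: dict, count: int) -> bool:
    """Hypothetical check: does a request with `count` attachments fit the limit?"""
    return count <= effective_max_attachments(app)


if __name__ == "__main__":
    print(effective_max_attachments({}))  # 0
    print(effective_max_attachments({"inputAttachmentTypes": ["image/png"]}))  # inf
```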
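Each `features` row documents a default for its boolean flags, including the `accessibleByPerRequestKey` and `contentPartsSupported` flags this commit adds. Below is a minimal Python sketch that overlays a deployment's configured flags on those documented defaults. The dict representation and the `resolve_features` name are hypothetical illustrations, not DIAL Core's implementation.

```python
from __future__ import annotations

from typing import Any

# Boolean feature defaults as documented in the `features` rows above.
FEATURE_DEFAULTS: dict[str, bool] = {
    "systemPromptSupported": True,
    "toolsSupported": False,
    "seedSupported": False,
    "urlAttachmentsSupported": False,
    "folderAttachmentsSupported": False,
    "accessibleByPerRequestKey": True,
    "contentPartsSupported": False,
}


def resolve_features(configured: dict[str, Any] | None) -> dict[str, Any]:
    """Hypothetical helper: overlay configured feature values on the defaults."""
    resolved: dict[str, Any] = dict(FEATURE_DEFAULTS)  # copy, defaults stay untouched
    resolved.update(configured or {})
    return resolved


if __name__ == "__main__":
    # A deployment that only overrides two flags.
    features = resolve_features({"toolsSupported": True, "contentPartsSupported": True})
    print(features["toolsSupported"], features["accessibleByPerRequestKey"])  # True True
```

Endpoint-valued features such as `rateEndpoint` have no default in the tables, so the sketch passes them through only when they are configured.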
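The `limits` row allows either a combined `maxTotalTokens` cap or separate `maxPromptTokens` and `maxCompletionTokens` caps. Here is a minimal Python sketch that checks token counts against whichever limits are configured. It assumes the caller already knows the prompt size and the requested completion size; the function name and the returned list of violated keys are hypothetical, not DIAL Core's enforcement logic.

```python
from __future__ import annotations


def check_limits(limits: dict[str, int], prompt_tokens: int, completion_tokens: int) -> list[str]:
    """Hypothetical check: return the names of violated limits (empty if the request fits)."""
    violations: list[str] = []
    max_prompt = limits.get("maxPromptTokens")
    max_completion = limits.get("maxCompletionTokens")
    max_total = limits.get("maxTotalTokens")
    # Each limit is only enforced when it is present in the config.
    if max_prompt is not None and prompt_tokens > max_prompt:
        violations.append("maxPromptTokens")
    if max_completion is not None and completion_tokens > max_completion:
        violations.append("maxCompletionTokens")
    if max_total is not None and prompt_tokens + completion_tokens > max_total:
        violations.append("maxTotalTokens")
    return violations


if __name__ == "__main__":
    total_only = {"maxTotalTokens": 8192}
    split = {"maxPromptTokens": 4096, "maxCompletionTokens": 1024}
    print(check_limits(total_only, 8000, 500))  # ['maxTotalTokens']
    print(check_limits(split, 3000, 1024))      # []
```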
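The `pricing` row implies that a request costs (prompt units × `prompt`) + (completion units × `completion`) in USD. This minimal Python sketch applies that arithmetic to both supported units. It assumes `char_without_whitespace` means counting every non-whitespace character, and that token counts come from the model's tokenizer and are supplied by the caller. The helper names and the example prices are hypothetical. `Decimal` is used because per-unit prices are small fractions that binary floats represent inexactly.

```python
from __future__ import annotations

from decimal import Decimal


def count_units(text: str, unit: str, token_count: int | None = None) -> int:
    """Hypothetical helper: number of billable units in `text` for the given pricing unit."""
    if unit == "char_without_whitespace":
        # Assumption: every non-whitespace character is one unit.
        return sum(1 for ch in text if not ch.isspace())
    if unit == "token":
        # Token counts depend on the model tokenizer, so the caller must supply them.
        if token_count is None:
            raise ValueError("token_count is required for the 'token' unit")
        return token_count
    raise ValueError(f"unsupported pricing unit: {unit}")


def request_cost(prompt_units: int, completion_units: int,
                 prompt_price: str, completion_price: str) -> Decimal:
    """Cost in USD: prompt units x `prompt` price + completion units x `completion` price."""
    return prompt_units * Decimal(prompt_price) + completion_units * Decimal(completion_price)


if __name__ == "__main__":
    p = count_units("Hello, world!", "char_without_whitespace")  # 12
    c = count_units("Hi there", "char_without_whitespace")       # 7
    # Hypothetical per-character prices.
    print(p, c, request_cost(p, c, "0.0000005", "0.0000015"))
```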
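The `upstreams` row describes routing by `tier` and `weight`: only the highest available tier (lowest tier value) receives traffic, and endpoints with zero or negative weight are skipped. Below is a minimal Python sketch of that rule. It assumes weighted random choice within the chosen tier and a caller-supplied set of currently available endpoints. The `Upstream` class, the `pick_upstream` name, and the use of `random.choices` are illustrative assumptions, not DIAL Core's load-balancer implementation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    endpoint: str
    key: str
    weight: int = 1  # documented default; zero or negative excludes the endpoint
    tier: int = 0    # documented default; 0 is the highest tier


def pick_upstream(upstreams: list[Upstream], available: set[str],
                  rng: random.Random | None = None) -> Upstream | None:
    """Hypothetical selector: weighted choice among the highest available tier."""
    rng = rng or random.Random()
    # Only endpoints with a positive weight that are currently available take part.
    candidates = [u for u in upstreams if u.weight > 0 and u.endpoint in available]
    if not candidates:
        return None
    # The highest tier is the lowest tier value; lower tiers serve only as fallback.
    best_tier = min(u.tier for u in candidates)
    in_tier = [u for u in candidates if u.tier == best_tier]
    # Within a tier, weight acts as relative capacity.
    return rng.choices(in_tier, weights=[u.weight for u in in_tier], k=1)[0]


if __name__ == "__main__":
    ups = [
        Upstream("https://primary.example/v1", "key-a", weight=3),
        Upstream("https://secondary.example/v1", "key-b", weight=1),
        Upstream("https://fallback.example/v1", "key-c", tier=1),
        Upstream("https://disabled.example/v1", "key-d", weight=0),
    ]
    everything = {u.endpoint for u in ups}
    print(pick_upstream(ups, everything).tier)  # 0
    print(pick_upstream(ups, {"https://fallback.example/v1"}).endpoint)
```

With all endpoints up, the tier-0 endpoints share traffic roughly 3:1. The tier-1 fallback is chosen only when no tier-0 endpoint is available.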