chore: Update table of contents for orchestration doc #467

Merged
merged 1 commit into from Jan 20, 2025
239 changes: 120 additions & 119 deletions packages/orchestration/README.md
@@ -16,6 +16,7 @@
- [Data Masking](#data-masking)
- [Grounding](#grounding)
- [Using a JSON Configuration from AI Launchpad](#using-a-json-configuration-from-ai-launchpad)
- [Streaming](#streaming)
- [Using Resource Groups](#using-resource-groups)
- [Custom Request Configuration](#custom-request-configuration)
- [Custom Destination](#custom-destination)
@@ -86,125 +87,6 @@

In addition to the examples below, you can find more **sample code** [here](https://github.com/SAP/ai-sdk-js/blob/main/sample-code/src/orchestration.ts).

### Streaming

The `OrchestrationClient` supports streaming responses for chat completion requests based on the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard.

Use the `stream()` method to receive a stream of chunk responses from the model.
After consuming the stream, call the helper methods to get the finish reason and token usage information.

```ts
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o',
model_params: { max_tokens: 50, temperature: 0.1 }
},
templating: {
template: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
]
}
});

const response = await orchestrationClient.stream({
inputParams: { country: 'France' }
});

for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}

const finishReason = response.getFinishReason();
const tokenUsage = response.getTokenUsage();

console.log(`Finish reason: ${finishReason}\n`);
console.log(`Token usage: ${JSON.stringify(tokenUsage)}\n`);
```

#### Streaming the Delta Content

The client provides a helper method to extract the text chunks as strings:

```ts
for await (const chunk of response.stream.toContentStream()) {
console.log(chunk); // will log the delta content
}
```

Each chunk will be a string containing the delta content.

#### Streaming with Abort Controller

A streaming request can be aborted using the `AbortController` API.
In case of an error, the SAP Cloud SDK for AI automatically closes the stream.
Additionally, the stream can be aborted manually by passing an `AbortController` object to the `stream()` method.

```ts
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o',
model_params: { max_tokens: 50, temperature: 0.1 }
},
templating: {
template: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
]
}
});

const controller = new AbortController();
const response = await orchestrationClient.stream(
{
inputParams: { country: 'France' }
},
controller
);

// Abort the streaming request after one second
setTimeout(() => {
controller.abort();
}, 1000);

for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}
```

In this example, the streaming request is aborted after one second.
An abort controller can be useful, for example, when the end user wants to stop the stream or refreshes the page.

#### Stream Options

The orchestration service offers multiple streaming options, which you can configure in addition to the LLM's streaming options.
These include options such as defining the maximum number of characters per chunk or modifying the output filter behavior.
There are two ways to add specific streaming options to your client: either when initializing the orchestration client, or when calling the stream API.

Setting streaming options dynamically can be useful if an already initialized orchestration client should also be used for streaming.

You can check the list of available stream options in the [orchestration service's documentation](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/streaming).

The following example sets the streaming options when calling the stream API:

```ts
const response = await orchestrationClient.stream(
{
inputParams: { country: 'France' }
},
controller,
{
llm: { include_usage: false },
global: { chunk_size: 10 },
outputFiltering: { overlap: 200 }
}
);
```

Usage metrics are collected by default. If you do not want to receive them, set `include_usage` to `false`.
If you don't want any streaming options as part of your call to the LLM, set `streamOptions.llm` to `null`.

> [!NOTE]
> When initializing a client with a JSON module config, providing streaming options is not possible.

### Templating

Use the orchestration client with templating to pass a prompt containing placeholders that will be replaced with input parameters during a chat completion request.
@@ -501,6 +383,125 @@
return response;
```

### Streaming

The `OrchestrationClient` supports streaming responses for chat completion requests based on the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard.

Use the `stream()` method to receive a stream of chunk responses from the model.
After consuming the stream, call the helper methods to get the finish reason and token usage information.

```ts
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o',
model_params: { max_tokens: 50, temperature: 0.1 }
},
templating: {
template: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
]
}
});

const response = await orchestrationClient.stream({
inputParams: { country: 'France' }
});

for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}

const finishReason = response.getFinishReason();
const tokenUsage = response.getTokenUsage();

console.log(`Finish reason: ${finishReason}\n`);
console.log(`Token usage: ${JSON.stringify(tokenUsage)}\n`);
```

#### Streaming the Delta Content

The client provides a helper method to extract the text chunks as strings:

```ts
for await (const chunk of response.stream.toContentStream()) {
console.log(chunk); // will log the delta content
}
```

Each chunk will be a string containing the delta content.
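
If you need the complete response text once streaming has finished, you can concatenate the delta chunks yourself. A minimal sketch, reusing the `toContentStream()` helper shown above:

```ts
// Accumulate the streamed delta content into the full response text.
let fullContent = '';
for await (const chunk of response.stream.toContentStream()) {
  fullContent += chunk; // each chunk is a string with the delta content
}
console.log(fullContent);
```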

#### Streaming with Abort Controller

A streaming request can be aborted using the `AbortController` API.
In case of an error, the SAP Cloud SDK for AI automatically closes the stream.
Additionally, the stream can be aborted manually by passing an `AbortController` object to the `stream()` method.

```ts
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o',
model_params: { max_tokens: 50, temperature: 0.1 }
},
templating: {
template: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
]
}
});

const controller = new AbortController();
const response = await orchestrationClient.stream(
{
inputParams: { country: 'France' }
},
controller
);

// Abort the streaming request after one second
setTimeout(() => {
controller.abort();
}, 1000);

for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}
```

In this example, the streaming request is aborted after one second.
An abort controller can be useful, for example, when the end user wants to stop the stream or refreshes the page.
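
Depending on how the stream is consumed, aborting the request may surface as an error while iterating over the chunks. A minimal sketch that guards the consumption loop (the exact error shape is an assumption; adapt it to your error handling):

```ts
try {
  for await (const chunk of response.stream) {
    console.log(JSON.stringify(chunk));
  }
} catch (error) {
  // Reached when the request is aborted or fails while streaming.
  console.error('Stream ended early:', error);
}
```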

#### Stream Options

The orchestration service offers multiple streaming options, which you can configure in addition to the LLM's streaming options.
These include options such as defining the maximum number of characters per chunk or modifying the output filter behavior.
There are two ways to add specific streaming options to your client: either when initializing the orchestration client, or when calling the stream API.

Setting streaming options dynamically can be useful if an already initialized orchestration client should also be used for streaming.

You can check the list of available stream options in the [orchestration service's documentation](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/streaming).

The following example sets the streaming options when calling the stream API:

```ts
const response = await orchestrationClient.stream(
{
inputParams: { country: 'France' }
},
controller,
{
llm: { include_usage: false },
global: { chunk_size: 10 },
outputFiltering: { overlap: 200 }
}
);
```

Usage metrics are collected by default. If you do not want to receive them, set `include_usage` to `false`.
If you don't want any streaming options as part of your call to the LLM, set `streamOptions.llm` to `null`.
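
For example, to send no LLM-specific stream options while still limiting the chunk size, the call from the example above could look like this (a minimal sketch):

```ts
const response = await orchestrationClient.stream(
  {
    inputParams: { country: 'France' }
  },
  controller,
  {
    llm: null, // no streaming options are passed to the LLM
    global: { chunk_size: 10 }
  }
);
```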

> [!NOTE]
> When initializing a client with a JSON module config, providing streaming options is not possible.

### Using Resource Groups

The resource group can be used as an additional parameter to pick the right orchestration deployment.