Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add streaming API to orchestration #352

Merged
merged 65 commits into from
Jan 13, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
65 commits
Select commit Hold shift + click to select a range
beac1bd
branch
tomfrenken Dec 4, 2024
d55ec77
add most parts
tomfrenken Dec 5, 2024
3e36337
stream method
tomfrenken Dec 5, 2024
d0f0277
remove path
tomfrenken Dec 9, 2024
28e9146
change types
tomfrenken Dec 9, 2024
27f22a2
fix: Changes from lint
Dec 9, 2024
7c57cf0
remove todo
tomfrenken Dec 9, 2024
ec8369b
Merge branch 'main' of https://github.com/SAP/ai-sdk-js into orchestr…
tomfrenken Dec 9, 2024
217efdc
merge
tomfrenken Dec 9, 2024
27f4032
common execute
tomfrenken Dec 9, 2024
15fae8a
fix: Changes from lint
Dec 9, 2024
188e0ba
update docs, types
tomfrenken Dec 16, 2024
431abce
narrow type
tomfrenken Dec 16, 2024
3fdd473
add sample code, tests, ...
tomfrenken Dec 17, 2024
f04582d
update code
tomfrenken Dec 17, 2024
8b5b61e
fix streaming
tomfrenken Dec 17, 2024
c64d3bd
add mock responses
tomfrenken Dec 18, 2024
032f46f
update test
tomfrenken Dec 18, 2024
728ff82
update test
tomfrenken Dec 18, 2024
4180d54
add last test
tomfrenken Dec 18, 2024
94d9c35
fix tests
tomfrenken Dec 19, 2024
48b55fe
lint:fix
tomfrenken Dec 19, 2024
46c9378
Merge branch 'main' into orchestration-streaming
tomfrenken Dec 19, 2024
f6735f9
remove internal from core
tomfrenken Dec 19, 2024
80efcd8
expose stream class
tomfrenken Dec 19, 2024
87c9557
add dev dependencies
tomfrenken Dec 19, 2024
a2f2387
add docs
tomfrenken Dec 19, 2024
8f77987
major refactor
tomfrenken Dec 25, 2024
460d223
refactor
tomfrenken Jan 1, 2025
45d1a5a
Apply suggestions from code review
tomfrenken Jan 2, 2025
08f3e91
refactor all suggestions
tomfrenken Jan 3, 2025
495c7f6
refactor all suggestions4
tomfrenken Jan 3, 2025
cabdbdf
merge main
tomfrenken Jan 5, 2025
87201de
fix type tests
tomfrenken Jan 5, 2025
c182652
fix: Changes from lint
Jan 5, 2025
700b9fe
Merge branch 'main' into orchestration-streaming
jjtang1985 Jan 6, 2025
5c94d41
refactor everything, again
tomfrenken Jan 6, 2025
6688dcb
merge main
tomfrenken Jan 6, 2025
f0bae6e
merge main
tomfrenken Jan 6, 2025
78ae9d4
merge more
tomfrenken Jan 6, 2025
3018ad5
fix: Changes from lint
Jan 6, 2025
2b03915
adjust tests & logic
tomfrenken Jan 7, 2025
0407c78
merge main
tomfrenken Jan 7, 2025
5dcbf06
fix: Changes from lint
Jan 7, 2025
cb35f42
final logic fixes
tomfrenken Jan 7, 2025
15bd1cd
Merge main
tomfrenken Jan 8, 2025
a552844
Add ability to remove options
tomfrenken Jan 8, 2025
f83c029
merge main
tomfrenken Jan 8, 2025
ee5b88b
last exception
tomfrenken Jan 8, 2025
d454941
add docs
tomfrenken Jan 8, 2025
b73ad7d
fix: Changes from lint
Jan 8, 2025
1eb65d7
vale
tomfrenken Jan 8, 2025
27a3350
allow-list llm
tomfrenken Jan 8, 2025
5351267
add tests and docs
tomfrenken Jan 9, 2025
6746577
change orchestration exports
tomfrenken Jan 9, 2025
a5d65ef
add a billion tests
tomfrenken Jan 10, 2025
24885c9
merge billion conflicts
tomfrenken Jan 10, 2025
4d78c6a
fix: Changes from lint
Jan 10, 2025
5663d4e
add sample code and e2e tests
tomfrenken Jan 10, 2025
53d6ec1
merge with main
tomfrenken Jan 13, 2025
77f08d4
fix: Changes from lint
Jan 13, 2025
8aadbac
add changelog, adjust docs
tomfrenken Jan 13, 2025
fba722d
adjust type test imports
tomfrenken Jan 13, 2025
50c0dc0
fix: Changes from lint
Jan 13, 2025
5ed1be2
Update .changeset/hip-melons-destroy.md
ZhongpinWang Jan 13, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changeset/hip-melons-destroy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
'@sap-ai-sdk/orchestration': minor
---

[New Functionality] Add support for streaming in the orchestration client.
1 change: 1 addition & 0 deletions packages/core/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ export {
AwsBedrockChatModel,
AiCoreOpenSourceChatModel
} from './model-types.js';
export { SseStream, LineDecoder, SSEDecoder } from './stream/index.js';
3 changes: 3 additions & 0 deletions packages/core/src/stream/index.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
export * from './sse-stream.js';
export * from './sse-decoder.js';
export * from './line-decoder.js';
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ type Bytes = string | ArrayBuffer | Uint8Array | Buffer | null | undefined;
* reading lines from text.
*
* Https://github.com/encode/httpx/blob/920333ea98118e9cf617f246905d7b202510941c/httpx/_decoders.py#L258.
* @internal
MatKuhr marked this conversation as resolved.
Show resolved Hide resolved
*/
export class LineDecoder {
// prettier-ignore
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ export interface ServerSentEvent {

/**
* Server-Sent Event decoder.
* @internal
*/
export class SSEDecoder {
private data: string[];
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@ type Bytes = string | ArrayBuffer | Uint8Array | Buffer | null | undefined;

/**
* Stream implemented as an async iterable.
* @internal
*/
export class SseStream<Item> implements AsyncIterable<Item> {
protected static transformToSseStream<Item>(
Expand Down
6 changes: 3 additions & 3 deletions packages/foundation-models/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -152,7 +152,7 @@ Refer to `AzureOpenAiChatCompletionParameters` interface for other parameters th
The `AzureOpenAiChatClient` supports streaming response for chat completion requests based on the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard.

Use the `stream()` method to receive a stream of chunk responses from the model.
After consuming the stream, call the helper methods to get the finish reason and token usage information respectively.
After consuming the stream, call the helper methods to get the finish reason and token usage information.

```ts
const chatClient = new AzureOpenAiChatClient('gpt-4o');
Expand All @@ -178,7 +178,7 @@ console.log(`Token usage: ${JSON.stringify(tokenUsage)}\n`);

##### Streaming the Delta Content

The client provides a helper method to extract delta content and stream string directly.
The client provides a helper method to extract the text chunks as strings:

```ts
for await (const chunk of response.stream.toContentStream()) {
Expand All @@ -198,7 +198,7 @@ Additionally, it can be aborted manually by calling the `stream()` method with a
```ts
const chatClient = new AzureOpenAiChatClient('gpt-4o');
const controller = new AbortController();
const response = await new AzureOpenAiChatClient('gpt-35-turbo').stream(
const response = await chatClient.stream(
{
messages: [
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,7 @@ describe('Azure OpenAI chat client', () => {

const response = await client.stream(prompt);
for await (const chunk of response.stream) {
expect(JSON.stringify(chunk.data)).toEqual(initialResponse);
expect(chunk.data).toEqual(JSON.parse(initialResponse));
break;
}
});
Expand Down
Original file line number Diff line number Diff line change
@@ -1,9 +1,8 @@
import { createLogger } from '@sap-cloud-sdk/util';
import { jest } from '@jest/globals';
import { LineDecoder, SSEDecoder } from '@sap-ai-sdk/core';
import { parseFileToString } from '../../../../test-util/mock-http.js';
import { AzureOpenAiChatCompletionStream } from './azure-openai-chat-completion-stream.js';
import { LineDecoder } from './stream/line-decoder.js';
import { SSEDecoder } from './stream/sse-decoder.js';

describe('OpenAI chat completion stream', () => {
let sseChunks: string[];
Expand Down Expand Up @@ -39,12 +38,12 @@ describe('OpenAI chat completion stream', () => {

it('should wrap the raw chunk', async () => {
let output = '';
const asnycGenerator = AzureOpenAiChatCompletionStream._processChunk(
const asyncGenerator = AzureOpenAiChatCompletionStream._processChunk(
originalChatCompletionStream
);
for await (const chunk of asnycGenerator) {
for await (const chunk of asyncGenerator) {
expect(chunk).toBeDefined();
chunk.getDeltaContent() ? (output += chunk.getDeltaContent()) : null;
output += chunk.getDeltaContent() ?? '';
}
expect(output).toEqual('The capital of France is Paris.');
});
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
import { createLogger } from '@sap-cloud-sdk/util';
import { SseStream } from './stream/index.js';
import { SseStream } from '@sap-ai-sdk/core';
import { AzureOpenAiChatCompletionStreamChunkResponse } from './azure-openai-chat-completion-stream-chunk-response.js';
import type { HttpResponse } from '@sap-cloud-sdk/http-client';
import type { AzureOpenAiChatCompletionStreamResponse } from './azure-openai-chat-completion-stream-response.js';
Expand Down

This file was deleted.

119 changes: 119 additions & 0 deletions packages/orchestration/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,125 @@

In addition to the examples below, you can find more **sample code** [here](https://github.com/SAP/ai-sdk-js/blob/main/sample-code/src/orchestration.ts).

### Streaming

The `OrchestrationClient` supports streaming responses for chat completion requests based on the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard.

Use the `stream()` method to receive a stream of chunk responses from the model.
After consuming the stream, call the helper methods to get the finish reason and token usage information.

```ts
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o',
model_params: { max_tokens: 50, temperature: 0.1 }
},
templating: {
template: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
]
}
});

const response = await orchestrationClient.stream({
inputParams: { country: 'France' }
});

for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}

const finishReason = response.getFinishReason();
const tokenUsage = response.getTokenUsage();

console.log(`Finish reason: ${finishReason}\n`);
console.log(`Token usage: ${JSON.stringify(tokenUsage)}\n`);
```

#### Streaming the Delta Content

The client provides a helper method to extract the text chunks as strings:

```ts
for await (const chunk of response.stream.toContentStream()) {
console.log(chunk); // will log the delta content
}
```

Each chunk will be a string containing the delta content.

#### Streaming with Abort Controller
tomfrenken marked this conversation as resolved.
Show resolved Hide resolved

Streaming request can be aborted using the `AbortController` API.
In case of an error, the SAP Cloud SDK for AI will automatically close the stream.
Additionally, it can be aborted manually by calling the `stream()` method with an `AbortController` object.

```ts
const orchestrationClient = new OrchestrationClient({
llm: {
model_name: 'gpt-4o',
model_params: { max_tokens: 50, temperature: 0.1 }
},
templating: {
template: [
{ role: 'user', content: 'Give a long history of {{?country}}?' }
]
}
});

const controller = new AbortController();
const response = await orchestrationClient.stream(
{
inputParams: { country: 'France' }
},
controller
);

// Abort the streaming request after one second
setTimeout(() => {
controller.abort();
}, 1000);

for await (const chunk of response.stream) {
console.log(JSON.stringify(chunk));
}
```

In this example, streaming request will be aborted after one second.
Abort controller can be useful, e.g., when end-user wants to stop the stream or refreshes the page.

#### Stream Options

The orchestration service offers multiple streaming options, which you can configure in addition to the LLM's streaming options.
tomfrenken marked this conversation as resolved.
Show resolved Hide resolved
These include options like definining the maximum number of characters per chunk or modifying the output filter behavior.
There are two ways to add specific streaming options to your client, either at initialization of orchestration client, or when calling the stream API.

Setting streaming options dynamically could be useful if an initialized orchestration client will also be used for streaming.

You can check the list of available stream options in the [orchestration service's documentation](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/streaming).

An example for setting the streaming options when calling the stream API looks like the following:

```ts
const response = orchestrationClient.stream(
{
inputParams: { country: 'France' }
},
controller,
{
llm: { include_usage: false },
global: { chunk_size: 10 },
outputFiltering: { overlap: 200 }
tomfrenken marked this conversation as resolved.
Show resolved Hide resolved
}
);
```

Usage metrics are collected by default, if you do not want to receive them, set `include_usage` to `false`.
If you don't want any streaming options as part of your call to the LLM, set `streamOptions.llm` to `null`.

> [!NOTE]

Check warning on line 205 in packages/orchestration/README.md

View workflow job for this annotation

GitHub Actions / grammar-check

[vale] reported by reviewdog 🐶 [SAP.Spacing] '!N' should have one space. Raw Output: {"message": "[SAP.Spacing] '!N' should have one space.", "location": {"path": "packages/orchestration/README.md", "range": {"start": {"line": 205, "column": 4}}}, "severity": "WARNING"}
> When initalizing a client with a JSON module config, providing streaming options is not possible.

### Templating

Use the orchestration client with templating to pass a prompt containing placeholders that will be replaced with input parameters during a chat completion request.
Expand Down
3 changes: 3 additions & 0 deletions packages/orchestration/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,9 @@
"dependencies": {
"@sap-ai-sdk/core": "workspace:^",
"@sap-ai-sdk/ai-api": "workspace:^",
"@sap-cloud-sdk/util": "^3.25.0"
},
"devDependencies": {
tomfrenken marked this conversation as resolved.
Show resolved Hide resolved
"@sap-cloud-sdk/http-client": "^3.25.0",
"@sap-cloud-sdk/connectivity": "^3.25.0"
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`Orchestration chat completion stream should transform the original stream to string stream 1`] = `
"The SAP Cloud SDK is a comprehensive development toolkit designed to simplify and accelerate the creation of applications that integrate with SAP solutions, particularly those built on the SAP Business Technology Platform (BTP). It provides developers with libraries, tools, and best practices that streamline the process of connecting to SAP systems, such as S/4HANA and other services available on the SAP Cloud Platform.

Key features of the SAP Cloud SDK include:

1. **Simplified Connectivity**: The SDK offers pre-built libraries to easily interact with SAP services, providing capabilities for authentication, service consumption, and OData/REST client generation.

2. **Multi-cloud Support**: It supports multiple cloud environments, ensuring that applications remain flexible and can be deployed across various cloud providers.

3. **Best Practices and Guidelines**: The SDK includes best practices for development, ensuring high-quality, scalable, and maintainable code.

4. **Project Scaffolding and Code Samples**: Developers can quickly start their projects using provided templates and samples, accelerating the development process and reducing the learning curve.

5. **Extensive Documentation and Community Support**: Ample documentation, tutorials, and an active community help developers overcome challenges and adopt the SDK efficiently.

Overall, the SAP Cloud SDK is an essential tool for developers looking to build cloud-native applications and extensions that seamlessly integrate with SAP's enterprise solutions."
`;

exports[`Orchestration chat completion stream should wrap the raw chunk 1`] = `
"The SAP Cloud SDK is a comprehensive development toolkit designed to simplify and accelerate the creation of applications that integrate with SAP solutions, particularly those built on the SAP Business Technology Platform (BTP). It provides developers with libraries, tools, and best practices that streamline the process of connecting to SAP systems, such as S/4HANA and other services available on the SAP Cloud Platform.

Key features of the SAP Cloud SDK include:

1. **Simplified Connectivity**: The SDK offers pre-built libraries to easily interact with SAP services, providing capabilities for authentication, service consumption, and OData/REST client generation.

2. **Multi-cloud Support**: It supports multiple cloud environments, ensuring that applications remain flexible and can be deployed across various cloud providers.

3. **Best Practices and Guidelines**: The SDK includes best practices for development, ensuring high-quality, scalable, and maintainable code.

4. **Project Scaffolding and Code Samples**: Developers can quickly start their projects using provided templates and samples, accelerating the development process and reducing the learning curve.

5. **Extensive Documentation and Community Support**: Ample documentation, tutorials, and an active community help developers overcome challenges and adopt the SDK efficiently.

Overall, the SAP Cloud SDK is an essential tool for developers looking to build cloud-native applications and extensions that seamlessly integrate with SAP's enterprise solutions."
`;
48 changes: 9 additions & 39 deletions packages/orchestration/src/index.ts
Original file line number Diff line number Diff line change
@@ -1,52 +1,22 @@
export type {
CompletionPostResponse,
ChatMessages,
TokenUsage,
TemplatingModuleConfig,
OrchestrationConfig,
ModuleResults,
ModuleConfigs,
MaskingModuleConfig,
MaskingProviderConfig,
GroundingModuleConfig,
DocumentGroundingFilter,
GroundingFilterId,
GroundingFilterSearchConfiguration,
DataRepositoryType,
KeyValueListPair,
SearchDocumentKeyValueListPair,
SearchSelectOptionEnum,
LlmModuleResult,
LlmChoice,
GenericModuleResult,
FilteringModuleConfig,
InputFilteringConfig,
OutputFilteringConfig,
FilterConfig,
ErrorResponse,
DpiEntities,
DpiEntityConfig,
DpiConfig,
CompletionPostRequest,
ChatMessage,
AzureThreshold,
AzureContentSafety,
AzureContentSafetyFilterConfig,
ImageContent,
TextContent,
MultiChatMessageContent,
MultiChatMessage
} from './client/api/schema/index.js';
export * from './client/api/schema/index.js';

export type {
OrchestrationModuleConfig,
LlmModuleConfig,
Prompt,
RequestOptions,
StreamOptions,
DocumentGroundingServiceConfig,
DocumentGroundingServiceFilter,
LlmModelParams
} from './orchestration-types.js';

export { OrchestrationStreamResponse } from './orchestration-stream-response.js';

export { OrchestrationStreamChunkResponse } from './orchestration-stream-chunk-response.js';

export { OrchestrationStream } from './orchestration-stream.js';

export { OrchestrationClient } from './orchestration-client.js';

export {
Expand Down
Loading
Loading