Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] 0014 Task Workers #897

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
252 changes: 252 additions & 0 deletions rfcs/0014-task-workers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,252 @@
- Start Date: 2023-11-08
- RFC PR: [#897](https://github.com/SAP/ui5-tooling/pull/897)
- Issue: -
- Affected components <!-- Check affected components by writing an "X" into the brackets -->
+ [x] [ui5-builder](https://github.com/SAP/ui5-builder)
+ [ ] [ui5-server](https://github.com/SAP/ui5-server)
+ [ ] [ui5-cli](https://github.com/SAP/ui5-cli)
+ [ ] [ui5-fs](https://github.com/SAP/ui5-fs)
+ [x] [ui5-project](https://github.com/SAP/ui5-project)
+ [ ] [ui5-logger](https://github.com/SAP/ui5-logger)


# RFC 0014 Task Workers

## Summary
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

Concept for a new API provided to UI5 Tooling tasks, enabling easy use of Node.js [Worker Threads](https://nodejs.org/api/worker_threads.html) to execute CPU intensive operations outside of the main thread.

## Motivation
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

The two existing tasks `minify` and `buildThemes` should share the same pool of [workers](https://nodejs.org/api/worker_threads.html) so that there is no unnecessary teardown/startup of workers during a build.

The pool should also be re-used when multiple projects are being built, either in a `ui5 build --all` scenario, or concurrent project builds (as suggested in https://github.com/SAP/ui5-tooling/issues/894) within the same Node.js process to prevent creating multiple workerpools which might slow down the overall build or even system performance.

## Detailed design
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

### Terminology

* **`Worker`**: A Node.js [Worker thread](https://nodejs.org/api/worker_threads.html) instance
* **`Task`**: A UI5 Tooling task such as `minify` or `buildThemes` (both standard tasks) or any [custom task](https://sap.github.io/ui5-tooling/stable/pages/extensibility/CustomTasks/)
* **`Task Processor`**: A module associated with a UI5 Tooling task (standard or custom) that can be executed in a worker
* **`Build Context`**: An already existing ui5-project module, coupled to the lifecycle of a Graph Build. It shall be extended to provide access to the Work Dispatcher` by forwarding requests from tasks
* **`Thread Runner`**: A `@ui5/project` module that will be loaded in a worker. It handles communication with the main thread and executes a task processor on request
* **`Work Dispatcher`**: A `@ui5/project` singleton module which uses a library like [`workerpool`](https://github.com/josdejong/workerpool) to spawn and manage worker instances in order to have them execute any task processor requested by the task
- Handles the worker lifecycle

![](./resources/0014-task-workers/Overview.png)

### Key Design Decisions

* Task processors shall be called with a defined signature as described [below](#task-processor)
* A task processor should not be exposed to Worker-specific API
- i.e. it can be executed on the main thread as well as in a Worker
- This allows UI5 Tooling to dynamically decide whether to use Workers or not
+ For example in CI environments where only one CPU core is available to the build, Workers might cause unnecessary overhead
+ Users might want to disable Workers to easily debug issues in processors
+ The UI5 Tooling build itself might already be running in a Worker
* The work dispatcher and thread runner modules will handle all inter-process communication
- This includes serializing and de-serializing `@ui5/fs/Resource` instances
* Custom tasks can opt into this feature by defining one ore more task processor modules in their ui5.yaml
* A task can only invoke its own task processor(s)
* The work dispatcher or thread runners have no understanding of dependencies between the workloads
- Tasks are responsible for waiting on the completion of their processors
- The execution of task processors should be dispatched to workers in a first in, first out order
- Task processors can finish in any order, and the result is supplied to the task immediately. A long running processor might therefore finish either before or also after another processor that has been started after it

### Assumptions

* A task processor is assumed to utilize a single CPU thread by 90-100%
- Accordingly they are also assumed to execute little to no I/O operations
* A Worker should never execute more than one task processor at a time
* Task processors are generally stateless

### Task Processor

[Processors](https://sap.github.io/ui5-tooling/stable/pages/Builder/#processors) are an established concept in UI5 Tooling but not yet exposed to custom tasks. The basic idea is that tasks act as the glue code that connects a more generic processor to UI5 Tooling. For example, UI5 Tooling processors make use of very little UI5 Tooling API, making them easily re-usable in different environments like plain Node.js scripts.

With this RFC, we extend this concept to custom tasks. A task can define one or more processors and execute them with a defined API. Their execution is managed by UI5 Tooling, which might execute them on the main thread or in a worker.

#### Input Parameters

* **`resources`**: An array of `@ui5/fs/Resource` provided by the task
* **`options`**: An object provided by the task
* **`fs`**: An optional fs-interface provided by the task
* **`log`**: A @ui5/logger instance. TODO: Should the log messages be printed directly or sent back to the main thread (and grouped there)?
* **`resourceFactory`** Specification-version dependent object providing helper functions to create and manage resources.
- **`resourceFactory.createResource`** Creates a `@ui5/fs/Resource` (similar to [TaskUtil#resourceFactory.createResource](https://sap.github.io/ui5-tooling/stable/api/@ui5_project_build_helpers_TaskUtil.html#~resourceFactory))
- No other API for now and now general "ProcessorUtil" or similar, since processors should remain as UI5 Tooling independent as possible

**_Potential future additions:_**
* _**`workspace`**: An optional workspace __reader__ provided by the task_
* _**`dependencies`**: An optional dependencies reader provided by the task_
* _**`reader`**: An optional generic reader provided by the task_

#### Return Values

The allowed return values are rather generic. But since UI5 Tooling needs to serialize and de-serialize the values while transferring them back to the main thread, there are some limitations.

The thread runner shall validate the **return value must be either**:
1. A value that adheres to the requirements stated in [Serializing Data](#serializing-data)
2. A flat object (`[undefined, Object].includes(value.constructor)`, to detect `Object.create(null)` and `{}`) with property values adhering to the requirements stated in [Serializing Data](#serializing-data)
3. An array (`Array.isArray(value)`) with values adhering to the requirements stated in [Serializing Data](#serializing-data)

Note that nested objects or nested arrays must not be allowed until we become aware of any demand for that.

Processors should be able to return primitives and `@ui5/fs/Resource` instances directly:
```js
return createResource({
path: "resource/path"
string: "content"
});
````

It should also be possible to return simple objects with primitive values or `@ui5/fs/Resource` instances:

```js
return {
code: "string",
map: "string",
counter: 3,
someResource: createResource({
path: "resource/path"
string: "content"
}),
}
```

Alternatively, processors might also return a lists of primitives or `@ui5/fs/Resource` instances:

```js
return [
createResource({
path: "resource/path"
string: "content"
}),
createResource({
path: "resource/path"
string: "content"
}),
//...
]
```

#### Processor Example

```js
/**
* Task Processor example
*
* @param {Object} parameters Parameters
* @param {@ui5/fs/Resource[]} parameters.resources Array of resources provided by the task
* @param {Object} parameters.options Options provided by the calling task
* @param {@ui5/fs/fsInterface} parameters.fs [fs interface]{@link module:@ui5/fs/fsInterface}-like class that internally handles communication with the main thread
* @param {@ui5/project/ProcessorResourceFactory} parameters.resourceFactory Helper object providing functions for creating and managing resources
* @returns {Promise<object|Array|@ui5/fs/Resource|@ui5/fs/Resource[]>} Promise resolving with either a flat object containing Resource instances as values, or an array of Resources
*/
module.exports = function({resources, options, fs, log, resourceFactory}) {
// [...]
};
````

### Task Configuration

```yaml
specVersion: "3.3"
kind: extension
type: task
metadata:
name: pi
task:
path: lib/tasks/pi.js
# Option 1
processors:
computePi: lib/tasks/piProcessor.js

# Option 2
processors:
- name: computePi
path: lib/tasks/piProcessor.js


# Option 3
processors:
computePi:
path: lib/tasks/piProcessor.js
```

**TODO:** Decide on configuration style. Option 1 does not allow for introducing additional per-processor configuration in the future (e.g. max CPU threads, priority, etc.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find option 3 the most consistent one, as it has a very similar structure as the task itself, and order does not matter (right?), so there's no reason for option 2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and updated that part


### Task API

Tasks defining processors in their `ui5.yaml` configuration shall be provided with a new `processors` object, allowing them to trigger execution of the configured processors.

The `processors.execute` function shall accept the following parameters:
* `resources` _(optional)_: Array of `@ui5/fs/Resource` instances if required by the processor
* `options` _(optional)_: An object with configuration for the processor.
* `reader` _(optional)_: An instance of `@ui5/fs/AbstractReader` which will be used to read resources requested by the task processor. If supplied, the task processor will be provided with a `fs` parameter to read those resources


The `execute` function shall validate that `resources` only contains `@ui5/fs/Resource` instances and that `options` adheres to the requirements stated in [Serializing Data](#serializing-data).

#### Task Example

```js
/**
* Custom task example
*
* @param {Object} parameters Parameters
* @param {DuplexCollection} parameters.workspace DuplexCollection to read and write files
* @param {AbstractReader} parameters.dependencies Reader or Collection to read dependency files
* @param {@ui5/project/build/helpers/TaskUtil|object} [parameters.taskUtil] TaskUtil
* @param {object} processors
* @param {Object} parameters.options Options
* @param {string} parameters.options.projectName Project name
* @param {string} [parameters.options.configuration] Task configuration if given in ui5.yaml
* @returns {Promise<undefined>} Promise resolving with undefined once data has been written
*/
module.exports = function({workspace, taskUtil, processors, options}) {
RandomByte marked this conversation as resolved.
Show resolved Hide resolved
const res = await processors.execute("computePi", {
resources: [workspace.byPath("/already-computed.txt")] // Input resources
options: { // Processor configuration
digits: 1_000_000_000_000_000_000_000
},
reader: workspace // To allow the processor to read additional files if necessary
});
await workspace.write(res);
// [...]
};
````

### Serializing Data

In order to ensure all data supplied to- and returned from- a processor can be serialized correctly, the following checks must be implemented:

In case of an object, all property values and in case of an array, all values must be either [**primitives**](https://developer.mozilla.org/en-US/docs/Glossary/Primitive) (except `symbol`?) or **`@ui5/fs/Resource`** instances (do not use `instanceof` checks since Resource instances might differ depending on the specification version).

Note: Instances of `@ui5/fs/Resource` might loose their original `stat` value since it is not fully serializable. Any serializable information will be preserved however.

## How we teach this
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

* Documentation for custom task developers on how to decide whether a task should use processors or not. For instance depending on their CPU demand

## Drawbacks
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

**TODO**

## Alternatives
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

**TODO**

## Unresolved Questions and Bikeshedding
<!-- You can either remove the following explanatory text or move it into this comment for later reference -->

*This section should be removed (i.e. resolved) before merging*

**TODO**
Binary file added rfcs/resources/0014-task-workers.graffle
Binary file not shown.
Binary file added rfcs/resources/0014-task-workers/Overview.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading