Add status.requiredDaemons to DirectiveBreakdown #177

Merged: 3 commits, Jul 5, 2024
8 changes: 8 additions & 0 deletions docs/guides/data-movement/readme.md
@@ -90,6 +90,14 @@ The `CreateRequest` API call that is used to create Data Movement with the Copy
options to allow a user to specify some options for that particular Data Movement. These settings
are on a per-request basis.

The Copy Offload API requires the `nnf-dm` daemon to be running on the compute node. This daemon may be configured to run full-time, or it may be left disabled if the WLM is expected to start it only when a user requests it. See [Compute Daemons](../compute-daemons/readme.md) for the systemd service configuration of the daemon. See `RequiredDaemons` in [Directive Breakdown](../directive-breakdown/readme.md) for a description of how the user may request the daemon when the WLM runs it only on demand.

If the WLM runs the `nnf-dm` daemon only on demand, the user can request it for their job by specifying `requires=copy-offload` in their `DW` directive. The following is an example:

```bash
#DW jobdw type=xfs capacity=1GB name=stg1 requires=copy-offload
```

See the [DataMovementCreateRequest API](copy-offload-api.html#datamovement.DataMovementCreateRequest)
definition for what can be configured.

30 changes: 30 additions & 0 deletions docs/guides/directive-breakdown/readme.md
@@ -149,3 +149,33 @@ A location constraint consists of an `access` list and a `reference`.
* `status.compute.constraints.location.access` is a list that specifies what type of access the compute nodes need to have to the storage allocations in the allocation set. An allocation set may have multiple required access types.
* `status.compute.constraints.location.access.type` specifies the connection type for the storage. This can be `network` or `physical`.
* `status.compute.constraints.location.access.priority` specifies how necessary the connection type is. This can be `mandatory` or `bestEffort`.

## RequiredDaemons

The `status.requiredDaemons` section of the `DirectiveBreakdown` tells the WLM about any driver-specific daemons it must enable for the job. The WLM is assumed to know about these driver-specific daemons and how to start them when a user requests them. The `status.requiredDaemons` section exists only for `jobdw` and `persistentdw` directives. An example of the `status.requiredDaemons` section is included below.

```yaml
status:
  ...
  requiredDaemons:
  - copy-offload
  ...
```
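A WLM consuming this field might gather the daemons requested across all of a job's `DirectiveBreakdown` resources before deciding what to start. The following is a minimal Python sketch, assuming the breakdowns have already been fetched as JSON text in the shape of a Kubernetes `List` (an object whose `items` are the resources), as `kubectl get ... -o json` produces. The `collect_required_daemons` name is hypothetical and not part of DWS or NNF.

```python
import json


def collect_required_daemons(breakdown_list_json: str) -> list[str]:
    """Return the sorted, de-duplicated daemons requested by a job's breakdowns.

    Hypothetical helper: expects the JSON text of a Kubernetes List whose
    items are DirectiveBreakdown resources.
    """
    doc = json.loads(breakdown_list_json)
    daemons: set[str] = set()
    for item in doc.get("items", []):
        # status.requiredDaemons exists only for jobdw and persistentdw
        # directives, so a missing field is treated as "no daemons".
        status = item.get("status") or {}
        daemons.update(status.get("requiredDaemons") or [])
    return sorted(daemons)


if __name__ == "__main__":
    example = json.dumps({
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            {"status": {"requiredDaemons": ["copy-offload"]}},
            {"status": {}},  # a breakdown without requiredDaemons
            {"status": {"requiredDaemons": ["copy-offload"]}},
        ],
    })
    print(collect_required_daemons(example))  # ['copy-offload']
```

De-duplicating matters because several directives in the same job may each request the same daemon, while the WLM only needs to start it once.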

The list of daemons that may be specified is defined in the [nnf-ruleset.yaml for DWS](https://github.com/NearNodeFlash/nnf-sos/blob/master/config/dws/nnf-ruleset.yaml), found in the `nnf-sos` repository. The `ruleDefs.key[requires]` statement appears in two places in the ruleset: once for `jobdw` and once for `persistentdw`. The ruleset accepts a list of patterns, one for each allowed daemon.
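The exact rule schema is defined by `nnf-ruleset.yaml` and enforced by DWS; the sketch below only illustrates the idea of checking each requested daemon against a list of allowed patterns. It assumes each pattern is a regular expression that must match the whole daemon name, and `ALLOWED_PATTERNS` and `invalid_daemons` are hypothetical names, not a copy of the real ruleset.

```python
import re

# Hypothetical site configuration: one anchored pattern per allowed daemon.
ALLOWED_PATTERNS = [r"^copy-offload$"]


def invalid_daemons(requested: list[str], patterns: list[str]) -> list[str]:
    """Return the requested daemon names that match none of the patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [
        name
        for name in requested
        if not any(rx.fullmatch(name) for rx in compiled)
    ]


if __name__ == "__main__":
    print(invalid_daemons(["copy-offload"], ALLOWED_PATTERNS))  # []
    print(invalid_daemons(["copy-offload", "foo"], ALLOWED_PATTERNS))  # ['foo']
```

Keeping one pattern per daemon makes adding a site-specific daemon a one-line change to the list.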

The `DW` directive will include a comma-separated list of daemons after the `requires` keyword. The following is an example:

```bash
#DW jobdw type=xfs capacity=1GB name=stg1 requires=copy-offload
```
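To make the format concrete, here is a small Python sketch that splits a `#DW` directive into its command and `key=value` arguments and turns the `requires` value into a list of daemon names. It is illustrative only: the real parsing and validation are performed by DWS against the ruleset, and `parse_dw_directive` and `required_daemons` are hypothetical names.

```python
def parse_dw_directive(line: str) -> tuple[str, dict[str, str]]:
    """Split '#DW <command> key=value ...' into (command, args)."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "#DW":
        raise ValueError(f"not a #DW directive: {line!r}")
    command, args = tokens[1], {}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        args[key] = value
    return command, args


def required_daemons(args: dict[str, str]) -> list[str]:
    """Return the comma-separated 'requires' value as a list of names."""
    value = args.get("requires", "")
    return [name for name in value.split(",") if name]


if __name__ == "__main__":
    cmd, args = parse_dw_directive(
        "#DW jobdw type=xfs capacity=1GB name=stg1 requires=copy-offload"
    )
    print(cmd, required_daemons(args))  # jobdw ['copy-offload']
```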

The `DWDirectiveRule` resource currently active on the system can be viewed with:

```console
kubectl get -n dws-system dwdirectiverule nnf -o yaml
```

### Valid Daemons

Each site should define the list of daemons that are valid for that site and recognized by that site's WLM. The initial `nnf-ruleset.yaml` defines only one, called `copy-offload`. When a user specifies `copy-offload` in their `DW` directive, they are stating that their compute-node application will use the Copy Offload API daemon (`nnf-dm`) described in [Data Movement Configuration](../data-movement/readme.md).
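How a WLM acts on each daemon name is site-specific. As a hypothetical sketch, a site might keep a table mapping each valid daemon name to the systemd unit that provides it, and refuse any name it does not recognize. The `nnf-dm.service` unit name is an assumption based on the `nnf-dm` daemon named in the Data Movement guide and should be checked against the site's actual Compute Daemons configuration; `SITE_DAEMON_UNITS` and `units_to_start` are illustrative names.

```python
# Hypothetical site table: daemon name from status.requiredDaemons -> systemd unit.
# The unit name is an assumption; check the site's compute-daemon configuration.
SITE_DAEMON_UNITS = {
    "copy-offload": "nnf-dm.service",
}


def units_to_start(required: list[str]) -> list[str]:
    """Map requested daemons to systemd units, rejecting unknown names."""
    unknown = [name for name in required if name not in SITE_DAEMON_UNITS]
    if unknown:
        raise ValueError(f"daemons not recognized by this site: {unknown}")
    # dict.fromkeys() removes duplicates while preserving request order.
    return list(dict.fromkeys(SITE_DAEMON_UNITS[name] for name in required))


if __name__ == "__main__":
    print(units_to_start(["copy-offload"]))  # ['nnf-dm.service']
```

Failing loudly on an unrecognized name keeps a job from starting with a daemon it asked for but that the site's WLM does not know how to run.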