Add NnfLustreMGT documentation #191

Merged
merged 3 commits into from
Aug 21, 2024

Conversation

matthew-richerson

Document how to create the NnfLustreMGT and ConfigMap for MGTs outside of NNF's control.


Signed-off-by: Matt Richerson <[email protected]>
combinedMgtMdt: false
standaloneMgtPoolName: ""
[...]
```

### NnfLustreMGT

Add a blank line after the subheading.

combinedMgtMdt: false
standaloneMgtPoolName: ""
[...]
```

### NnfLustreMGT
A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any no longer used fsnames from disk for any internally hosted MGTs. For an MGT hosted on an external node, an admin must create an `NnfLustreMGT`. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.

Suggested change
A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any no longer used fsnames from disk for any internally hosted MGTs. For an MGT hosted on an external node, an admin must create an `NnfLustreMGT`. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.
A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any unused fsnames from disk for any internally hosted MGTs. For an MGT hosted on an external node, an admin must create an `NnfLustreMGT` resource. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.

combinedMgtMdt: false
standaloneMgtPoolName: ""
[...]
```

### NnfLustreMGT
A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any no longer used fsnames from disk for any internally hosted MGTs. For an MGT hosted on an external node, an admin must create an `NnfLustreMGT`. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.

> erases any ... fsnames from disk

Which disk?

```

* `addresses` - This is a list of LNet addresses that could be used for this MGT. This should match any values that are used in the `externalMgs` field in the `NnfStorageProfiles`.
* `fsNameStart` - The first fsname to use. Subsequent fsnames will be incremented based on this starting fsname (e.g, `aaaaaaaa`, `aaaaaaab`, `aaaaaaac`). fsnames use lowercase letters `'a'`-`'z'`.

Does this need to be a certain length?

* `addresses` - This is a list of LNet addresses that could be used for this MGT. This should match any values that are used in the `externalMgs` field in the `NnfStorageProfiles`.
* `fsNameStart` - The first fsname to use. Subsequent fsnames will be incremented based on this starting fsname (e.g, `aaaaaaaa`, `aaaaaaab`, `aaaaaaac`). fsnames use lowercase letters `'a'`-`'z'`.
* `fsNameBlackList` - This is a list of fsnames that should not be given to any NNF Lustre file systems. If the MGT is hosting any non-NNF Lustre file systems, their fsnames should be included in this blacklist.
* `fsNameStartReference` - This is an optional ObjectReference to a `ConfigMap` that holds a starting fsname. If this field is specified, it takes precedence over the `fsNameStart` field in the spec. The `ConfigMap` will be updated to the next available fsname everytime an fsname is assigned to a new Lustre file system.

Suggested change
* `fsNameStartReference` - This is an optional ObjectReference to a `ConfigMap` that holds a starting fsname. If this field is specified, it takes precedence over the `fsNameStart` field in the spec. The `ConfigMap` will be updated to the next available fsname everytime an fsname is assigned to a new Lustre file system.
* `fsNameStartReference` - This is an optional `ObjectReference` to a `ConfigMap` that holds a starting fsname. If this field is specified, it takes precedence over the `fsNameStart` field in the spec. The `ConfigMap` will be updated to the next available fsname every time an fsname is assigned to a new Lustre file system.
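
For readers following the `fsNameStart` and `fsNameBlackList` descriptions above, here is a minimal Python sketch of how sequential fsnames could be derived. It is not the nnf-sos implementation: the function names `increment_fsname` and `next_available` are hypothetical, and both the refusal to wrap past `zzzzzzzz` and the way blacklisted names are skipped are assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # fsnames use 'a'-'z' per the field description


def increment_fsname(fsname: str) -> str:
    """Return the next fsname, treating the name as a base-26 counter."""
    chars = list(fsname)
    # Walk from the rightmost letter, carrying whenever a 'z' rolls over to 'a'.
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] != "z":
            chars[i] = ALPHABET[ALPHABET.index(chars[i]) + 1]
            return "".join(chars)
        chars[i] = "a"
    # Assumption: every position was 'z'; wrapping would risk re-use, so refuse.
    raise OverflowError("fsname space exhausted")


def next_available(fsname: str, blacklist: set[str]) -> str:
    """Return the first fsname at or after `fsname` that is not blacklisted."""
    while fsname in blacklist:
        fsname = increment_fsname(fsname)
    return fsname
```

With a starting value of `aaaaaaaa`, repeated calls to `increment_fsname` produce the `aaaaaaab`, `aaaaaaac` sequence given in the field description, and `next_available` steps over any name reserved for a non-NNF file system.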

Signed-off-by: Matt Richerson <[email protected]>
combinedMgtMdt: false
standaloneMgtPoolName: ""
[...]
```

### NnfLustreMGT

A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any unused fsnames from the MGT disk for any internally hosted MGTs. For a MGT hosted on an external node, an admin must create an `NnfLustreMGT` resource. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.

Suggested change
A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any unused fsnames from the MGT disk for any internally hosted MGTs. For a MGT hosted on an external node, an admin must create an `NnfLustreMGT` resource. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.
A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any unused fsnames from the MGT disk for any internally hosted MGTs.
For a MGT hosted on an external node, an admin must create an `NnfLustreMGT` resource. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.


A `NnfLustreMGT` resource tracks which fsnames have been used on the MGT to prevent fsname re-use. Any Lustre file systems that are created through the NNF software will request an fsname to use from a `NnfLustreMGT` resource. Every MGT must have a corresponding `NnfLustreMGT` resource. For MGTs that are hosted on NNF hardware, the `NnfLustreMGT` resources are created automatically. The NNF software also erases any unused fsnames from the MGT disk for any internally hosted MGTs. For a MGT hosted on an external node, an admin must create an `NnfLustreMGT` resource. This resource ensures that fsnames will be created in a sequential order without any fsname re-use. However, after an fsname is no longer in use by a file system, it will not be erased from the MGT disk. An admin may decide to periodically run the `lctl erase_lcfg [fsname]` command to remove fsnames that are no longer in use.

Below is an example `NnfLustreMGT` resource. The `NnfLustreMGT` resource for external MGSs should be created in the `nnf-system` namespace.

Suggested change
Below is an example `NnfLustreMGT` resource. The `NnfLustreMGT` resource for external MGSs should be created in the `nnf-system` namespace.
Below is an example `NnfLustreMGT` resource. The `NnfLustreMGT` resource for external MGSs must be created in the `nnf-system` namespace.

```

* `addresses` - This is a list of LNet addresses that could be used for this MGT. This should match any values that are used in the `externalMgs` field in the `NnfStorageProfiles`.
* `fsNameStart` - The first fsname to use. Subsequent fsnames will be incremented based on this starting fsname (e.g, `aaaaaaaa`, `aaaaaaab`, `aaaaaaac`). fsnames use lowercase letters `'a'`-`'z'`. `fsNameStart` should be exactly 8 characters long.

Mention the Lustre length limit. Is it 8 or 6?
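
On the length question: Lustre limits a file system name to 8 characters, which is consistent with the "exactly 8 characters" wording in the diff. A check along the following lines could catch a malformed `fsNameStart` before the resource is applied. This is only a sketch. The helper name `is_valid_fsname_start` is hypothetical, and the pattern assumes the lowercase-only rule stated in the field description.

```python
import re

# Assumption: exactly 8 lowercase ASCII letters, matching the wording in the diff.
FSNAME_START_PATTERN = re.compile(r"[a-z]{8}")


def is_valid_fsname_start(value: str) -> bool:
    """Return True if `value` is exactly eight letters in 'a'-'z'."""
    return FSNAME_START_PATTERN.fullmatch(value) is not None
```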


### ConfigMap

For external MGTs, the `fsNameStartReference` should be used to point to a `ConfigMap` in the default namespace. The `ConfigMap` should not be removed during an argocd undeploy/deploy. This allows the nnf-sos software to be undeployed (including any `NnfLustreMGT` resources), without having the fsname reset back to the `fsNameStart` value on a redeploy. The `ConfigMap` that is created should be left empty initially.

Suggested change
For external MGTs, the `fsNameStartReference` should be used to point to a `ConfigMap` in the default namespace. The `ConfigMap` should not be removed during an argocd undeploy/deploy. This allows the nnf-sos software to be undeployed (including any `NnfLustreMGT` resources), without having the fsname reset back to the `fsNameStart` value on a redeploy. The `ConfigMap` that is created should be left empty initially.
For external MGTs, the `fsNameStartReference` should be used to point to a `ConfigMap` in the `default` namespace. The `ConfigMap` should not be removed during an argocd undeploy/deploy. This allows the nnf-sos software to be undeployed (including any `NnfLustreMGT` resources), without having the fsname reset back to the `fsNameStart` value on a redeploy. The `ConfigMap` that is created should be left empty initially.


### ConfigMap

For external MGTs, the `fsNameStartReference` should be used to point to a `ConfigMap` in the default namespace. The `ConfigMap` should not be removed during an argocd undeploy/deploy. This allows the nnf-sos software to be undeployed (including any `NnfLustreMGT` resources), without having the fsname reset back to the `fsNameStart` value on a redeploy. The `ConfigMap` that is created should be left empty initially.

"argocd undeploy/deploy" => "undeploy/deploy".
I don't think we want to weave argocd so tightly into our docs.
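
The reason the `ConfigMap` has to survive an undeploy/deploy can be illustrated with a small Python model of the precedence rule described in the diff. Everything here is a sketch under stated assumptions: the key name `fsname` is hypothetical (the source does not say which key holds the value), the fallback to `fsNameStart` while the `ConfigMap` is still empty is assumed, and `starting_fsname` and `record_next` are illustrative names rather than NNF functions.

```python
from typing import Optional

# Hypothetical key; the source does not say which ConfigMap key holds the fsname.
FSNAME_KEY = "fsname"


def starting_fsname(fs_name_start: str, configmap_data: Optional[dict]) -> str:
    """Pick the fsname to start from.

    `configmap_data` is None when no fsNameStartReference is set. An empty
    ConfigMap (its initial state) is assumed to fall back to fsNameStart.
    """
    if configmap_data and configmap_data.get(FSNAME_KEY):
        return configmap_data[FSNAME_KEY]
    return fs_name_start


def record_next(configmap_data: dict, next_fsname: str) -> None:
    """Persist the next available fsname after one has been assigned."""
    configmap_data[FSNAME_KEY] = next_fsname
```

If the dictionary standing in for the `ConfigMap` outlives the `NnfLustreMGT` resource, a recreated resource resumes from the recorded value instead of from `fsNameStart`, which is the behavior the paragraph under review describes.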


### Argocd

* An empty ConfigMap should be deployed with the `0-early-config` application. It should be created in the `default` namespace, and it can have any name.

Oof.
Ok, "application" should be Application, since that's the name of a resource Kind. Now we have to explain what 0-early-config is, and we have no other argocd references anywhere in the NearNodeFlash docs, as far as I recall.

Can this be stripped-down and rewritten without reference to ArgoCD?

Signed-off-by: Matt Richerson <[email protected]>
@matthew-richerson matthew-richerson merged commit 34b12a9 into main Aug 21, 2024
1 check passed
@matthew-richerson matthew-richerson deleted the lustremgt branch August 21, 2024 16:16