[proposal] Update the devcontainer.json schema to be able to represent a 'mergedConfiguration' #206

Open
wants to merge 12 commits into
base: main
141 changes: 141 additions & 0 deletions proposals/consolidate-merged-lifecycle-hook-schema.md
@@ -0,0 +1,141 @@
# Goal

This proposal introduces the concept of a 'mergedConfiguration' into the specification, so that implementing tools can generate, as well as consume, a 'merged' `devcontainer.json` file that is the computed merge produced after parsing a user `devcontainer.json`. [The merging logic is already documented](https://containers.dev/implementors/spec/#merge-logic). This proposal focuses on standardizing an output JSON format to represent this merge.

Properties that cannot or should not be represented as `devcontainer.json` properties are prefixed with a `$`. This proposal standardizes how these properties should be processed and written to a file, so that implementing tools can write and consume them consistently.

## Motivation

The 'mergedConfiguration' currently returned by the `read-configuration` CLI command is not itself a `devcontainer.json` ([visualization](https://github.com/devcontainers/cli/pull/390#issuecomment-1430190326)). By standardizing the output format, implementing tools can generate a portable, merged `devcontainer.json` that can be consumed by other tools.

## Proposal

### A. Update the devcontainer.json schema

#### A1. Extend lifecycle script properties

Add a variant (`LifecycleCommandParallel[]`) to the union type of each lifecycle hook (except `initializeCommand`).

> NOTE: _This was chosen as every existing lifecycle command (and merged version of each command) can be represented within that type._

```typescript

export type LifecycleCommand = string | string[];
export type LifecycleCommandParallel = { [key: string]: LifecycleCommand; $origin?: string };

interface DevContainerConfig {
  // ...other devcontainer.json properties...
  onCreateCommand?: LifecycleCommand | LifecycleCommandParallel | LifecycleCommandParallel[]

I think that this means that a devcontainer-feature.json could specify something like:

{"onCreateCommand": [{
  "foo": "script/foo",
  "$origin": "foo"
}, {
  "foo": ["date > /tmp/last-ran"]
}]}

Should it be considered an error if a devcontainer-feature.json specifies a list of objects? Should tools ignore an $origin property when reading from a json file directly (versus from image metadata)?

Oh, this change specifically affects the schema for devcontainer.json, I see. I hadn't realized that devcontainer-feature.json has a distinct json schema.

Member Author

@joshspicer joshspicer Jun 28, 2023

Yes, I don't think we'd want to update the devcontainer-feature.json schema, which is very close to a devcontainer.json but distinct. The goal of the $ properties as members of the devcontainer.json is so that a command such as devcontainer read-configuration can output a json object that is also a devcontainer.json. That's not the case today without these changes.

@avidal avidal Nov 13, 2023

> The goal of the $ properties as members of the devcontainer.json is so that a command such as devcontainer read-configuration can output a json object that is also a devcontainer.json. That's not the case today without these changes.

I've been thinking more about this, and I'm not sure that's a good thing to have in the spec itself...it seems like an implementation detail for the specific tool in question. The way some of the other comments read indicates that it'd be "valid" in a devcontainer.json but not actually used (re: a comment above about $entrypoints)...which kind of implies that the output of read-configuration is not actually a devcontainer.json.

That is, it seems like a "proper" implementation would use this schema to encode/decode metadata stored in the image label, and use it internally as the result of merging all of the various metadata sources, but it's not actually valid devcontainer.json config. Effectively splitting out merged metadata into a separate schema and leaving the existing devcontainer.json and devcontainer-feature.json alone.

The way I've been thinking about it lately is that a *.json contains two sets of data: config and metadata. I parse either devcontainer.json or devcontainer-feature.json into Config or FeatureConfig respectively, plus a Metadata entry. I parse the image label into a MergedMetadata entry. Internally I store the config and the merged metadata to drive the build process. I use the merged metadata to render into an image label. I print the config and merged metadata during debug output as two separate entities.

edit: The main reason I'm bringing this up is because it seems to me that this change may require significant explanation in the docs to reduce confusion for implementors vs declaring it as a distinct schema and indicating that it represents the results of the "merge logic table" and should be used as the entry for the devcontainer.metadata image label.

  updateContentCommand?: LifecycleCommand | LifecycleCommandParallel | LifecycleCommandParallel[]
  postCreateCommand?: LifecycleCommand | LifecycleCommandParallel | LifecycleCommandParallel[]
  postStartCommand?: LifecycleCommand | LifecycleCommandParallel | LifecycleCommandParallel[]
  postAttachCommand?: LifecycleCommand | LifecycleCommandParallel | LifecycleCommandParallel[]
}
```
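The note above says every existing lifecycle command form can be represented as `LifecycleCommandParallel[]`. A minimal sketch of that normalization follows. The helper name `toParallelList`, its `origin` argument, and the `"default"` key given to unnamed commands are hypothetical and not part of the proposal or the reference CLI. The index signature is also widened to admit `undefined`, an assumption made only so the optional `$origin` compiles under strict TypeScript settings.

```typescript
// Types mirror the proposal; the `| undefined` in the index signature is an
// assumption added so that the optional `$origin` type-checks in strict mode.
type LifecycleCommand = string | string[];
type LifecycleCommandParallel = {
  [key: string]: LifecycleCommand | undefined;
  $origin?: string;
};
type LifecycleHookValue =
  | LifecycleCommand
  | LifecycleCommandParallel
  | LifecycleCommandParallel[];

// Hypothetical helper: wraps any accepted hook value in the list form.
// `defaultName` is the key given to unnamed (string or string[]) commands.
function toParallelList(
  value: LifecycleHookValue,
  origin?: string,
  defaultName = "default"
): LifecycleCommandParallel[] {
  // Stamp `$origin` only when one was supplied and the entry has none yet.
  const tag = (entry: LifecycleCommandParallel): LifecycleCommandParallel =>
    origin !== undefined && entry.$origin === undefined
      ? { ...entry, $origin: origin }
      : entry;

  if (typeof value === "string") {
    return [tag({ [defaultName]: value })];
  }
  if (Array.isArray(value)) {
    const items: unknown[] = value;
    if (items.length === 0) {
      return [];
    }
    // An array of strings is one argv-style command.
    if (items.every((item) => typeof item === "string")) {
      return [tag({ [defaultName]: value as string[] })];
    }
    // Otherwise the value is already a list of parallel-command objects.
    return (value as LifecycleCommandParallel[]).map(tag);
  }
  // A single object of named commands.
  return [tag(value)];
}

console.log(JSON.stringify(toParallelList("npm install", "devcontainer.json")));
```

Entries that already carry a `$origin` are left untouched, so metadata read back from an image label would keep whatever source it recorded earlier.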

An optional `$origin` property is added to `LifecycleCommandParallel`, which supporting tools can use to indicate the source of a command. For example, it lets a tool report in the creation log which Feature provided a given lifecycle hook. The `$` prefix indicates that the property is additional tooling metadata that should not be present in a user `devcontainer.json`.
Contributor

Would $origin be the qualified feature id with the version OR devcontainer.json?

Member Author

Now that we have Feature dependencies and may have more than one Feature contributing configuration, we should use the "canonical ID" (for OCI Features, expanded with the SHA hash).

Member Author

Although I'm going back and forth here. At the end of the day, the intention I have for $origin is to make it easier to understand from the logs what is happening. It may be more intuitive to print verbatim what is in the devcontainer.json.

What is the $origin of a bit of config that comes from the devcontainer.metadata image label (assuming it doesn't carry one already)?

Member Author

If a section of metadata in devcontainer.metadata doesn't already have an $origin, we wouldn't add anything. It had to come from somewhere (either a base dev container that defined the image, a Feature, or the user's current dev container). If omitted, we don't have any insight into where it came from and we'd need to log something generic in those cases.


As an example, the following `devcontainer.json` snippet would be valid:

```json
{
  "onCreateCommand": [
    {
      "commandA": "echo 'Hello World!'",
      "commandB": ["echo", "'Hello World!'"],
      "$origin": "featureA"
    },
    {
      "commandA": "echo 'Hello World!'",
      "commandB": ["echo", "'Hello World!'"],
      "$origin": "devcontainer.json"
    }
  ]
}
```
```
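One way a tool might use `$origin` when writing the creation log is sketched below. The function name `describeHook`, the line format, and the `"unknown origin"` label for entries without a `$origin` are all assumptions made for illustration; the proposal only says that such entries would need to be logged generically.

```typescript
// Types mirror the proposal; the `| undefined` in the index signature is an
// assumption added so that the optional `$origin` type-checks in strict mode.
type LifecycleCommand = string | string[];
type LifecycleCommandParallel = {
  [key: string]: LifecycleCommand | undefined;
  $origin?: string;
};

// Hypothetical log formatter: one line per named command, labelled with the
// contributing source when it is known.
function describeHook(
  hook: string,
  entries: LifecycleCommandParallel[]
): string[] {
  const lines: string[] = [];
  for (const entry of entries) {
    // Assumed generic label for entries that carry no `$origin`.
    const origin = entry.$origin ?? "unknown origin";
    for (const name of Object.keys(entry)) {
      const command = entry[name];
      // `$`-prefixed keys are tooling metadata, not commands to run.
      if (name[0] === "$" || command === undefined) {
        continue;
      }
      const text = Array.isArray(command) ? command.join(" ") : command;
      lines.push(`${hook} [${origin}] ${name}: ${text}`);
    }
  }
  return lines;
}

const hookLog = describeHook("onCreateCommand", [
  { commandA: "echo 'Hello World!'", $origin: "featureA" },
  { commandA: "echo 'Hello World!'" },
]);
hookLog.forEach((line) => console.log(line));
```

Skipping every `$`-prefixed key, rather than only `$origin`, keeps the formatter tolerant of other tooling metadata a producer might attach.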

#### A2. Extend `customizations` property

The merging of the `customizations` property is left to the implementing tool; therefore, there is no defined merging pattern.

Extend the interface of `customizations` to include another unioned type:

```typescript
interface DevContainerConfig {
  // ...other devcontainer.json properties...
  customizations?: Record<string, {}> | Record<string, any[]>
}
```

The addition of the union member `Record<string, any[]>` (and explicitly typing the existing case as an object) allows various tools to contribute customizations to the same namespace. For example, a Feature may contribute a `settings` object, and a user (via `devcontainer.json`) may contribute a `settings` object. The merged value for that namespace would then be an array of two objects. This leaves the merging algorithm to be defined by the implementing tool.

Similar to above, the `$origin` property may be added by tools to indicate the source of the customization.

In the example below, the `vscode` and `foo` customizations each have two entries, contributed from two different sources. The merged `customizations` property therefore holds an array of two objects for each customization namespace.

```json
"customizations": {
  // Customizations for the 'vscode' namespace
  "vscode": [
    {
      "settings": {
        "settingA": "local",
        "settingB": "/usr/bin/lldb"
      },
      "extensions": [
        "GitHub.vscode-pull-request-github"
      ],
      "$origin": "featureA"
    },
    {
      "settings": {
        "settingA": "local",
        "settingC": true
      },
      "$origin": "devcontainer.json"
    }
  ],
  // Customizations for the 'foo' namespace
  "foo": [
    {
      "bar": "baz",
      "b": true,
      "$origin": "featureA"
    },
    {
      "bar": "baz",
      "a": true,
      "$origin": "devcontainer.json"
    }
  ]
}
...
```
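The array form can be produced by collecting each source's contribution per namespace without interpreting it. The sketch below assumes a hypothetical `Contribution` shape (an `origin` label plus that source's `customizations` object) and a hypothetical `collectCustomizations` helper; neither comes from the proposal or the reference CLI.

```typescript
// Hypothetical shapes used only for this sketch.
type CustomizationEntry = { [key: string]: unknown; $origin?: string };
type Contribution = {
  origin: string;
  customizations?: Record<string, Record<string, unknown>>;
};

// Gathers every source's contribution into one array per namespace, in the
// order the sources are given, stamping each entry with its origin.
function collectCustomizations(
  sources: Contribution[]
): Record<string, CustomizationEntry[]> {
  const merged: Record<string, CustomizationEntry[]> = {};
  for (const source of sources) {
    const namespaces = source.customizations ?? {};
    for (const namespace of Object.keys(namespaces)) {
      const list = merged[namespace] || [];
      list.push({ ...namespaces[namespace], $origin: source.origin });
      merged[namespace] = list;
    }
  }
  return merged;
}

const collected = collectCustomizations([
  { origin: "featureA", customizations: { foo: { bar: "baz", b: true } } },
  { origin: "devcontainer.json", customizations: { foo: { bar: "baz", a: true } } },
]);
console.log(JSON.stringify(collected, null, 2));
```

No values are combined or overridden here, which matches the intent that the actual merging algorithm for a namespace is decided by the tool that owns it.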

### B. Generating a merged configuration

The [reference implementation](https://github.com/devcontainers/cli) includes a `read-configuration` command that can be used to resolve a fully merged configuration from a given `devcontainer.json`.

Implementing tools should follow the [documented merging logic](https://containers.dev/implementors/spec/#merge-logic) and output a `devcontainer.json` that represents the resolved configuration.

Special cases are detailed below.

#### B1. The `entrypoint` property

Dev Container Features are able to contribute an `entrypoint` property. This property is not available in the `devcontainer.json`.

A property `$entrypoints` should be added to the merged configuration containing an array of entrypoint strings. Tooling that reads the 'mergedConfiguration' can process this property as needed.

In the below example, the `entrypoint` property is contributed from two different sources. The merged `$entrypoints` property would be an array of two strings, one for each entrypoint.

```json
"$entrypoints": [
Contributor

Are we ok with users including this in their devcontainer.json? I guess we might see this if we don't add a regular entrypoint property.

We don't have the $origin here which would make sense for consistency.

Member Author

@joshspicer joshspicer Jun 27, 2023

I think this is ok. I don't think we'll necessarily do anything with this value if it's in a devcontainer.json as input, right? The source of truth is still in the image label - this section of the spec is as a standardized output for read-configuration. I imagine all the $ are ignored during a build and used by humans/loggers.

"/usr/bin/entrypointA",
"/usr/bin/entrypointB"
],
```
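As a rough illustration of how a tool could assemble `$entrypoints`, the sketch below walks a list of Feature metadata and keeps each contributed `entrypoint`. The `FeatureMetadata` shape and the `collectEntrypoints` helper are hypothetical, and preserving the order of the input list is an assumption, since the proposal does not state an ordering.

```typescript
// Hypothetical, trimmed-down view of the metadata one Feature contributes.
type FeatureMetadata = { id: string; entrypoint?: string };

// Collects contributed entrypoints in input order; Features that do not
// define an entrypoint contribute nothing.
function collectEntrypoints(features: FeatureMetadata[]): string[] {
  const entrypoints: string[] = [];
  for (const feature of features) {
    if (feature.entrypoint !== undefined) {
      entrypoints.push(feature.entrypoint);
    }
  }
  return entrypoints;
}

const mergedFragment = {
  $entrypoints: collectEntrypoints([
    { id: "featureA", entrypoint: "/usr/bin/entrypointA" },
    { id: "featureB", entrypoint: "/usr/bin/entrypointB" },
  ]),
};
console.log(JSON.stringify(mergedFragment));
```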

## Summary

With these additions, the 'mergedConfiguration' returned by `read-configuration` can directly be a `devcontainer.json`, without needing non-spec properties (e.g., the plural `postCreateCommands`). The addition of `$origin` in the `LifecycleCommandParallel` and `customizations` array variant types allows tooling to indicate the source of provided functionality. The addition of `$entrypoints` allows tooling to output metadata on the contributed entrypoint(s) without needing to add non-schema properties to the `devcontainer.json`.