Add documentation on how to add new properties. (#601)
marcenacp authored Mar 13, 2024
1 parent 1afdaf9 commit a566b7d
Showing 11 changed files with 69 additions and 8 deletions.
2 changes: 2 additions & 0 deletions docs/mlcroissant/Field.html
@@ -16,6 +16,8 @@ <h1>Croissant &#129360</h1>
Croissant specifications as implemented by <code><a href="https://pypi.org/project/mlcroissant">mlcroissant</a></code>.
<br>
For the actual specifications, please refer to the <a href="https://mlcommons.org/croissant/1.0">Croissant 1.0 standard.</a>.
<br>
To add new properties, refer to the <a href="https://github.com/mlcommons/croissant/tree/main/python/mlcroissant#add-new-properties-to-the-standard">documentation</a>.
</header>
<body>
<h3>Field</h3>
2 changes: 2 additions & 0 deletions docs/mlcroissant/FileObject.html
@@ -16,6 +16,8 @@ <h1>Croissant &#129360</h1>
Croissant specifications as implemented by <code><a href="https://pypi.org/project/mlcroissant">mlcroissant</a></code>.
<br>
For the actual specifications, please refer to the <a href="https://mlcommons.org/croissant/1.0">Croissant 1.0 standard.</a>.
<br>
To add new properties, refer to the <a href="https://github.com/mlcommons/croissant/tree/main/python/mlcroissant#add-new-properties-to-the-standard">documentation</a>.
</header>
<body>
<h3>FileObject</h3>
2 changes: 2 additions & 0 deletions docs/mlcroissant/FileSet.html
@@ -16,6 +16,8 @@ <h1>Croissant &#129360</h1>
Croissant specifications as implemented by <code><a href="https://pypi.org/project/mlcroissant">mlcroissant</a></code>.
<br>
For the actual specifications, please refer to the <a href="https://mlcommons.org/croissant/1.0">Croissant 1.0 standard.</a>.
<br>
To add new properties, refer to the <a href="https://github.com/mlcommons/croissant/tree/main/python/mlcroissant#add-new-properties-to-the-standard">documentation</a>.
</header>
<body>
<h3>FileSet</h3>
2 changes: 2 additions & 0 deletions docs/mlcroissant/Metadata.html
@@ -16,6 +16,8 @@ <h1>Croissant &#129360</h1>
Croissant specifications as implemented by <code><a href="https://pypi.org/project/mlcroissant">mlcroissant</a></code>.
<br>
For the actual specifications, please refer to the <a href="https://mlcommons.org/croissant/1.0">Croissant 1.0 standard.</a>.
<br>
To add new properties, refer to the <a href="https://github.com/mlcommons/croissant/tree/main/python/mlcroissant#add-new-properties-to-the-standard">documentation</a>.
</header>
<body>
<h3>Metadata</h3>
11 changes: 11 additions & 0 deletions docs/mlcroissant/RecordSet.html
@@ -16,6 +16,8 @@ <h1>Croissant &#129360</h1>
Croissant specifications as implemented by <code><a href="https://pypi.org/project/mlcroissant">mlcroissant</a></code>.
<br>
For the actual specifications, please refer to the <a href="https://mlcommons.org/croissant/1.0">Croissant 1.0 standard.</a>.
<br>
To add new properties, refer to the <a href="https://github.com/mlcommons/croissant/tree/main/python/mlcroissant#add-new-properties-to-the-standard">documentation</a>.
</header>
<body>
<h3>RecordSet</h3>
@@ -48,6 +50,15 @@ <h3>RecordSet</h3>
<td>One or more records that constitute the data of the `RecordSet`.</td>
</tr>

<tr>
<td><a href="http://mlcommons.org/croissant/dataType">cr:dataType</a></td>
<td>

</td>
<td>MANY</td>
<td>The data type of the RecordSet. Mainly used to specify: `sc:Enumeration`.</td>
</tr>

<tr>
<td><a href="https://schema.org/description">sc:description</a></td>
<td>
40 changes: 40 additions & 0 deletions python/mlcroissant/README.md
@@ -87,6 +87,46 @@ metadata=mlc.nodes.Metadata(
metadata.to_json() # this returns the JSON-LD file.
```

## Add new properties to the standard

Nodes (`Metadata`, `RecordSet`, etc.) implement [PEP 681](https://peps.python.org/pep-0681/) (data class transforms), so you can declare RDF triples using the [dataclass](https://docs.python.org/3/library/dataclasses.html) syntax.

**Example 1**: implement [CreativeWork](https://schema.org/CreativeWork):

```python
@mlc_dataclasses.dataclass
class CreativeWork(Node):

    JSONLD_TYPE = SDO.CreativeWork  # https://schema.org/CreativeWork

    name: str | None = mlc_dataclasses.jsonld_field(
        cardinality="ONE",  # Cardinality can be ONE or MANY
        default=None,  # Specify the default value in Python
        description="The name of the item.",  # The full description
        input_types=[SDO.Text],  # The schema.org type
        url=SDO.name,  # The URL of the property
    )
```

**Example 2**: implement [RecordSet](https://mlcommons.github.io/croissant/docs/croissant-spec.html#recordset):

```python
@mlc_dataclasses.dataclass
class RecordSet(Node):
    JSONLD_TYPE = constants.ML_COMMONS_RECORD_SET_TYPE

    fields: list[Field] = mlc_dataclasses.jsonld_field(
        cardinality="MANY",  # Example with cardinality=="MANY"
        default_factory=list,
        description=(
            "A data element that appears in the records of the RecordSet (e.g., one"
            " column of a table)."
        ),
        input_types=[Field],  # Types can also be other nodes (here `Field`)
        url=constants.ML_COMMONS_FIELD,
    )
```
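
To build intuition for what such a declaration carries, here is a minimal standalone sketch that uses only the standard library. It is not how `mlcroissant` is implemented: the `jsonld_field` and `to_jsonld` helpers below are hypothetical stand-ins that store the property URL and cardinality in the dataclass field metadata, and the `keywords` property is added only for illustration.

```python
import dataclasses
from typing import Any, Optional


def jsonld_field(*, url: str, cardinality: str = "ONE", description: str = "", **kwargs: Any):
    """Hypothetical stand-in for a JSON-LD aware field (not the mlcroissant API)."""
    if cardinality not in ("ONE", "MANY"):
        raise ValueError(f"cardinality must be ONE or MANY, got {cardinality!r}")
    # Keep the RDF information next to the Python field, in its metadata.
    metadata = {"url": url, "cardinality": cardinality, "description": description}
    return dataclasses.field(metadata=metadata, **kwargs)


@dataclasses.dataclass
class CreativeWork:
    # No type annotation, so this is a plain class attribute and not a dataclass field.
    JSONLD_TYPE = "https://schema.org/CreativeWork"

    name: Optional[str] = jsonld_field(
        url="https://schema.org/name",
        default=None,
        description="The name of the item.",
    )
    keywords: list[str] = jsonld_field(
        url="https://schema.org/keywords",
        cardinality="MANY",
        default_factory=list,
    )


def to_jsonld(node: Any) -> dict[str, Any]:
    """Serializes the declared properties, skipping unset (None or empty list) values."""
    output: dict[str, Any] = {"@type": node.JSONLD_TYPE}
    for field in dataclasses.fields(node):
        value = getattr(node, field.name)
        if value is None or value == []:
            continue
        output[field.metadata["url"]] = value
    return output
```

Calling `to_jsonld(CreativeWork(name="Mona Lisa"))` yields a dictionary keyed by the property URLs, which shows why each property needs a `url` and a `cardinality` in its declaration.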

## Run tests

All tests can be run from the Makefile:
@@ -38,6 +38,8 @@ class ParentField(Node):
class Field(Node):
    """Nodes to describe a dataset Field."""

    JSONLD_TYPE = constants.ML_COMMONS_FIELD_TYPE

    description: str | None = mlc_dataclasses.jsonld_field(
        default=None,
        input_types=[SDO.Text],
@@ -129,8 +131,6 @@ def __post_init__(self):
        self.source.check_source(self.add_error)
        self._standardize_data_types()

    JSONLD_TYPE = constants.ML_COMMONS_FIELD_TYPE

    def _standardize_data_types(self):
        """Converts data_types to a list of rdflib.URIRef."""
        data_types = self.data_types
@@ -15,6 +15,8 @@
class FileSet(Node):
    """Nodes to describe a dataset FileSet (distribution)."""

    JSONLD_TYPE = constants.SCHEMA_ORG_FILE_SET

    contained_in: list[str] | None = mlc_dataclasses.jsonld_field(
        cardinality="MANY",
        default_factory=list,
@@ -71,5 +73,3 @@ def __post_init__(self):
        uuid_field = "name" if self.ctx.is_v0() else "id"
        self.validate_name()
        self.assert_has_mandatory_properties("includes", "encoding_format", uuid_field)

    JSONLD_TYPE = constants.SCHEMA_ORG_FILE_SET
@@ -36,6 +36,8 @@
class Metadata(Node):
    """Nodes to describe a dataset metadata."""

    JSONLD_TYPE = constants.SCHEMA_ORG_DATASET

    cite_as: str | None = mlc_dataclasses.jsonld_field(
        default=None,
        description=(
@@ -328,8 +330,6 @@ def __post_init__(self):
            self.ctx, self.ctx.conforms_to
        )

    JSONLD_TYPE = constants.SCHEMA_ORG_DATASET

    def to_json(self) -> Json:
        """Converts the `Metadata` to JSON."""
        context = self.ctx.rdf.context
@@ -34,6 +34,8 @@ def json_from_jsonld(ctx: Context, data) -> Json | None:
class RecordSet(Node):
    """Nodes to describe a dataset RecordSet."""

    JSONLD_TYPE = constants.ML_COMMONS_RECORD_SET_TYPE

    data: list[Json] | None = mlc_dataclasses.jsonld_field(
        cardinality="MANY",
        default=None,
@@ -168,8 +170,6 @@ def check_joins_in_fields(self):
                    " documentation for more information."
                )

    JSONLD_TYPE = constants.ML_COMMONS_RECORD_SET_TYPE


def get_parent_uuid(ctx: Context, uuid: str) -> str | None:
    """Retrieves the UID of the parent, e.g. `file/column` -> `file`."""
2 changes: 2 additions & 0 deletions python/mlcroissant/mlcroissant/scripts/templates/node.html
@@ -16,6 +16,8 @@ <h1>Croissant &#129360</h1>
Croissant specifications as implemented by <code><a href="https://pypi.org/project/mlcroissant">mlcroissant</a></code>.
<br>
For the actual specifications, please refer to the <a href="https://mlcommons.org/croissant/1.0">Croissant 1.0 standard.</a>.
<br>
To add new properties, refer to the <a href="https://github.com/mlcommons/croissant/tree/main/python/mlcroissant#add-new-properties-to-the-standard">documentation</a>.
</header>
<body>
<h3>{{ title }}</h3>