- Title: Machine Learning Dataset
- Identifier: https://raw.githubusercontent.com/earthpulse/ml-dataset/main/json-schema/schema.json
- Field Name Prefix: ml-dataset
- Scope: Catalog
- Extension Maturity Classification: Proposal
- Owner: @earthpulse
This document explains the Machine Learning Dataset Extension to the SpatioTemporal Asset Catalog (STAC) specification.
- Examples:
- Catalog example: Shows the basic usage of the extension in a STAC Catalog
- JSON Schema
- Changelog
The fields in the table below can be used in these parts of STAC documents:
- Catalogs
- Collections
- Item Properties
- Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
- Links
Field Name | Type | Description |
---|---|---|
ml-dataset:name | string | The name of the dataset |
ml-dataset:tasks | array | List of (suggested) tasks that can be solved with the dataset |
ml-dataset:inputs-type | string | Type of the inputs (text, image, satellite image, video, ... or combination) |
ml-dataset:annotations-type | string | Type of annotations (raster, vector, ...) (not present for unsupervised learning) |
ml-dataset:quality-metrics | array | List of quality metrics that define the quality of the dataset |
ml-dataset:version | string | Dataset version |
ml-dataset:splits | array | List of split names. Suggested names are Training, Validation, Test, Legacy, Benchmark |
ml-dataset:split-items | array | List of the Items that make up the split |
ml-dataset:split | string | Name of the split the Item belongs to |
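As a rough illustration of the table above, the following Python sketch builds a Catalog dictionary that carries some of the `ml-dataset` fields and checks each value against the Type column. The `id`, `description`, and all field values are hypothetical, and `check_field_types` is a helper invented for this example, not part of the extension. It only checks top-level Python types and is not a substitute for validating against the JSON Schema.

```python
# Schema URL taken from the Identifier line of this document.
EXTENSION_SCHEMA = (
    "https://raw.githubusercontent.com/earthpulse/ml-dataset/"
    "main/json-schema/schema.json"
)

# Expected Python type for each field, mirroring the Type column of the table.
FIELD_TYPES = {
    "ml-dataset:name": str,
    "ml-dataset:tasks": list,
    "ml-dataset:inputs-type": str,
    "ml-dataset:annotations-type": str,
    "ml-dataset:quality-metrics": list,
    "ml-dataset:version": str,
    "ml-dataset:splits": list,
    "ml-dataset:split-items": list,
    "ml-dataset:split": str,
}


def check_field_types(doc):
    """Return the ml-dataset fields in `doc` whose value has the wrong type."""
    return [
        key
        for key, value in doc.items()
        if key in FIELD_TYPES and not isinstance(value, FIELD_TYPES[key])
    ]


# Hypothetical Catalog; every value below is made up for illustration.
catalog = {
    "type": "Catalog",
    "stac_version": "1.0.0",
    "stac_extensions": [EXTENSION_SCHEMA],
    "id": "example-dataset",
    "description": "Hypothetical dataset used to illustrate the fields",
    "links": [],
    "ml-dataset:name": "example-dataset",
    "ml-dataset:tasks": ["segmentation"],
    "ml-dataset:inputs-type": "satellite image",
    "ml-dataset:annotations-type": "raster",
    "ml-dataset:version": "0.1.0",
    "ml-dataset:splits": ["Training", "Validation", "Test"],
}

wrong = check_field_types(catalog)
```

Here `wrong` is an empty list because every field matches its declared type. If `ml-dataset:tasks` were given as a single string instead of an array, the helper would report that field name.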
The following types should be used as applicable rel types in the Link Object.
Type | Description |
---|---|
ml-dataset:splits | Links to train, test, validation splits if defined |
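A minimal sketch of how a consumer might pick out these links, assuming the split documents are referenced from the `links` array with `rel` set to `ml-dataset:splits`. The `href` and `title` values and the `split_links` helper are hypothetical and only serve the example.

```python
SPLITS_REL = "ml-dataset:splits"


def split_links(stac_doc):
    """Return the Link Objects in `stac_doc` that use the splits rel type."""
    return [
        link
        for link in stac_doc.get("links", [])
        if link.get("rel") == SPLITS_REL
    ]


# Hypothetical Catalog fragment; only the `links` array matters here.
catalog = {
    "links": [
        {"rel": "self", "href": "./catalog.json"},
        {"rel": SPLITS_REL, "href": "./splits/training.json", "title": "Training"},
        {"rel": SPLITS_REL, "href": "./splits/test.json", "title": "Test"},
    ]
}

hrefs = [link["href"] for link in split_links(catalog)]
```

Filtering on `rel` keeps the lookup independent of link order and of any other link types the Catalog may carry.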
All contributions are subject to the STAC Specification Code of Conduct. For contributions, please follow the STAC specification contributing guide. Instructions for running tests are copied here for convenience.
The same checks that run on pull requests are part of the repository and can be run locally to verify that changes are valid.
To run tests locally, you'll need npm, which is a standard part of any Node.js installation.
First you'll need to install everything with npm once. Just navigate to the root of this repository and on your command line run:
npm install
Then to check markdown formatting and test the examples against the JSON schema, you can run:
npm test
This prints the same messages that the online checks report, so you can then fix your Markdown or examples.
If the tests reveal formatting problems with the examples, you can fix them with:
npm run format-examples