How can we handle schema annotations in DPDS? #37
Replies: 2 comments
-
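Annotations examples
The following examples define, in each relevant format, a simplified schema for a dataset storing data about persons. For every schema, the examples show how to add the following annotations: displayName (on the schema and on individual fields), owner (on the schema), and encoding (on the email field).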
JSON Schema
Annotations management
In JSON Schema, annotations are keywords that provide additional information about the schema without affecting validation. Any keyword not reserved for validation can be used as an annotation. Annotation keywords can also be formalized through vocabularies.
Example
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["givenName", "familyName"],
  "displayName": "Customer",
  "owner": "Andrea Gioia",
  "properties": {
    "givenName": {
      "type": "string",
      "displayName": "First Name"
    },
    "familyName": {
      "type": "string",
      "displayName": "Last Name"
    },
    "age": {
      "type": "integer",
      "minimum": 0
    },
    "email": {
      "type": "string",
      "format": "email",
      "encoding": "UTF-8"
    }
  }
}
Validate JSON schema here
YAML Schema
Annotations management
todo
Example
$schema: https://json-schema.org/draft/2020-12/schema
type: object
required:
  - givenName
  - familyName
# annotations
displayName: Customer
owner: Andrea Gioia
properties:
  givenName:
    type: string
    # annotations
    displayName: First Name
  familyName:
    type: string
    # annotations
    displayName: Last Name
  age:
    type: integer
    minimum: 0
  email:
    type: string
    format: email
    # annotations
    encoding: UTF-8
Validate YAML schema here
AVRO Schema
Annotations management
In Avro, attributes not defined in the specification are permitted as metadata, but they must not affect the format of serialized data.
Example
{
  "type": "record",
  "name": "Person",
  "displayName": "Customer",
  "owner": "Andrea Gioia",
  "fields": [
    {
      "name": "givenName",
      "type": "string",
      "displayName": "First Name"
    },
    {
      "name": "familyName",
      "type": "string",
      "displayName": "Last Name"
    },
    {
      "name": "age",
      "type": "int"
    },
    {
      "name": "email",
      "type": "string",
      "encoding": "UTF-8"
    }
  ]
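}
PROTOBUF Schema
Annotations management
Protobuf offers a limited form of annotations through options. Options are key-value pairs attached to messages, fields, enums, and services. Custom options must first be declared by extending the corresponding descriptor option message (for example google.protobuf.MessageOptions). The protobuf compiler records them in the descriptors but does not otherwise interpret them, so they are available to custom tooling or code generation.
Example
syntax = "proto3";
import "google/protobuf/descriptor.proto";
// Custom options must be declared before use. The message names and the
// field number 50001 are placeholders chosen for this example.
message EntityAnnotations { string displayName = 1; string owner = 2; }
message FieldAnnotations { string displayName = 1; string encoding = 2; }
extend google.protobuf.MessageOptions { EntityAnnotations entity = 50001; }
extend google.protobuf.FieldOptions { FieldAnnotations field = 50001; }
message Person {
  // annotations
  option (entity).displayName = "Customer";
  option (entity).owner = "Andrea Gioia";
  string given_name = 1 [(field).displayName = "First Name"];  // annotations
  string family_name = 2 [(field).displayName = "Last Name"];  // annotations
  int32 age = 3;
  string email = 4 [(field).encoding = "UTF-8"];  // annotations
}
XSD Schema
Annotations management
In XML Schema (XSD), annotations are expressed with the xs:annotation element. Its xs:appinfo child carries information intended for applications, while xs:documentation carries human-readable text.
Example
<?xml version="1.0" encoding="UTF-8"?>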
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://your.schema.namespace" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="Person">
    <xs:annotation>
      <xs:appinfo>
        displayName: Customer
        owner: Andrea Gioia
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="givenName" type="xs:string" minOccurs="1">
          <xs:annotation>
            <xs:appinfo>
              displayName: First Name
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="familyName" type="xs:string" minOccurs="1">
          <xs:annotation>
            <xs:appinfo>
              displayName: Last Name
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <!-- A restriction must be wrapped in xs:simpleType and cannot be combined with a type attribute -->
        <xs:element name="age" minOccurs="0">
          <xs:simpleType>
            <xs:restriction base="xs:int">
              <xs:minInclusive value="0"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="email" minOccurs="0">
          <!-- xs:annotation must be the first child of the element -->
          <xs:annotation>
            <xs:appinfo>
              encoding: UTF-8
            </xs:appinfo>
          </xs:annotation>
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <!-- XSD patterns are implicitly anchored, so ^ and $ are omitted -->
              <xs:pattern value="[^@\s]+@[^@\s]+\.[^@\s]+"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
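As a rough illustration of how tooling might read such annotations back, the sketch below walks the JSON Schema example from this comment and collects every keyword that is not in a small, hand-picked set of standard keywords. The set `STANDARD_KEYWORDS` and the function `collect_annotations` are assumptions made for this sketch. A real tool would more likely rely on the vocabularies declared by the schema than on a hard-coded list.

```python
import json

# Hypothetical and deliberately incomplete set of standard JSON Schema keywords.
STANDARD_KEYWORDS = {
    "$schema", "$id", "type", "required", "properties",
    "minimum", "maximum", "format", "items", "enum",
}

# The JSON Schema example from this thread, in compact form.
SCHEMA = json.loads("""
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["givenName", "familyName"],
  "displayName": "Customer",
  "owner": "Andrea Gioia",
  "properties": {
    "givenName": {"type": "string", "displayName": "First Name"},
    "familyName": {"type": "string", "displayName": "Last Name"},
    "age": {"type": "integer", "minimum": 0},
    "email": {"type": "string", "format": "email", "encoding": "UTF-8"}
  }
}
""")


def collect_annotations(schema, path="$"):
    """Return {path: {keyword: value}} for every non-standard keyword."""
    found = {}
    own = {k: v for k, v in schema.items() if k not in STANDARD_KEYWORDS}
    if own:
        found[path] = own
    # Recurse into object properties only; other nesting keywords are ignored here.
    for name, subschema in schema.get("properties", {}).items():
        found.update(collect_annotations(subschema, f"{path}.{name}"))
    return found


annotations = collect_annotations(SCHEMA)
for path, values in annotations.items():
    print(path, values)
```

Because Avro schemas are also JSON documents, a similar walk over the `fields` array could serve the Avro example, with a different keyword set.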
-
Ok, we have collected enough ideas to open a new RFC. Let's create an issue.
-
Let's brainstorm here on how to handle schema annotations in DPDS.
The ideas collected in this thread could become the foundation for a new RFC proposal.
What is a schema annotation anyway?
A dataset schema is the definition of how data is organized within a dataset. A schema definition can be used to validate the structure of shared data (schema validation), to annotate the data structure with metadata (schema annotation), or to create the data structure on a specific datastore (schema creation, or data definition language).
In DPDS, dataset schemas are used to describe the data shared by ports. The schema definition is not part of the specification; it is delegated to the specification used to describe the API of the specific port. The schema definition format depends on the format used by the API specification to serialize exchanged data. The following tables show some commonly used schema definition formats:
Why does it matter?
Through schema annotations it is possible to provide metadata that describes the underlying data beyond the way it is structured. Examples of information that can be encoded through schema annotations are:
While each schema definition format has its own way of defining annotations, it is important to define a common vocabulary for the information that can be used to enrich the schema definition. In other words, it is important to define the admissible annotations and their meaning.
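One way to picture such a common vocabulary is as a small table of admissible annotation names, each with a meaning and an expected value type, against which annotations extracted from any schema format can be checked. The sketch below is only an illustration: the entries (`displayName`, `owner`, `encoding`, borrowed from the examples in this thread), their descriptions, the `VOCABULARY` structure, and the `check_annotations` helper are all assumptions, not part of DPDS.

```python
# Hypothetical vocabulary: annotation name -> (expected type, meaning).
VOCABULARY = {
    "displayName": (str, "Human-friendly name of the entity or field"),
    "owner": (str, "Person or team accountable for the dataset"),
    "encoding": (str, "Character encoding of the field value"),
}


def check_annotations(annotations):
    """Return a list of problems found in a {name: value} annotation dict."""
    problems = []
    for name, value in annotations.items():
        if name not in VOCABULARY:
            problems.append(f"unknown annotation: {name}")
            continue
        expected_type, _meaning = VOCABULARY[name]
        if not isinstance(value, expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# Annotations as they might be extracted from any of the schema formats.
ok = check_annotations({"displayName": "Customer", "owner": "Andrea Gioia"})
bad = check_annotations({"displayName": 42, "colour": "red"})
print(ok)
print(bad)
```

Keeping the check independent of the schema format is the point of the design: each format needs only its own extraction step, while the vocabulary and its validation are shared.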
Describing the schema of a dataset using a standard schema definition format together with annotations is better than defining a custom structure to store schema metadata, for the following reasons:
Some useful resources