Skip to content

Latest commit

 

History

History
800 lines (740 loc) · 31 KB

1 Introduction.md

File metadata and controls

800 lines (740 loc) · 31 KB

##sec Introduction

This specification adds aggregation functionality to the Open Data Protocol (OData) without changing any of the base principles of OData. It defines semantics and a representation for aggregation of data, especially:

  • Semantics and operations for querying aggregated data,
  • Results format for queries containing aggregated data,
  • Vocabulary terms to annotate what can be aggregated, and how.

##subsec Glossary

##subsubsec Definitions of Terms

This specification defines the following terms:

  • [Aggregatable Expression]{id=AggregatableExpression} – an expression not involving term casts and resulting in a value of a complex or entity or an aggregatable primitive type
  • [Aggregate Expression]{id=AggregateExpression} – argument of the aggregate transformation or function defined in [section ##AggregationAlgorithm]
  • [Aggregatable Primitive Type]{id=AggregatablePrimitiveType} – a primitive type other than Edm.Stream or subtypes of Edm.Geography or Edm.Geometry
  • [Data Aggregation Path]{id=DataAggregationPath} – a path that consists of one or more segments joined together by forward slashes (/). Segments are names of declared or dynamic structural or navigation properties, or type-cast segments consisting of the (optionally qualified) name of a structured type that is derived from the type identified by the preceding path segment to reach properties declared by the derived type.
  • [Expression]{id=Expression} – derived from the commonExpr rule (see OData-ABNF)
  • [Single-Valued Property Path]{id=SingleValuedPropertyPath} – property path ending in a single-valued primitive, complex, or navigation property

##subsubsec Acronyms and Abbreviations

The following non-exhaustive list contains variable names that are used throughout this document:

  • $A,B,C$ – collections of instances
  • $H$ – hierarchical collection
  • $H'$ – subset of nodes from a hierarchical collection
  • $u,v,w$ – instances in a collection
  • $x$ – an instance in a hierarchical collection, called a node
  • $p,q,r$ – paths
  • $S,T$ – transformation sequences
  • $α$aggregate expression, defined in [section ##AggregationAlgorithm]
  • $\Gamma(A,p)$ – the collection that results from evaluating a data aggregation path $p$ relative to a collection $A$, defined in [section ##EvaluationofDataAggregationPaths]
  • $γ(u,p)$ – the collection that results from evaluating a data aggregation path $p$ relative to an instance $u$, defined in [section ##EvaluationofDataAggregationPaths]
  • $\Pi_G(s)$ – a transformation of a collection that injects grouping properties into every instance of the collection, defined in [section ##SimpleGrouping]
  • $σ(x)$ – instance containing a grouping property that represents a node $x$, defined in [section ##Transformationtraverse]

##subsubsec Document Conventions

Keywords defined by this specification use this monospaced font.

Some sections of this specification are illustrated with non-normative examples.

::: example Example ##ex: text describing an example uses this paragraph style

Non-normative examples use this paragraph style.

:::

All examples in this document are non-normative and informative only. Examples labeled with ⚠ contain advanced concepts or make use of keywords that are defined only later in the text, they can be skipped at first reading.

All other text is normative unless otherwise labeled.


##sec Overview

Open Data Protocol (OData) services expose a data model that describes the schema of the service in terms of the Entity Data Model (EDM, see OData-CSDL) and then allows for querying data in terms of this model. The responses returned by an OData service are based on that data model and retain the relationships between the entities in the model.

Extending the OData query features with simple aggregation capabilities avoids cluttering OData services with an exponential number of explicitly modeled "aggregation level entities" or else restricting the consumer to a small subset of predefined aggregations.

Adding the notion of aggregation to OData without changing any of the base principles in OData has two aspects:

  1. Means for the consumer to query aggregated data on top of any given data model (for sufficiently capable data providers)
  2. Means for the provider to annotate what data can be aggregated, and in which way, allowing consumers to avoid asking questions that the provider cannot answer

Implementing any of these two aspects is valuable in itself independent of the other, and implementing both provides additional value for consumers. The provided aggregation annotations help a consumer understand more of the data structure looking at the service's exposed data model. The query extensions allow the consumers to explicitly express the desired aggregation behavior for a particular query. They also allow consumers to formulate queries that utilize the aggregation annotations.

##subsec Example Data Model

::: example Example ##ex: The following diagram depicts a simple model that is used throughout this document.

<style type="text/css"> .st1 {fill:#f2f2f2;stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75} .st2 {fill:#ffffff;stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.7} .st3 {fill:#000000;font-size:0.666664em} .st4 {font-size:1em} .st5 {stroke:#000000;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75} .st6 {fill:none;visibility:hidden} .st7 {stroke:#000000;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.4} .st8 {fill:#000000;font-size:0.833336em} .st9 {fill:#000000;font-size:0.833336em;font-style:italic} .st10 {fill:none} .st11 {marker-start:url(#mrkr1-125);stroke:#000000;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75;visibility:hidden} .st12 {fill:#000000;fill-opacity:1;stroke:#000000;stroke-opacity:1;stroke-width:0.22935779816514} .st13 {fill:#000000;font-size:0.75em} .st14 {fill:#000000;fill-opacity:1;stroke:#000000;stroke-opacity:1;stroke-width:0.22222222222222} .st15 {stroke:#000000;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.72} .st16 {marker-end:url(#mrkr1-200);stroke:#000000;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.72} .st17 {fill:none;stroke:none;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75} .st18 {fill:#000000;font-size:0.75em} .st19 {marker-end:url(#mrkr14-311);stroke:#000000;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75} .st20 {fill:none;fill-rule:evenodd;font-size:12px;overflow:visible;stroke-linecap:square;stroke-miterlimit:3} </style> ID: Edm.String {id} Amount: Edm.Decimal Sale Date: Edm.Date {id} Month: Edm.String Quarter: Edm.String Year: Edm.Int16 Time ID: Edm.String {id} Name: Edm.String Country: Edm.String Customer ID: Edm.String {id} Name: Edm.String Category ID: Edm.String {id} Name: Edm.String Color: Edm.String TaxRate: Edm.Decimal Product ID: Edm.String {id} Name: Edm.String SalesOrganization 1 * 1 * * 1 0..1 * 1 * 1 * Sales Sales Customer Product Products Category Time SalesOrganization Superordinate Rating: Edm.Byte FoodProduct RatingClass: Edm.String NonFoodProduct

The Amount property in the Sale entity type is an aggregatable property, and the properties of the related entity types are groupable. These can be arranged in hierarchies, for example:

  • Product hierarchy based on groupable properties of the Category and Product entity types
  • Customer hierarchy based on Country and Customer
  • Time hierarchy based on Year, Month, and Date
  • SalesOrganization hierarchy based on the recursive association to itself

In the context of Online Analytical Processing (OLAP), this model might be described in terms of a Sales "cube" with an Amount "measure" and three "dimensions". This document will avoid such terms, as they are heavily overloaded. :::

Query extensions and descriptive annotations can be applied to normalized schemas as well as partly or fully denormalized schemas.

::: example Example ##ex: The following diagram depicts a denormalized schema for the simple model.

Sale
SalesID: Edm.String {id}
Amount: Edm.Decimal
CategoryCategoryID: Edm.String
CategoryName: Edm.String
ProductProductID: Edm.String
ProductName: Edm.String
ProductColor: Edm.String
ProductTaxRate: Edm.Decimal
FoodFoodProductRating: Edm.Byte
Non-FoodNonFoodProductRatingClass: Edm.String
Sales OrganizationSalesOrganizationID: Edm.String
SalesOrganizationName: Edm.String
SalesOrganizationSuperordinateID: Edm.String
TimeTimeDate: Edm.Date
TimeMonth: Edm.String
TimeQuarter: Edm.String
TimeYear: Edm.Int16
CustomerCustomerID: Edm.String
CustomerName: Edm.String
CustomerCountry: Edm.String
:::

##subsec Example Data

::: example Example ##ex: The following entity sets and sample data will be used to further illustrate the capabilities introduced by this extension.

:::: {.example-data style=width:600px;height:700px}

::::: {.nav-2 style=left:250px} Products

ID Category Name Color TaxRate
P1 PG1 Sugar White 0.06
P2 PG1 Coffee Brown 0.06
P3 PG2 Paper White 0.14
P4 PG2 Pencil Black 0.14
:::::

::::: {style=left:510px} Food

Rating
5
 
n/a
n/a
:::::

::::: {style=left:570px} Non-Food

RatingClass
n/a
n/a
average
 
:::::

::::: {style=top:150px} Time

Date Month Quarter Year
2022-01-01 2022-01 2022-1 2022
2022-04-01 2022-04 2022-2 2022
2022-04-10 2022-04 2022-2 2022
:::::

::::: {style=top:150px;left:360px} Categories

ID Name
PG1 Food
PG2 Non-Food
:::::

::::: {.nav-2 style=top:260px;left:360px} Sales Organizations

ID Superordinate Name
Sales Corporate Sales
US Sales US
US West US US West
US East US US East
EMEA Sales EMEA
EMEA Central EMEA EMEA Central
:::::

::::: {style=top:300px} Customers

ID Name Country
C1 Joe USA
C2 Sue USA
C3 Sue Netherlands
C4 Luc France
:::::

::::: {.nav-2 .nav-3 .nav-4 .nav-5 style=top:450px} Sales

ID Customer Time Product Sales Organization Amount
1 C1 2022-01-03 P3 US West 1
2 C1 2022-04-10 P1 US West 2
3 C1 2022-08-07 P2 US West 4
4 C2 2022-01-03 P2 US East 8
5 C2 2022-11-09 P3 US East 4
6 C3 2022-04-01 P1 EMEA Central 2
7 C3 2022-08-06 P3 EMEA Central 1
8 C3 2022-11-22 P3 EMEA Central 2
:::::

::::: {.legend style=top:470px;left:500px} Legend

Property
Key
Navigation Property
:::::
::::
:::

##subsec Example Use Cases

::: example Example ##ex: In the example model, one prominent use case is the relation of customers to products. The first question that is likely to be asked is: "Which customers bought which products?"

This leads to the second more quantitative question: "Who bought how much of what?"

The answer to the second question typically is visualized as a cross-table:

:::: cross

Food Non-Food
Sugar Coffee Paper
USA 14 2 12 5 5
Joe 6 2 4 1 1
Sue 8 8 4 4
Netherlands 2 2 3 3
Sue 2 2 3 3
::::

The data in this cross-table can be written down in a shape that more closely resembles the structure of the data model, leaving cells empty that have been aggregated away:

Customer/Country Customer/Name Product/Category/Name Product/Name Amount
USA Joe Non-Food Paper 1
USA Joe Food Sugar 2
USA Joe Food Coffee 4
USA Sue Food Coffee 8
USA Sue Non-Food Paper 4
Netherlands Sue Food Sugar 2
Netherlands Sue Non-Food Paper 3
USA Food Sugar 2
USA Food Coffee 12
USA Non-Food Paper 5
Netherlands Food Sugar 2
Netherlands Non-Food Paper 3
USA Joe Food 6
USA Joe Non-Food 1
USA Sue Food 8
USA Sue Non-Food 4
Netherlands Sue Food 2
Netherlands Sue Non-Food 3
USA Food 14
USA Non-Food 5
Netherlands Food 2
Netherlands Non-Food 3

Note that this result contains seven fully qualified aggregate values, followed by fifteen rollup rows with subtotal values. :::