From 3e7fb777c3da5f376cb89f1d85d20a13250a498f Mon Sep 17 00:00:00 2001 From: Olivia Date: Tue, 31 Oct 2023 15:45:30 +0100 Subject: [PATCH 1/7] Reword proposal, extend requirements, explain various aggregation types --- proposals/aggregations/README.md | 284 ++++++++++++------ .../parameters/includeAggregations.yaml | 2 +- ...ggregation.yaml => aggregationResult.yaml} | 2 +- .../openapi/schemas/aggregations.yaml | 6 +- 4 files changed, 203 insertions(+), 91 deletions(-) rename proposals/aggregations/openapi/schemas/{aggregation.yaml => aggregationResult.yaml} (85%) diff --git a/proposals/aggregations/README.md b/proposals/aggregations/README.md index a757d810..3c0e7aea 100644 --- a/proposals/aggregations/README.md +++ b/proposals/aggregations/README.md @@ -1,132 +1,244 @@ -# OGC API - Records - Term Aggregations +# OGC API - Records - Aggregations This folder contains the content for the standard extension OGC API - Records - Term Aggregations. # Overview -This extensions enables the capability to include Term Aggregations in the items (records) response. These aggregations can be used by clients to enable [faceted search](https://en.wikipedia.org/wiki/Faceted_search). +This extension enables the capability to include different types of Aggregations in the items (records) response. These +aggregations can be used by clients to enable [faceted search](https://en.wikipedia.org/wiki/Faceted_search). -Various backends support faceted search. Examples are [Elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html), [SOLR](https://solr.apache.org/guide/8_8/json-facet-api.html) and limited support in [PostGres](https://akorotkov.github.io/blog/2016/06/17/faceted-search/), [Oracle](https://blogs.oracle.com/apex/apex-192-faceted-search). +Various backends support faceted search. 
Examples +are [Elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html), [SOLR](https://solr.apache.org/guide/8_8/json-facet-api.html) +and limited support +in [PostGres](https://akorotkov.github.io/blog/2016/06/17/faceted-search/), [Oracle](https://blogs.oracle.com/apex/apex-192-faceted-search). -Facet statistics are also interesting to give an overview of the spatio temporal distribution of items in a (part of a) collection. +Facet statistics are also interesting to give an overview of the spatio-temporal distribution of items in a (part of a) +collection. -## Extending a json search result with Term Aggregations +This extension has three major requirements: -The server includes the Term Aggregations for any search result within a collection. For example, for a collection `cat` the response to +1. A new endpoint advertising the available aggregations for a collection, similar to `sortables` and `filterables` +2. An _aggregation overview_ should be included in the responses of the `/items` collection endpoint which let the user + specify which aggregations they are + interested in +3. A new endpoint which lets the user "drill down" into a single aggregation, querying more results and applying filters +## Definition of an Aggregation + +The word _Aggregation_ refers here to a high-level piece of information that is computed over a set of records in a +collection. + +An aggregation can be of several types: + +### Terms aggregation + +A "terms" aggregation can be applied to any text property and produces a list of values appearing for a specific +property across all matching records, as well as the count of records containing each value. + +#### Example + +The aggregation `keyword` will return a list of buckets like so: + +* `forestry` (24 records) +* `marine` (12 records) +* `pollution` (5 records) +* etc. 
+ +### Histogram aggregation + +A "histogram" aggregation can be applied to any temporal or numeric property and produces a list of buckets describing +the repartition of values across matching documents. + +#### Example + +The aggregation `createDate` will return a list of buckets like so: + +* `2020-01-01` to `2020-02-01`: 18 records +* `2020-02-01` to `2020-03-01`: 22 records +* `2020-03-01` to `2020-04-01`: 43 records +* etc. + +### Filters aggregation + +A "filters" aggregation produces a count of matching records for one or several predefined queries. This essentially +lets the user run "sub-queries" cheaply to have a better understanding of the composition of the search results. + +> Note: an improvement here could be to let the user specify their own sub-queries to run, using CQL + +#### Example + +The aggregations `hasDownloads` and `hasMaps` are defined like so: + +* `hasDownloads` returns the amount of records which have at least 1 distribution of a download type (CSV, Excel...) +* `hasMaps` returns the amount of records which have at least 1 distribution which is a map service (WMS, OGC API Map, + ESRI Rest)... + +### Spatial aggregation + +> WIP + +## Requirements + +### 1. Advertising available aggregations + +An additional `/aggregations` has to be supported at the collection level. This endpoint only supports `GET` requests. + +For example, for a collection called `myOrg`, a request on + +```http request +GET /collections/myOrg/aggregations ``` -GET /collections/cat/items + +will return a JSON object describing the various available aggregations for that collection. 
For each aggregation the +following information are included: + +* The identifier of the aggregation +* The type of aggregation: an aggregation can be of type `terms`, `histogram`, `spatial` or `filters` +* The maximum count of buckets returned in the _aggregations overview_ +* For terms aggregations: + * Name of the property targeted by this aggregation + * Sort criteria: count or value (alphabetical) + * Minimum occurrence count + * Support for including/excluding terms +* For histogram aggregations: + * Name of the property targeted by this aggregation + * Type of buckets used: fixed intervals, fixed buckets count, equalized amount of records in each bucket +* For filters aggregations: + * CQL expression used for each filter; this then lets the user apply the same filter in subsequent queries + +### 2. Extending a JSON search result with an Aggregations Overview + +The server might include an Aggregations Overview for any search result within a collection. For example, for a +collection `myOrg` the response to + +```http request +GET /collections/myOrg/items ``` -may include the following property `aggregations`. The content represents an array of aggregated terms. -A aggregation is identified by a collection property name and contains the top number of buckets with -their count. A bucket is a number of results in the resultset matching the key. A parameter `next` -indicates if there are potentially more buckets to be retrieved. +should include the following property `aggregations`. The content represents a dictionary of all aggregations available +for that collection, each containing +various buckets describing different facets of the search results. +An aggregation is identified by an identifier and contains the top number of buckets with +their count. A bucket is a number of results in the result set matching the key. A parameter `more` +indicates how many buckets were left out to keep the response size low (0 if all buckets were included in the overview). 
-``` +Note: `more` can also simply have the value `true` in case the precise amount of additional buckets could not be +computed. + +```json { "type": "FeatureCollection", - "aggregations": [ - { - "property": "keywords", + "aggregations": { + "keywords": { + "type": "terms", + "property": "keywords", "buckets": [ { - "key": "forestry", + "value": "forestry", "count": "202" }, { - "key": "marine", + "value": "marine", "count": "150" } ], - "next": "0" - } - ], - features": [], - "numberMatched": 375, - "numberReturned": 0, - "links": [] -} -``` - -## Numerical, spatial and temporal buckets - -For dynamic values a range of buckets can be returned defined by a min, max or bbox value. - -``` -{ - "aggregations": [ - "key": "scale", - "buckets": [ - { - "key": "0.1 - 0.001", - "count": "175" - }, - { - "key": "0.001 - 0.00001", - "count": "120" - } - ], - "next": "2" + "more": 0 }, - { - "key": "date", + "createDate": { + "type": "histogram", + "property": "createDate", "buckets": [ { - "key": "1990/01/01 - 1995/01/01", - "count": "12" - }, - { - "key": "1995/01/01 - 2000/01/01", - "count": "340" + "min": "2010-01-01", + "max": "2011-01-01", + "count": 100 }, { - "key": "2000/01/01 - 2005/01/01", - "count": "1200" + "min": "2011-01-01", + "max": "2012-01-01", + "count": 220 } ], - "next": "5" - } - ] + "more": 12 + } + }, + "features": [], + "numberMatched": 375, + "numberReturned": 0, + "links": [] } ``` -In the case of geometries this is a boundingbox in WGS84 defined by two points. +In some situations clients are interested only in the aggregations. In other situations the aggregations are not +required. These aspects can be controlled via additional query parameters. 
+| Parameter | Explanation | +|------------------|------------------------------------------------------------------------------------------| +| aggregations | default `true`; will include the aggregations overview in the response | +| aggregationsOnly | default `false`; returns the aggregations without search results. Similar to `&limit=0`. | + +> Note: give the possibility for the user to only ask for a subset of the aggregations? + +### 3. Offering the possibility to "drill-down" on a single aggregation + +An additional `/aggregation/{aggregationId}` has to be supported at the collection level. This endpoint only +supports `GET` requests. + +This endpoint takes in the _aggregation identifier_ as advertised in the `/aggregations` endpoint or in the overview in +the `/items` endpoint, as well as the following query parameters: + +* the `limit` and `offset` parameters can be used similarly to the `/items` endpoint, but for paginating through the + list of returned buckets +* the `q`, `bbox`, `datetime` and `filter` parameters behave the same way as for the `/items` endpoint, and let the user + get an exhaustive list of buckets for an existing search results set +* for terms aggregations: + * the `include` and `exclude` params accept text values including wildcards; both of these values will be used to + either include or exclude buckets based on their associated values (note that this has to be advertised as supported + in the `/aggregations` endpoint) + +For example, for a collection called `myOrg`, a request on + +```http request +GET /collections/myOrg/aggregation/keyword?exclude=*climate*&offset=100&limit=10 ``` + +might return: + +```json { - "aggregations": [ + "property": "keyword", + "type": "terms", + "buckets": [ { - "key": "bbox", - "buckets": [ - { - "key": "0,0 5,5", - "count": 175 - }, - { - "key": "0,5 5,10", - "count": 120 - }, - { - "key": "10,5 15,10", - "count": 77 - } - ], - "next": 7 + "value": "forestry", + "count": "202" + }, + { + "value": 
"marine", + "count": "150" + }, + ... + { + "value": "landcover", + "count": "12" + } + ], + "bucketsCount": 150, + "links": [ + { + "rel": "next", + "title": "The next page of buckets for this aggregation", + "href": "https://example.org/collections/myOrg/aggregation/keyword?exclude=*climate*&offset=110&limit=10" + }, + { + "rel": "previous", + "title": "The previous page of buckets for this aggregation", + "href": "https://example.org/collections/myOrg/aggregation/keyword?exclude=*climate*&offset=90&limit=10" } ] } ``` -## Interacting with aggregations - -In some situations clients are interested only in the aggregations. In other situations the aggregations are not required. These aspects can be controlled via additional query parameters. - -| Parameter | Explanation | -| -- | -- | -| aggregationsOnly | default `false`, Returns the aggregations without search results. Similar to `&limit=0`. | -| includeAggregations | default `true` (if available), can be set to `false` | - # Folder structure This folder is organized as follows: diff --git a/proposals/aggregations/openapi/parameters/includeAggregations.yaml b/proposals/aggregations/openapi/parameters/includeAggregations.yaml index d6b44398..a36a9b65 100644 --- a/proposals/aggregations/openapi/parameters/includeAggregations.yaml +++ b/proposals/aggregations/openapi/parameters/includeAggregations.yaml @@ -1,4 +1,4 @@ -name: includeAggregations +name: aggregations description: parameter can be set to omit aggregations in search result in: query required: false diff --git a/proposals/aggregations/openapi/schemas/aggregation.yaml b/proposals/aggregations/openapi/schemas/aggregationResult.yaml similarity index 85% rename from proposals/aggregations/openapi/schemas/aggregation.yaml rename to proposals/aggregations/openapi/schemas/aggregationResult.yaml index a06ca837..93cde8b6 100644 --- a/proposals/aggregations/openapi/schemas/aggregation.yaml +++ b/proposals/aggregations/openapi/schemas/aggregationResult.yaml @@ -1,6 
+1,6 @@ Aggregation: type: object - description: An aggregation is linked to a item property and contains the top set of occurences (buckets) of values of the property + description: An aggregation is linked to a item property and contains the top set of occurrences (buckets) of values of the property required: - key properties: diff --git a/proposals/aggregations/openapi/schemas/aggregations.yaml b/proposals/aggregations/openapi/schemas/aggregations.yaml index 49e98032..0729ce07 100644 --- a/proposals/aggregations/openapi/schemas/aggregations.yaml +++ b/proposals/aggregations/openapi/schemas/aggregations.yaml @@ -1,4 +1,4 @@ -type: array -description: List of aggregations +type: object +description: Aggregations overview items: - $ref: "./aggregation.yaml" \ No newline at end of file + $ref: "./aggregationResult.yaml" From 41d11ceca5c845598b187aaf560c1b91d97d7c99 Mon Sep 17 00:00:00 2001 From: sebr72 Date: Tue, 7 Nov 2023 11:53:40 +0100 Subject: [PATCH 2/7] Use term facet following OGC Code Spring London --- proposals/README.md | 2 +- .../openapi/parameters/aggregationsOnly.yaml | 10 - .../openapi/parameters/facetsOnly.yaml | 10 + ...deAggregations.yaml => includeFacets.yaml} | 4 +- .../openapi/schemas/aggregations.yaml | 4 - ...ggregationResult.yaml => facetResult.yaml} | 4 +- .../aggregations/openapi/schemas/facets.yaml | 4 + proposals/facets/README.md | 227 ++++++++++++++++++ 8 files changed, 246 insertions(+), 19 deletions(-) delete mode 100644 proposals/aggregations/openapi/parameters/aggregationsOnly.yaml create mode 100644 proposals/aggregations/openapi/parameters/facetsOnly.yaml rename proposals/aggregations/openapi/parameters/{includeAggregations.yaml => includeFacets.yaml} (53%) delete mode 100644 proposals/aggregations/openapi/schemas/aggregations.yaml rename proposals/aggregations/openapi/schemas/{aggregationResult.yaml => facetResult.yaml} (70%) create mode 100644 proposals/aggregations/openapi/schemas/facets.yaml create mode 100644 
proposals/facets/README.md diff --git a/proposals/README.md b/proposals/README.md index 7f9d0be7..381fe25a 100644 --- a/proposals/README.md +++ b/proposals/README.md @@ -1,6 +1,6 @@ # Proposed Extensions -OGC APIs are designed to be modular. We expect new requirements will emerge with use and new features will be proposed to address those requirements. Development and validation of these new features is a community effort. Supporting that effort are two tools; a process for tracking the maturity of a proposed addition, and a means to publish the current baseline of a proposed new feature. +OGC APIs are designed to be modular. We expect new requirements will emerge with use then new features will be proposed to address those requirements. Development and validation of these new features is a community effort. Supporting that effort are two tools; a process for tracking the maturity of a proposed addition, and a means to publish the current baseline of a proposed new feature. ## Draft Features diff --git a/proposals/aggregations/openapi/parameters/aggregationsOnly.yaml b/proposals/aggregations/openapi/parameters/aggregationsOnly.yaml deleted file mode 100644 index cccf8cb5..00000000 --- a/proposals/aggregations/openapi/parameters/aggregationsOnly.yaml +++ /dev/null @@ -1,10 +0,0 @@ -name: aggregationsOnly -description: parameter can be used to request aggregations only, similar to &limit=0 -in: query -required: false -schema: - type: string - format: uri -style: form -explode: false -default: false \ No newline at end of file diff --git a/proposals/aggregations/openapi/parameters/facetsOnly.yaml b/proposals/aggregations/openapi/parameters/facetsOnly.yaml new file mode 100644 index 00000000..18f63e15 --- /dev/null +++ b/proposals/aggregations/openapi/parameters/facetsOnly.yaml @@ -0,0 +1,10 @@ +name: facetsOnly +description: parameter can be used to request facets only, similar to &limit=0 +in: query +required: false +schema: + type: string + format: uri +style: form 
+explode: false +default: false \ No newline at end of file diff --git a/proposals/aggregations/openapi/parameters/includeAggregations.yaml b/proposals/aggregations/openapi/parameters/includeFacets.yaml similarity index 53% rename from proposals/aggregations/openapi/parameters/includeAggregations.yaml rename to proposals/aggregations/openapi/parameters/includeFacets.yaml index a36a9b65..6ccf35ff 100644 --- a/proposals/aggregations/openapi/parameters/includeAggregations.yaml +++ b/proposals/aggregations/openapi/parameters/includeFacets.yaml @@ -1,5 +1,5 @@ -name: aggregations -description: parameter can be set to omit aggregations in search result +name: facets +description: parameter can be set to omit facets in search result in: query required: false schema: diff --git a/proposals/aggregations/openapi/schemas/aggregations.yaml b/proposals/aggregations/openapi/schemas/aggregations.yaml deleted file mode 100644 index 0729ce07..00000000 --- a/proposals/aggregations/openapi/schemas/aggregations.yaml +++ /dev/null @@ -1,4 +0,0 @@ -type: object -description: Aggregations overview -items: - $ref: "./aggregationResult.yaml" diff --git a/proposals/aggregations/openapi/schemas/aggregationResult.yaml b/proposals/aggregations/openapi/schemas/facetResult.yaml similarity index 70% rename from proposals/aggregations/openapi/schemas/aggregationResult.yaml rename to proposals/aggregations/openapi/schemas/facetResult.yaml index 93cde8b6..ca847d63 100644 --- a/proposals/aggregations/openapi/schemas/aggregationResult.yaml +++ b/proposals/aggregations/openapi/schemas/facetResult.yaml @@ -1,6 +1,6 @@ -Aggregation: +Facet: type: object - description: An aggregation is linked to a item property and contains the top set of occurrences (buckets) of values of the property + description: A facet is linked to an item property and contains the top set of occurrences (buckets) of values of the property required: - key properties: diff --git a/proposals/aggregations/openapi/schemas/facets.yaml 
b/proposals/aggregations/openapi/schemas/facets.yaml new file mode 100644 index 00000000..e448e160 --- /dev/null +++ b/proposals/aggregations/openapi/schemas/facets.yaml @@ -0,0 +1,4 @@ +type: object +description: Facets overview +items: + $ref: "./facetResult.yaml" diff --git a/proposals/facets/README.md b/proposals/facets/README.md new file mode 100644 index 00000000..84f9e476 --- /dev/null +++ b/proposals/facets/README.md @@ -0,0 +1,227 @@ +# OGC API - Records - Facets + +This folder contains the content for the standard extension OGC API - Records - Term Facets. + +# Overview + +This extension aggregates the items (records) in buckets and provides a number associated to each bucket. Each of the related buckets are grouped into a facet enabling [faceted search](https://en.wikipedia.org/wiki/Faceted_search). + +Various backends support faceted search. Examples are [Elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html), [SOLR](https://solr.apache.org/guide/8_8/json-facet-api.html) and limited support in [PostGres](https://akorotkov.github.io/blog/2016/06/17/faceted-search/), [Oracle](https://blogs.oracle.com/apex/apex-192-faceted-search). + +Facet statistics are also interesting to give an overview of the spatio-temporal distribution of items in a (part of a) collection. + +This extension has two major requirements: +1. A new endpoint advertising the available facets for a collection, similar to `sortables` and `filterables` +2. An _facet overview_ should be included in the responses of the `/items` collection endpoint which let the user + specify which facets they are interested in + +## Definition of a Facet + +The word _Facet_ refers here to a high-level piece of information that is computed over a set of records in a +collection. 
+ +An facet can be of several types: + +### Terms facet + +A "terms" facet can be applied to any text property and produces a list of values appearing for a specific +property across all matching records, as well as the count of records containing each value. + +#### Example + +A facet `keyword` based on countries might return a list of buckets like so: + +* `Greece` (24 records) +* `Germany` (12 records) +* `England` (5 records) +* etc. + +### Histogram facet + +A "histogram" facet can be applied to any temporal or numeric property and produces a list of buckets describing +the repartition of values across matching documents. + +#### Example + +The facet `createDate` will return a list of buckets like so: + +* `2020-01-01` to `2020-02-01`: 18 records +* `2020-02-01` to `2020-03-01`: 22 records +* `2020-03-01` to `2020-04-01`: 43 records +* etc. + +### Filters facets + +A "filters" facets produces a count of matching records for one or several predefined queries. This essentially +lets the user run "sub-queries" cheaply to have a better understanding of the composition of the search results. + +#### Example + +The facet `hasDownloads` and `hasMaps` are defined like so: + +* `hasDownloads` returns the amount of records which have at least 1 distribution of a download type (CSV, Excel...) +* `hasMaps` returns the amount of records which have at least 1 distribution which is a map service (WMS, OGC API Map, + ESRI Rest)... + + +## Requirements + +### 1. Advertising available facets + +An additional `/facets` has to be supported at the collection level. This endpoint only supports `GET` requests. + +For example, for a collection called `myOrg`, a request on + +```http request +GET /collections/myOrg/facets +``` + +will return a JSON object describing the various available facets for that collection. 
For each facet the +following information are included: + +* The identifier of the facet +* The type of facet: a facet can be of type `terms`, `histogram` or `filters` +* The maximum count of buckets returned in the _facets overview_ +* For terms facets: + * Name of the property targeted by this facet + * Sort criteria: count or value (alphabetical) + * Minimum occurrence count + * Support for including/excluding terms +* For histogram facet: + * Name of the property targeted by this facet + * Type of buckets used: fixed intervals, fixed buckets count, equalized amount of records in each bucket +* For filters facets: + * CQL expression used for each filter; this then lets the user apply the same filter in subsequent queries + +### 2. Extending a JSON search result with a Facet Overview + +The server might include a facet Overview for any search result within a collection. For example, for a collection `myOrg` the response to + +```http request +GET /collections/myOrg/items +``` + +should include the following property `facets`. The content represents a dictionary of all facets available +for that collection, each containing +various buckets describing different facets of the search results. +A facet is identified by an identifier and contains the top number of buckets with +their count. A bucket is a number of results in the result set matching the key. A parameter `more` +indicates how many buckets were left out to keep the response size low (0 if all buckets were included in the overview). + +Note: `more` can also simply have the value `true` in case the precise amount of additional buckets could not be +computed. 
+ +```json +{ + "type": "FeatureCollection", + "facets": { + "keywords": { + "type": "terms", + "property": "keyword", + "buckets": [ + { + "value": "Greece", + "count": "202" + }, + { + "value": "Germany", + "count": "150" + } + ], + "more": 0 + }, + "createDate": { + "type": "histogram", + "property": "createDate", + "buckets": [ + { + "min": "2010-01-01", + "max": "2011-01-01", + "count": 100 + }, + { + "min": "2011-01-01", + "max": "2012-01-01", + "count": 220 + } + ], + "more": 12 + } + }, + "features": [], + "numberMatched": 375, + "numberReturned": 0, + "links": [] +} +``` + +In some situations clients are interested only in the facets. In other situations the facets are not +required. These aspects can be controlled via additional query parameters. + +| Parameter | Explanation | +|------------|------------------------------------------------------------------------------------| +| facets | default `true`; will include the facets overview in the response | +| facetsOnly | default `false`; returns the facets without search results. Similar to `&limit=0`. | + + +### 3. Offering the possibility to "drill-down" on a single facet + +An additional `/facet/{facetId}` has to be supported at the collection level. This endpoint only +supports `GET` requests. 
+ +This endpoint takes in the _facet identifier_ as advertised in the `/facets` endpoint or in the overview in +the `/items` endpoint, as well as the following query parameters: + +* the `limit` and `offset` parameters can be used similarly to the `/items` endpoint, but for paginating through the + list of returned buckets +* the `q`, `bbox`, `datetime` and `filter` parameters behave the same way as for the `/items` endpoint, and let the user + get an exhaustive list of buckets for an existing search results set +* for terms facets: + * the `include` and `exclude` params accept text values including wildcards; both of these values will be used to + either include or exclude buckets based on their associated values (note that this has to be advertised as supported + in the `/facets` endpoint) + +For example, for a collection called `myOrg`, a request on + +```http request +GET /collections/myOrg/facet/keyword?exclude=*england*&offset=100&limit=10 +``` + +might return: + +```json +{ + "property": "keyword", + "type": "terms", + "buckets": [ + { + "value": "Greece", + "count": "202" + }, + { + "value": "Germany", + "count": "150" + } + ], + "bucketsCount": 150, + "links": [ + { + "rel": "next", + "title": "The next page of buckets for this facet", + "href": "https://example.org/collections/myOrg/facet/keyword?exclude=*England*&offset=110&limit=10" + }, + { + "rel": "previous", + "title": "The previous page of buckets for this facet", + "href": "https://example.org/collections/myOrg/facet/keyword?exclude=*climate*&offset=90&limit=10" + } + ] +} +``` + +# Folder structure + +This folder is organized as follows: + +* openapi - normative OpenAPI components specified by the standard + From 743ba9032578878101b76bfea1763f99128a4eb3 Mon Sep 17 00:00:00 2001 From: sebr72 Date: Tue, 7 Nov 2023 17:55:06 +0100 Subject: [PATCH 3/7] Slimmed down version of the proposal --- proposals/facets/README.md | 73 +++++--------------------------------- 1 file changed, 8 insertions(+), 65 
deletions(-) diff --git a/proposals/facets/README.md b/proposals/facets/README.md index 84f9e476..cbd7b9be 100644 --- a/proposals/facets/README.md +++ b/proposals/facets/README.md @@ -8,13 +8,6 @@ This extension aggregates the items (records) in buckets and provides a number a Various backends support faceted search. Examples are [Elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html), [SOLR](https://solr.apache.org/guide/8_8/json-facet-api.html) and limited support in [PostGres](https://akorotkov.github.io/blog/2016/06/17/faceted-search/), [Oracle](https://blogs.oracle.com/apex/apex-192-faceted-search). -Facet statistics are also interesting to give an overview of the spatio-temporal distribution of items in a (part of a) collection. - -This extension has two major requirements: -1. A new endpoint advertising the available facets for a collection, similar to `sortables` and `filterables` -2. An _facet overview_ should be included in the responses of the `/items` collection endpoint which let the user - specify which facets they are interested in - ## Definition of a Facet The word _Facet_ refers here to a high-level piece of information that is computed over a set of records in a @@ -29,7 +22,7 @@ property across all matching records, as well as the count of records containing #### Example -A facet `keyword` based on countries might return a list of buckets like so: +A terms facet based on `keyword` countries might return a list of buckets like so: * `Greece` (24 records) * `Germany` (12 records) @@ -57,11 +50,11 @@ lets the user run "sub-queries" cheaply to have a better understanding of the co #### Example -The facet `hasDownloads` and `hasMaps` are defined like so: +The facet `hasDownloads` will return the amount of records which have at least 1 distribution of a download type (CSV, Excel...): -* `hasDownloads` returns the amount of records which have at least 1 distribution of a download type 
(CSV, Excel...) -* `hasMaps` returns the amount of records which have at least 1 distribution which is a map service (WMS, OGC API Map, - ESRI Rest)... +`hasDownloads` +* 'CSV, Excel, ...' : 2051 records +* 'None' : 351 records ## Requirements @@ -163,60 +156,10 @@ required. These aspects can be controlled via additional query parameters. | facets | default `true`; will include the facets overview in the response | | facetsOnly | default `false`; returns the facets without search results. Similar to `&limit=0`. | - -### 3. Offering the possibility to "drill-down" on a single facet - -An additional `/facet/{facetId}` has to be supported at the collection level. This endpoint only -supports `GET` requests. - -This endpoint takes in the _facet identifier_ as advertised in the `/facets` endpoint or in the overview in -the `/items` endpoint, as well as the following query parameters: - -* the `limit` and `offset` parameters can be used similarly to the `/items` endpoint, but for paginating through the - list of returned buckets -* the `q`, `bbox`, `datetime` and `filter` parameters behave the same way as for the `/items` endpoint, and let the user - get an exhaustive list of buckets for an existing search results set -* for terms facets: - * the `include` and `exclude` params accept text values including wildcards; both of these values will be used to - either include or exclude buckets based on their associated values (note that this has to be advertised as supported - in the `/facets` endpoint) - -For example, for a collection called `myOrg`, a request on - -```http request -GET /collections/myOrg/facet/keyword?exclude=*england*&offset=100&limit=10 +### Examples ``` - -might return: - -```json -{ - "property": "keyword", - "type": "terms", - "buckets": [ - { - "value": "Greece", - "count": "202" - }, - { - "value": "Germany", - "count": "150" - } - ], - "bucketsCount": 150, - "links": [ - { - "rel": "next", - "title": "The next page of buckets for this facet", - 
"href": "https://example.org/collections/myOrg/facet/keyword?exclude=*England*&offset=110&limit=10" - }, - { - "rel": "previous", - "title": "The previous page of buckets for this facet", - "href": "https://example.org/collections/myOrg/facet/keyword?exclude=*climate*&offset=90&limit=10" - } - ] -} +GET /collections/myOrg/items?q=countries&facets=keywords&exclude=*england*&offset=100&limit=10 +GET /collections/myOrg/items?q=countries&facets=keywords&include=*england*&offset=100&limit=10 ``` # Folder structure From 1957100f936d127b7c08e849ae97d6b778eb6c0b Mon Sep 17 00:00:00 2001 From: Florent Gravin Date: Thu, 9 Nov 2023 17:47:36 +0100 Subject: [PATCH 4/7] facets: rework proposal --- proposals/facets/README.md | 79 ++++++++++++++++++++++++++++++-------- 1 file changed, 62 insertions(+), 17 deletions(-) diff --git a/proposals/facets/README.md b/proposals/facets/README.md index cbd7b9be..00b20cf3 100644 --- a/proposals/facets/README.md +++ b/proposals/facets/README.md @@ -1,6 +1,6 @@ # OGC API - Records - Facets -This folder contains the content for the standard extension OGC API - Records - Term Facets. +This folder contains the content for the standard extension OGC API - Records - Facets. # Overview @@ -13,11 +13,11 @@ Various backends support faceted search. Examples are [Elastic](https://www.elas The word _Facet_ refers here to a high-level piece of information that is computed over a set of records in a collection. -An facet can be of several types: +A facet can be of several types: ### Terms facet -A "terms" facet can be applied to any text property and produces a list of values appearing for a specific +A `terms` facet can be applied to any text property and produces a list of values appearing for a specific property across all matching records, as well as the count of records containing each value. 
#### Example @@ -31,8 +31,7 @@ A terms facet based on `keyword` countries might return a list of buckets like s ### Histogram facet -A "histogram" facet can be applied to any temporal or numeric property and produces a list of buckets describing -the repartition of values across matching documents. +A `histogram` facet can be applied to any temporal or numeric property to distribute item values over ranges or intervals. #### Example @@ -41,11 +40,16 @@ The facet `createDate` will return a list of buckets like so: * `2020-01-01` to `2020-02-01`: 18 records * `2020-02-01` to `2020-03-01`: 22 records * `2020-03-01` to `2020-04-01`: 43 records + +or +* `2020`: 18 records +* `2021`: 22 records +* `2022`: 43 records * etc. ### Filters facets -A "filters" facets produces a count of matching records for one or several predefined queries. This essentially +A `filters` facets produces a count of matching records for one or several predefined queries. This essentially lets the user run "sub-queries" cheaply to have a better understanding of the composition of the search results. #### Example @@ -61,10 +65,11 @@ The facet `hasDownloads` will return the amount of records which have at least 1 ### 1. Advertising available facets -An additional `/facets` has to be supported at the collection level. This endpoint only supports `GET` requests. +An additional `/facets` entrypoint (similar to `/queryables`)has to be supported at the collection level. This endpoint only supports `GET` requests. For example, for a collection called `myOrg`, a request on +Note ```http request GET /collections/myOrg/facets ``` @@ -86,7 +91,54 @@ following information are included: * For filters facets: * CQL expression used for each filter; this then lets the user apply the same filter in subsequent queries -### 2. 
Extending a JSON search result with a Facet Overview +> **Note** that all the `facettables` attributes must be `queryables` + +Example: + +```json +{ + "type": "object", + "title": "Observations", + "defaultCount": 10, + "properties": { + "date": { + "type": "histogram" + }, + "organization": { + "type": "terms" + }, + "created": { + "type": "histogram" + }, + "format": { + "type": "terms" + }, + "usage": { + "type": "filter", + "view": "link.type = 'wms'", + "download": "link.type = 'wfs'" + } + }, + "$schema": "http://json-schema.org/draft/2019-09/schema" +} +``` +### 2. Extending queryables response + +The queryable response will advertize that an attribute is facettable or not, to eventually skip the `/facets` request. + +eg. +```json +{ + "properties": { + "organization": { + "title": "organization", + "type": "string", + "facets": true + } + } +} +``` +### 3. Extending a JSON search result with a Facet Overview The server might include a facet Overview for any search result within a collection. For example, for a collection `myOrg` the response to @@ -148,18 +200,11 @@ computed. } ``` -In some situations clients are interested only in the facets. In other situations the facets are not -required. These aspects can be controlled via additional query parameters. - -| Parameter | Explanation | -|------------|------------------------------------------------------------------------------------| -| facets | default `true`; will include the facets overview in the response | -| facetsOnly | default `false`; returns the facets without search results. Similar to `&limit=0`. | +In some situations clients are interested only in the facets. In that case, they can use the `facets` parameter with `&limit=0`. 
### Examples ``` -GET /collections/myOrg/items?q=countries&facets=keywords&exclude=*england*&offset=100&limit=10 -GET /collections/myOrg/items?q=countries&facets=keywords&include=*england*&offset=100&limit=10 +GET /collections/myOrg/items?q=&facets=keywords:20:value_asc,date:quantile:4,update:year.month,usage ``` # Folder structure From ddfa49afd544a81e42637a1d032feb5f7c05c119 Mon Sep 17 00:00:00 2001 From: sebr72 Date: Tue, 14 Nov 2023 10:19:29 +0100 Subject: [PATCH 5/7] Move files to facets folder and remove duplication --- proposals/aggregations/README.md | 247 ------------------ .../openapi/parameters/facetsOnly.yaml | 10 - .../openapi/parameters/includeFacets.yaml | 10 - .../openapi/schemas/bucket.yaml | 0 .../openapi/schemas/facetResult.yaml | 0 .../openapi/schemas/facets.yaml | 0 6 files changed, 267 deletions(-) delete mode 100644 proposals/aggregations/README.md delete mode 100644 proposals/aggregations/openapi/parameters/facetsOnly.yaml delete mode 100644 proposals/aggregations/openapi/parameters/includeFacets.yaml rename proposals/{aggregations => facets}/openapi/schemas/bucket.yaml (100%) rename proposals/{aggregations => facets}/openapi/schemas/facetResult.yaml (100%) rename proposals/{aggregations => facets}/openapi/schemas/facets.yaml (100%) diff --git a/proposals/aggregations/README.md b/proposals/aggregations/README.md deleted file mode 100644 index 3c0e7aea..00000000 --- a/proposals/aggregations/README.md +++ /dev/null @@ -1,247 +0,0 @@ -# OGC API - Records - Aggregations - -This folder contains the content for the standard extension OGC API - Records - Term Aggregations. - -# Overview - -This extension enables the capability to include different types of Aggregations in the items (records) response. These -aggregations can be used by clients to enable [faceted search](https://en.wikipedia.org/wiki/Faceted_search). - -Various backends support faceted search. 
Examples -are [Elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html), [SOLR](https://solr.apache.org/guide/8_8/json-facet-api.html) -and limited support -in [PostGres](https://akorotkov.github.io/blog/2016/06/17/faceted-search/), [Oracle](https://blogs.oracle.com/apex/apex-192-faceted-search). - -Facet statistics are also interesting to give an overview of the spatio-temporal distribution of items in a (part of a) -collection. - -This extension has three major requirements: - -1. A new endpoint advertising the available aggregations for a collection, similar to `sortables` and `filterables` -2. An _aggregation overview_ should be included in the responses of the `/items` collection endpoint which let the user - specify which aggregations they are - interested in -3. A new endpoint which lets the user "drill down" into a single aggregation, querying more results and applying filters - -## Definition of an Aggregation - -The word _Aggregation_ refers here to a high-level piece of information that is computed over a set of records in a -collection. - -An aggregation can be of several types: - -### Terms aggregation - -A "terms" aggregation can be applied to any text property and produces a list of values appearing for a specific -property across all matching records, as well as the count of records containing each value. - -#### Example - -The aggregation `keyword` will return a list of buckets like so: - -* `forestry` (24 records) -* `marine` (12 records) -* `pollution` (5 records) -* etc. - -### Histogram aggregation - -A "histogram" aggregation can be applied to any temporal or numeric property and produces a list of buckets describing -the repartition of values across matching documents. 
- -#### Example - -The aggregation `createDate` will return a list of buckets like so: - -* `2020-01-01` to `2020-02-01`: 18 records -* `2020-02-01` to `2020-03-01`: 22 records -* `2020-03-01` to `2020-04-01`: 43 records -* etc. - -### Filters aggregation - -A "filters" aggregation produces a count of matching records for one or several predefined queries. This essentially -lets the user run "sub-queries" cheaply to have a better understanding of the composition of the search results. - -> Note: an improvement here could be to let the user specify their own sub-queries to run, using CQL - -#### Example - -The aggregations `hasDownloads` and `hasMaps` are defined like so: - -* `hasDownloads` returns the amount of records which have at least 1 distribution of a download type (CSV, Excel...) -* `hasMaps` returns the amount of records which have at least 1 distribution which is a map service (WMS, OGC API Map, - ESRI Rest)... - -### Spatial aggregation - -> WIP - -## Requirements - -### 1. Advertising available aggregations - -An additional `/aggregations` has to be supported at the collection level. This endpoint only supports `GET` requests. - -For example, for a collection called `myOrg`, a request on - -```http request -GET /collections/myOrg/aggregations -``` - -will return a JSON object describing the various available aggregations for that collection. 
For each aggregation the -following information are included: - -* The identifier of the aggregation -* The type of aggregation: an aggregation can be of type `terms`, `histogram`, `spatial` or `filters` -* The maximum count of buckets returned in the _aggregations overview_ -* For terms aggregations: - * Name of the property targeted by this aggregation - * Sort criteria: count or value (alphabetical) - * Minimum occurrence count - * Support for including/excluding terms -* For histogram aggregations: - * Name of the property targeted by this aggregation - * Type of buckets used: fixed intervals, fixed buckets count, equalized amount of records in each bucket -* For filters aggregations: - * CQL expression used for each filter; this then lets the user apply the same filter in subsequent queries - -### 2. Extending a JSON search result with an Aggregations Overview - -The server might include an Aggregations Overview for any search result within a collection. For example, for a -collection `myOrg` the response to - -```http request -GET /collections/myOrg/items -``` - -should include the following property `aggregations`. The content represents a dictionary of all aggregations available -for that collection, each containing -various buckets describing different facets of the search results. -An aggregation is identified by an identifier and contains the top number of buckets with -their count. A bucket is a number of results in the result set matching the key. A parameter `more` -indicates how many buckets were left out to keep the response size low (0 if all buckets were included in the overview). - -Note: `more` can also simply have the value `true` in case the precise amount of additional buckets could not be -computed. 
- -```json -{ - "type": "FeatureCollection", - "aggregations": { - "keywords": { - "type": "terms", - "property": "keywords", - "buckets": [ - { - "value": "forestry", - "count": "202" - }, - { - "value": "marine", - "count": "150" - } - ], - "more": 0 - }, - "createDate": { - "type": "histogram", - "property": "createDate", - "buckets": [ - { - "min": "2010-01-01", - "max": "2011-01-01", - "count": 100 - }, - { - "min": "2011-01-01", - "max": "2012-01-01", - "count": 220 - } - ], - "more": 12 - } - }, - "features": [], - "numberMatched": 375, - "numberReturned": 0, - "links": [] -} -``` - -In some situations clients are interested only in the aggregations. In other situations the aggregations are not -required. These aspects can be controlled via additional query parameters. - -| Parameter | Explanation | -|------------------|------------------------------------------------------------------------------------------| -| aggregations | default `true`; will include the aggregations overview in the response | -| aggregationsOnly | default `false`; returns the aggregations without search results. Similar to `&limit=0`. | - -> Note: give the possibility for the user to only ask for a subset of the aggregations? - -### 3. Offering the possibility to "drill-down" on a single aggregation - -An additional `/aggregation/{aggregationId}` has to be supported at the collection level. This endpoint only -supports `GET` requests. 
- -This endpoint takes in the _aggregation identifier_ as advertised in the `/aggregations` endpoint or in the overview in -the `/items` endpoint, as well as the following query parameters: - -* the `limit` and `offset` parameters can be used similarly to the `/items` endpoint, but for paginating through the - list of returned buckets -* the `q`, `bbox`, `datetime` and `filter` parameters behave the same way as for the `/items` endpoint, and let the user - get an exhaustive list of buckets for an existing search results set -* for terms aggregations: - * the `include` and `exclude` params accept text values including wildcards; both of these values will be used to - either include or exclude buckets based on their associated values (note that this has to be advertised as supported - in the `/aggregations` endpoint) - -For example, for a collection called `myOrg`, a request on - -```http request -GET /collections/myOrg/aggregation/keyword?exclude=*climate*&offset=100&limit=10 -``` - -might return: - -```json -{ - "property": "keyword", - "type": "terms", - "buckets": [ - { - "value": "forestry", - "count": "202" - }, - { - "value": "marine", - "count": "150" - }, - ... 
- { - "value": "landcover", - "count": "12" - } - ], - "bucketsCount": 150, - "links": [ - { - "rel": "next", - "title": "The next page of buckets for this aggregation", - "href": "https://example.org/collections/myOrg/aggregation/keyword?exclude=*climate*&offset=110&limit=10" - }, - { - "rel": "previous", - "title": "The previous page of buckets for this aggregation", - "href": "https://example.org/collections/myOrg/aggregation/keyword?exclude=*climate*&offset=90&limit=10" - } - ] -} -``` - -# Folder structure - -This folder is organized as follows: - -* openapi - normative OpenAPI components specified by the standard - diff --git a/proposals/aggregations/openapi/parameters/facetsOnly.yaml b/proposals/aggregations/openapi/parameters/facetsOnly.yaml deleted file mode 100644 index 18f63e15..00000000 --- a/proposals/aggregations/openapi/parameters/facetsOnly.yaml +++ /dev/null @@ -1,10 +0,0 @@ -name: facetsOnly -description: parameter can be used to request facets only, similar to &limit=0 -in: query -required: false -schema: - type: string - format: uri -style: form -explode: false -default: false \ No newline at end of file diff --git a/proposals/aggregations/openapi/parameters/includeFacets.yaml b/proposals/aggregations/openapi/parameters/includeFacets.yaml deleted file mode 100644 index 6ccf35ff..00000000 --- a/proposals/aggregations/openapi/parameters/includeFacets.yaml +++ /dev/null @@ -1,10 +0,0 @@ -name: facets -description: parameter can be set to omit facets in search result -in: query -required: false -schema: - type: string - format: uri -style: form -explode: false -default: true diff --git a/proposals/aggregations/openapi/schemas/bucket.yaml b/proposals/facets/openapi/schemas/bucket.yaml similarity index 100% rename from proposals/aggregations/openapi/schemas/bucket.yaml rename to proposals/facets/openapi/schemas/bucket.yaml diff --git a/proposals/aggregations/openapi/schemas/facetResult.yaml b/proposals/facets/openapi/schemas/facetResult.yaml 
similarity index 100% rename from proposals/aggregations/openapi/schemas/facetResult.yaml rename to proposals/facets/openapi/schemas/facetResult.yaml diff --git a/proposals/aggregations/openapi/schemas/facets.yaml b/proposals/facets/openapi/schemas/facets.yaml similarity index 100% rename from proposals/aggregations/openapi/schemas/facets.yaml rename to proposals/facets/openapi/schemas/facets.yaml From 24efc9d94f1f0be5ca350ac93e3897a84e4f579e Mon Sep 17 00:00:00 2001 From: sebr72 Date: Mon, 20 Nov 2023 09:58:39 +0100 Subject: [PATCH 6/7] Clarify examples --- proposals/facets/README.md | 40 ++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/proposals/facets/README.md b/proposals/facets/README.md index 00b20cf3..e0f6c127 100644 --- a/proposals/facets/README.md +++ b/proposals/facets/README.md @@ -41,24 +41,18 @@ The facet `createDate` will return a list of buckets like so: * `2020-02-01` to `2020-03-01`: 22 records * `2020-03-01` to `2020-04-01`: 43 records -or -* `2020`: 18 records -* `2021`: 22 records -* `2022`: 43 records -* etc. - ### Filters facets A `filters` facets produces a count of matching records for one or several predefined queries. This essentially lets the user run "sub-queries" cheaply to have a better understanding of the composition of the search results. -#### Example +#### Examples The facet `hasDownloads` will return the amount of records which have at least 1 distribution of a download type (CSV, Excel...): +* `hasDownloads` : 2051 records -`hasDownloads` -* 'CSV, Excel, ...' 
: 2051 records -* 'None' : 351 records +The facet `hasMaps` will return the amount of records which have at associated to it: +* `hasMaps` : 11495 records ## Requirements @@ -78,9 +72,9 @@ will return a JSON object describing the various available facets for that colle following information are included: * The identifier of the facet -* The type of facet: a facet can be of type `terms`, `histogram` or `filters` +* The type of facet: a facet can be of type `term`, `histogram` or `filter` * The maximum count of buckets returned in the _facets overview_ -* For terms facets: +* For term facets: * Name of the property targeted by this facet * Sort criteria: count or value (alphabetical) * Minimum occurrence count @@ -88,7 +82,7 @@ following information are included: * For histogram facet: * Name of the property targeted by this facet * Type of buckets used: fixed intervals, fixed buckets count, equalized amount of records in each bucket -* For filters facets: +* For filter facets: * CQL expression used for each filter; this then lets the user apply the same filter in subsequent queries > **Note** that all the `facettables` attributes must be `queryables` @@ -99,19 +93,26 @@ Example: { "type": "object", "title": "Observations", - "defaultCount": 10, - "properties": { + "defaultBucketCount": 10, + "facets": { "date": { - "type": "histogram" + "name" : "updateDate", + "type": "histogram", + "bucketType": "fixedInterval" }, "organization": { - "type": "terms" + "type": "term", + "sortedBy": "count", + "minimumOccurrenceCount": 500 }, "created": { - "type": "histogram" + "type": "histogram", + "bucketType": "fixedBucketCount" }, "format": { - "type": "terms" + "type": "terms", + "sortedBy": "value", + "minimumOccurrenceCount": 10 }, "usage": { "type": "filter", @@ -122,6 +123,7 @@ Example: "$schema": "http://json-schema.org/draft/2019-09/schema" } ``` + ### 2. 
Extending queryables response The queryable response will advertize that an attribute is facettable or not, to eventually skip the `/facets` request. From a0c7d2fb21aa8a6db68f7c7a60a4be64552e9140 Mon Sep 17 00:00:00 2001 From: Florent Gravin Date: Thu, 11 Jan 2024 16:20:01 +0100 Subject: [PATCH 7/7] facet: improve filter facet description --- proposals/facets/README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/proposals/facets/README.md b/proposals/facets/README.md index e0f6c127..44c1f380 100644 --- a/proposals/facets/README.md +++ b/proposals/facets/README.md @@ -48,12 +48,13 @@ lets the user run "sub-queries" cheaply to have a better understanding of the co #### Examples -The facet `hasDownloads` will return the amount of records which have at least 1 distribution of a download type (CSV, Excel...): -* `hasDownloads` : 2051 records - -The facet `hasMaps` will return the amount of records which have at associated to it: -* `hasMaps` : 11495 records +The facet `is available by` will provide 2 sub-queries : +- `Download service` returns the amount of records which have at least 1 distribution of a download type (CSV, Excel...): +- `Visualization service` returns the amount of records which have at least 1 distribution of a type (WMS, WMTS...): +Is available by +* `Download service`: 300 records +* `Visualization service`: 243 records ## Requirements
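A minimal sketch of how a server could compute the terms-facet overview that this patch series describes (top buckets with their counts, plus a `more` field for the buckets left out). The helper name `terms_facet`, the in-memory record layout and the `max_buckets` parameter are assumptions made for illustration and are not part of the proposal. Counts are emitted as integers here, and the `type` value `terms` follows the overview example, although the patches also use `term`.

```python
from collections import Counter


def terms_facet(records, prop, max_buckets=10):
    """Build a terms-facet overview for one property (hypothetical helper)."""
    counts = Counter()
    for record in records:
        values = record.get(prop, [])
        if isinstance(values, str):
            values = [values]
        # Count each record at most once per distinct value it carries.
        counts.update(set(values))
    # Sort by descending count, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"value": v, "count": c} for v, c in ordered[:max_buckets]]
    return {
        "type": "terms",
        "property": prop,
        "buckets": buckets,
        # Number of buckets omitted to keep the response small.
        "more": max(0, len(ordered) - max_buckets),
    }


# Illustrative records only; not taken from any real catalogue.
records = [
    {"keyword": ["forestry", "marine"]},
    {"keyword": ["forestry"]},
    {"keyword": "pollution"},
]
overview = terms_facet(records, "keyword", max_buckets=2)
```

With `max_buckets=2`, the overview keeps the two most frequent values and reports the remaining one through `more`. A real backend would more likely delegate this counting to its search engine than iterate over records in memory.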