From ea92ae8e6faa1b87f648a1f4dd64618d6272fdab Mon Sep 17 00:00:00 2001 From: BONNAREL FRANCOIS Date: Fri, 19 Jan 2024 19:33:01 +0100 Subject: [PATCH] initial upgrade of the legacy note --- DataLinkImp.tex | 212 ++++++++++++++++++++++++++++++++++++++---------- Makefile | 2 +- 2 files changed, 169 insertions(+), 45 deletions(-) diff --git a/DataLinkImp.tex b/DataLinkImp.tex index 4afbfff..3ffae43 100644 --- a/DataLinkImp.tex +++ b/DataLinkImp.tex @@ -30,10 +30,10 @@ \begin{abstract} -This note gives some hints on how the DataLink specification should be used. -The DataLink specification document \citep{std:DataLink} describes the linking of data discovery metadata -to access to the data itself, further detailed metadata, related -resources, and to services that perform operations on the data. +This note gives some hints on how the DataLink specification should be used. The DataLink +specification document \citep{2023ivoa.spec.0617D} describes the linking of data discovery +metadata to access to the data itself, further detailed metadata, related resources, and +to services that perform operations on the data. \end{abstract} @@ -60,12 +60,11 @@ \section*{Conformance-related definitions} \section{Introduction} -The DataLink specification \citep{std:DataLink} defines mechanisms for connecting data items discovered -via one service to related data products, and web services +The DataLink specification \citep{2023ivoa.spec.0617D} defines mechanisms for connecting +data items discovered via one service to related data products, and web services that can act upon the data. -The \blinks web service capability is a web service capability -for drilling +The \blinks\ web service capability is a web service capability for drilling down from a discovered data item such as a dataset identifier, a source in a catalog or any other data item. In the first case (typically an IVOA publisher dataset identifier) it allows to find details about the data files that can be @@ -73,7 +72,7 @@ \section{Introduction} services that can act upon the data (usually without having to download the entire dataset). In the latter cases it allows to associate metadata, images, cubes, spectra, timeseries or ancillary data to a source in the sky or other -kind of time. The expected usage is for DAL (Data Access Layer) +kind of iteim. The expected usage is for DAL (Data Access Layer) data discovery services (e.g.\ a TAP service \citep{2010ivoa.spec.0327D} with the ObsCore \citep{2017ivoa.spec.0509L} data model or one of the simple DAL services) to provide an identifier that @@ -82,79 +81,204 @@ \section{Introduction} the data. -The current document is an implementation note for \blinks endpoint recognition in various contexts. -We are reviewing three methods by which DataLink documents can be associated to resource entries in service responses and suggest possible choices according to the context. +The current document is first an implementation note for \blinks\ endpoint recognition +in various contexts (section \ref{sec:reco}). We are reviewing three methods by which +DataLink documents can be associated to resource entries in service responses and suggest +possible choices according to the context. -\section{DataLink and SIAP-2.0 or ObsTAP services} +In addition it documents how to fill the \blinks\ enpoint response FIELDS (section +\ref{sec:fill}). -SIAP-2.0 \citep{2015ivoa.spec.1223D} services are queriable by setting contraints on the four "axes" of the data (2D-space, spectral, time and polarization) and their properties. Other archive metadata or data details are also queriable (target, collection, facilities, etc...). ObsTAP services are TAP services delivering the ivoa.Obscore table queriable via ADQL. The successfull response is a VOTable containing a mandatory result TABLE and optional service descriptor resource(s). +\section{DataLink \blinks\ endpoint recognition mechanisms} +\label{sec:reco} +\subsection{DataLink and SIAP-2.0 or ObsTAP services} -\subsection{DataLink discovery via format and reference columns} -The result table describes the dataset characterization on the different axes and give additional operative features. The various columns are mapped from the ObsCore-1.1 data model (reference). The \verb|accessReference| and \verb|AccessFormat| fields give different solutions to reach the dataset for download or access. This can be done in two ways: +SIAP-2.0 \citep{2015ivoa.spec.1223D} services are queriable by setting constraints on +the four "axes" of the data (2D-space, spectral, time and polarization) and their properties. +Other archive metadata or data details are also queriable (target, collection, facilities, +etc...). ObsTAP services are TAP services delivering the ivoa.Obscore table queriable +via ADQL. The successfull response is a VOTable containing a mandatory result TABLE and +optional service descriptor resource(s). + +\subsubsection{DataLink discovery via format and reference columns} +The result table describes the dataset characterization on the different axes and give +additional operative features. The various columns are mapped from the ObsCore-1.1 +data model \citep{2017ivoa.spec.0509L}. The \verb|accessReference| and \verb|AccessFormat| +fields give different solutions to reach the dataset for download or access. This can be +done in two ways: \begin{itemize} -\item The data discovery step has yielded a result with the \verb|AccessFormat| value (media type) : +\item The data discovery step has yielded a result with the \verb|AccessFormat| value +(media type) : \verb|application/x-votable+xml;content=datalink| - To the client, this indicates that what is given in the access reference (e.g. the \verb|access_url| column in ObsTAP or SIAP-2.0) is a DataLink document (see below for DataLink functionalities). -\item In case the media type is different, the access reference is a direct link to the dataset download. + To the client, this indicates that what is given in the access reference (e.g. the +\verb|access_url| column in ObsTAP or SIAP-2.0) is a DataLink document (see below for +DataLink functionalities). +\item In case the media type is different, the access reference is a direct link to the +dataset download. \end{itemize} -\subsection{SIAP-2.0, ObsTAP and service descriptors} -To enable additional DataLink functionalities, SIAP-2.0 or ObsTAP services can then add a service descriptor in the query response that indicates the availability of a DataLink service accompanying the DAL service. It references one (or more) field(s) from the DAL response. +\subsubsection{SIAP-2.0, ObsTAP and service descriptors} +To enable additional DataLink functionalities, SIAP-2.0 or ObsTAP services can then add +a service descriptor in the query response that indicates the availability of a DataLink +service accompanying the DAL discovery service. It references one (or more) field(s) from +the DAL response. -This is explained in detail in DataLink specification, section 4.2. The net result is that DataLink-enabled clients can find ancillary data and use associated services for data access or processing by virtue of being able to retrieve DataLink documents. +This is explained in detail in DataLink specification, section 4.4. The net result is that +DataLink-enabled clients can find ancillary data and use associated services for data access +or processing by virtue of being able to retrieve DataLink documents. -The service descriptor embedded in an ObsTAP query response is allowed because ObsTAP is a peculiar TAP service and TAP allows the TAP response to contain additional resources in addition to the "results" one (See section 2.9 of TAP-1.0 specification). Hence the inclusion of a service descriptor service is valid. +The service descriptor embedded in an ObsTAP query response is allowed because ObsTAP +is a peculiar TAP service and TAP allows the TAP response to contain additional resources +in addition to the "results" one (See section 2.9 of TAP-1.0 specification) +\footnote{Apparently TAP-1.1 doesn't say explicitely that additional resources to the +"results" one are allowed in the VOTable response}. Hence the inclusion of a service +descriptor service is valid. -\section{DataLink in the context of other dataset discovery methods} -Datasets can also be discovered in various other ways. Dataset "discovery" is defined as the discovery of the IVOA \verb|publisher_did| of a dataset in standard services or by the ad hoc discovery of "obsids" in combination with the a priori knowledge of a DataLink service "aware" of these obsids. +\subsection{DataLink in the context of other dataset discovery methods} +Datasets can also be discovered in various other ways. Dataset "discovery" is defined as +the discovery of the IVOA \verb|publisher_did| of a dataset in standard services or by the +ad hoc discovery of proprietary "dataset-ids" in combination with the a priori knowledge of +a DataLink service "aware" of these dataset-ids. -Beside the most standard and modern discovery path via ObsTAP/SIAP-2.0, the following alternative scenarii are possible: +Beside the most standard and modern discovery path via ObsTAP/SIAP-2.0, the following +alternative scenarii are possible: \begin{itemize} -\item The discovery can be done via an SIAP-1.0 or an SSA service. A DataLink service descriptor should have been added to the service query response to allow further guiding to additional resources -\item The discovery can be done via a paper in a journal implementing publication of datasets associated with articles. The editor should describe or implement a DataLink service (eg via a service descriptor valid for the journal) for further usage and access to download DataAccess metadata resources. -\item Logs of observations are dataset descriptions belonging to the catalog category. They can be exposed in the VO using ConeSearch or TAP. They are in general not consistent with the ObsCore model. However they allow some kind of "dataset discovery". A Service descriptor can be added to the VOTable containing the log file. -\item HiPS is an IVOA standard for a global and hierarchical allsky access to pixel and catalogue data (reference HIPS). HiPS cells contain ids for original dataset which have been processed to build the Healpix maps. HiPS metadata file allows to generate a specific VOTable record for each cell of the HiPS collection including a "progenitor" URL. This can be a new way to discover datasets and start a DataLink session to find out additional associated resources. +\item The discovery can be done via an SIAP-1.0 or an SSA service. A DataLink service +descriptor should have been added to the service query response to allow further guiding +to additional resources +\item The discovery can be done via a paper in a journal implementing publication of +datasets associated with articles. The editor should describe or implement a DataLink +service (eg via a service descriptor valid for the journal) for further usage and access +to download DataAccess metadata resources. +\item Logs of observations are dataset descriptions belonging to the catalog category. +They can be exposed in the VO using ConeSearch or TAP. They are in general not consistent +with the ObsCore data model. However they allow some kind of "dataset discovery". A Service +descriptor can be added to the VOTable containing the log file. +\item HiPS is an IVOA standard for a global and hierarchical allsky access to pixel and +catalogue data (reference HIPS). HiPS cells contain ids for original dataset which have +been processed to build the Healpix maps. HiPS metadata file allows to generate a specific +VOTable record for each cell of the HiPS collection including a "progenitor" URL. This can +be a new way to discover datasets and start a DataLink session to find out additional +associated resources. \end{itemize} -\section{DataLink outside Data discovery context} +\subsection{DataLink outside Data discovery context} -DataLink 1.0 provides description of resources to be attached to a dataset. In the current specification it is actually indifferent to DataLink output what is the real content of the VOTable entry it is attached to. That's why it is tempting to use DataLink mechanisms in other VO contexts than Dataset discovery. +DataLink 1.1 provides description of resources to be attached to a dataset or to whatever +item in a response table of a VO service. It is possible to use DataLink mechanisms in other +VO contexts than Dataset discovery. For example it could be useful to attach to a source in a catalog table such resources as \begin{itemize} \item data records for the same object contained in another catalog/database -\item a service interface to an external database prepared to query around the object position. +\item a service interface to an external database prepared to query around the object +position. \item Datasets associated to the source (such as images, spectra, etc..) \end{itemize} - In addition we could attach links to each measurement in a measurement list for an individual source formatted in VOTable. Light curves or radial velocity TimeSeries are good examples of such measurement lists. These links can attach: + In addition we could attach links to each measurement in a measurement list for an +individual source formatted in VOTable. Light curves or radial velocity TimeSeries are +good examples of such measurement lists. These links can attach: \begin{itemize} \item original dataset from which the measurement is extracted \item metadata describing the extraction method \item calibration information \end{itemize} We could also attach links to entities in an IVOA provenance metadata service response. -\section{Various recognition solutions for use cases introduced in section 3 and 4} - In such contexts, the \blinks resource URL attached to a record may be given in various ways: -\begin{itemize} -\item It may be deduced from an "identifier" given by one of the main table FIELD. In that case it is easy to attach a service descriptor to the main VOTable containing the measurements. -\item It may be given directly in one of the FIELD of the table (let's say the name of this FIELD is "bla"). In that case we may still have two different situations. +\subsection{Various recognition solutions for use cases introduced in section 3 and 4} + + In such contexts, the \blinks\ resource URL attached to a record may be given in various + ways: \begin{itemize} -\item If the FIELD "bla" contains URL driving different contents from row to row, it is a good idea to use an "Obscore"-like solution, consisting in addding the utype \verb|Access.reference| to FIELD "bla", and to add a "foo" column with utype \verb|Access.format| to tell the client what is the nature of the retrieved resource (\{links\} resource or whatever other media type). -\item If the FIELD "bla" contains always an URL to the \{links\} resource (or whatever media type) without any change from row to row, it is recommended to use a LINK element inside the FIELD "bla" to qualify the URL as described in DataLink specification, section 5. -\end{itemize} -\end{itemize} +\item A DataLink service descriptor (see DataLink 1.1 specification, section 4) could be added +to the main table as long as the content of one of the table FIELDs may be used as the ID +parameter in the \blinks\ endpoint. An exemple is given in subsection 4.5 of the specification. + +\item In case the \blinks\ endpoint URL is given in the main table for each row, a LINK +element SHOULD be added to the FIELD containing this URL. The content-type of the LINK will +be set to : +\begin{verbatim} + application/x-votable+xml;content=datalink +\end{verbatim} + In the example below the content of the "Datalink" FIELD is used as an URL to the +\blinks\ endpoint for each row. + \begin{verbatim} + + + +\end{verbatim} + +\item In case the main table contains an URL not systematically pointing to \blinks\ +endpoint, but may also point to responses with other content types +(e.g.\ for single file download), the ObsCore utype "Access.reference" SHOULD be +added to the FIELD and an additional FIELD with ObsCore utype "Access.format" SHOULD +contain the "application/x-votable+xml;content=datalink" value when appropriate. +A ref to the ID of the FIELD containing the URL SHOULD be added to avoid ambiguities. + +In the example below, the Image-Link contains the URL pointing to a \blinks\ response or +the dataset istself depending on the value of the Image-Format FIELD. + \begin{verbatim} + + +\end{verbatim} +\end{itemize} + +\section{Datalink \blinks\ endpoint response FIELDS usage} +\label{sec:fill} +This section lists the standard FIELDS in the response and +trate how to use them when this seems required +\begin{itemize} +\item \textbf{ID :} The ID can be a \verb|publisher_did| or any proprietary item id common to +the main service and the Datalink service. +\item \textbf{access\_url :} The usage of this FIELD is straightforward from the +specification. Just pay attention to the URL with fragments case. By using this feature +several time for the same root URL in your \blinks\ endpoint response you may distinguish +different semantics, content\_qualifier, and local\_semantics inside the same linked target. +For example this could allow to distinguish science data from calibration or auxiliary +subsets in the same file +\item \textbf{service\_def :} This is intended to refer to a service described by the service +descriptor embedded in the same VOTable document. A specific IVOA implementation note on +service descriptors should be written in the future +\item \textbf{error\_message :} There is nothing to add to the specification text here +\item \textbf{description :} This free text should be specific to each link for each specific +ID value and is intended to be read by a user when looking at the response table of the +\blinks\ endpoint. It is not really useful for the user that data providers fill this field +with a generic text valid for all links sharing the same semantics value +\item \textbf{semantics :} This field is intended mainly to give information to the client +software in order to help it to take some decision. However it is also an important information +for the end user. It is qualifying the relationship between the main item identified by the ID +value and the thing which is retrievable via the acces\_url. For example \#calibration or +\#auxiliary are not intrinsec properties of this thing but relatively to the main item +(probably a science dataset). The intrinsec thing type or quality is given by +content\_qualifier (in the case if a \#auxiliary relationship the thing pointed by the +access\_url can be a table, -an image, a cube...). Although +\item \textbf{content\_type :} The usage of this Field is straightforward. Again it is intended +to help the client software to manage the response of the access\_url +\item \textbf{content\_length :} This field is useful to inform the user or the client +software how long will the transfer last and how much memory is required to host it +\item \textbf{content\_qualifier :} The usage of this field is rather straightforward. At the +time of writing of this note it is recommended to use the ObsCore vocabulary for dataproduct +type instead of the ivoa external vocabulary which is not stabilized. Other standard vocabularies +could be used in the future according to the quality of the link response, for example it could +be the identifier of some facility consistent with the emergent facility specification +\item \textbf{local\_semantics :} local\_semantics is mainly intended to help the software to sort out the links and take decisions about what to do with them when semantics, content\_type and content\_qualifier do not allow to do it. This will contain a proprietary vocabulary useful for the end user but is not some totally free text like the one in description. +\item \textbf{link\_auth :} This field contains an information which relates to the status of the access\_url and is not dependant of the user +\item \textbf{link\_authorized :} This field will be filled by the server after the user authentification process. It is dependant both from the link and the user +\end{itemize} \section{Changes} -This is the initial version of this document. +2024-01-18: upgrade to citations of most recent VO specifications and adding a section on usage of response FIELDS\\ +2023-12608: This was the initial version of this document, porting on github a previous internal DAL note. -\bibliography{ivoatex/ivoabib,ivoatex/docrepo,localrefs} +\bibliography{ivoatex/ivoabib,ivoatex/docrepo,localrefs.bib} \end{document} diff --git a/Makefile b/Makefile index 8c92733..e4a391d 100644 --- a/Makefile +++ b/Makefile @@ -7,7 +7,7 @@ DOCNAME = DataLinkImp DOCVERSION = 1.0 # Publication date, ISO format; update manually for "releases" -DOCDATE = 2023-12-08 +DOCDATE = 2024-01-19 # What is it you're writing: NOTE, WD, PR, REC, PEN, or EN DOCTYPE = NOTE