VODML Mapping vs ModelInstanceInVot #23
Comments
I'm taking a closer look to compare the annotation schemes today. I notice that the preview PDF in the ModelInstanceInVot repository is not current (2020-08-18). Is that correct? |
The CI failed!
I’ve been trying to run it again
I'm taking a closer look to compare the annotation schemes today.
I notice that the preview PDF <https://github.com/ivoa-std/ModelInstanceInVot/releases/download/auto-pdf-preview/vodml-instance-vot-draft.pdf> in the ModelInstanceInVot repository is not current (2020-08-18).
The one under doc <https://github.com/ivoa-std/ModelInstanceInVot/blob/master/doc/model-instance-in-vot.pdf> (2020-09-15) appears to match the current serializations.
Is that correct?
Yes, the one under doc is always up to date.
|
Good clarification Laurent, but I think there is a typo in your text above. TABLE_RAW_MAPPING should read "TABLE_ROW-TEMPLATE", shouldn't it? |
right, fixed |
I wrote this up some days ago and forgot to push the green button!
General comment:
MODEL_INSTANCE: It is a specific goal of our Annotation Syntax to 'Annotate files to multiple models' (Mapping doc; Section 2.3.2). This can be either different products (MANGO, Cube) or different versions of the same model. I think this is a feature we want to preserve, and I expect the case in Issue #29 does just this?
GLOBALS/TEMPLATES in the Mapping syntax:
TEMPLATES hold all instances which are created per row of the associated TABLE, so looks equivalent to the ModelInstanceInVot TABLE_ROW_TEMPLATE but at a higher level
So, in your first pair of diagrams.. you cannot consider the 1 TABLE to represent 1 Dataset, or even 1 'root' object.
TABLE_MAPPING..(TABLE_TEMPLATE in diagrams?):
But this element appears to tie very directly with the VOTable serialization structure
How does this accommodate an INSTANCE whose content is distributed among multiple TABLEs? In the second pair of diagrams, there can certainly be INSTANCES in GLOBALS which contain elements from multiple TABLES.
Summary:
Additional Comments:
|
Long post multiple answers
The use-case sections in the two documents differ, so your comparison is tenuous.
In addition to this, we have a couple of requirements
To facilitate both annotation and parsing tasks,
|
Recently
We could easily move back to a multi-model version:
The |
I would say that your {TEMPLATE} matches my {TABLE_ROW_TEMPLATE}. I added one level, {TABLE_MAPPING}, to encompass both metadata and data.
|
The annotation is independent of the data structure, in the sense that the hierarchy of the mapping elements (INSTANCE, ATTRIBUTE, COLLECTION) does not depend on the way the data are arranged in the table. This feature allows annotators to work with pre-defined blocks, which is a very important requirement.
If you have multiple tables, let's say source + detections:
If you have meta-data spread over multiple tables (any use case?) , you can use the @dmref/@id pattern |
You are right to say that
There is an example here ex: {GROUPBY}, allows to retrieve per-source data from tables containing a mix of source data |
Without something consuming it, it's hard for me to see what comes out of it. It looks like a SORT method, producing a long collection of instances rearranged so that the instances with common 'filter' criteria are together. I'm not sure what the benefit of that is. In the ZTF example you mentioned, appears to create:
So each Source has associated TimeSeries containing 1 Point? The document example uses role:test.lightcurve generating a series of Collections of 'test:photometric.point', first points with "R" filter, then points with "G" filter, then points with "V" filter. I would expect in such an example to create 3 instances of LightCurve, one with the "R" points, one with the "G" points, and one with the "V" points.. but there is no such Instance. If I'm interpreting this correctly, the only effect GroupBy has is to sort the records. This isn't a model feature, but a client side operation. eg: give me all your Source records in this region, sorted by source_id. The last clause is a client request which has nothing to do with the Source model. Bottom line: if this is all correct, and GroupBy is ~= Sort, then it is a feature outside the scope of "Mapping the data to data models" which the Annotation is supposed to support. |
Hi Mark,
Partial answer
There is only one filter (zg) in the ZTF example you are pointing. If we
would have 3 filters we would indeed create 3 different light curves
(tsdata)
Cheers
François
On 07/04/2021 at 05:11, Mark Cresitello-Dittmar wrote:
…
*GROUPBY*
There is an example here
<https://github.com/ivoa/dm-usecases/blob/main/usecases/time-series/fb%2Bml-proposal/TimeSeriesMangoZTF.annot.xml>
ex: {GROUPBY}, allows to retrieve per-source data from tables
containing a mix of source data
Without something consuming it, it's hard for me to see what comes out
of it. It looks like a SORT method, producing a long collection of
instances rearranged so that the instances with common 'filter'
criteria are together. I'm not sure what the benefit of that is.
In the ZTF example you mentioned, appears to create:
* a collection of MangoObject (now 'Source') grouped by identifier
o so we have multiple Source instances all id=a, then all id=b,
then all id=c, etc
* each Source has an associated VOModelInstance which is a
'tsd:tsdata' (TimeSeries)
o the tsdata instance contains a collection of tsd:Point, but
I'm not sure how many Points are in the Collection
o The GroupBy says the top level Source is instantiated from
each row of the table, so there should be only 1 Point in each
tsdata instance... generated by *the* row creating the top
level Source instance.
So each Source has associated TimeSeries containing 1 Point?
The document example uses role:test.lightcurve generating a series of
Collections of 'test:photometric.point', first points with "R" filter,
then points with "G" filter, then points with "V" filter. I would
expect in such an example to create 3 instances of LightCurve, one
with the "R" points, one with the "G" points, and one with the "V"
points.. but there is no such Instance.
If I'm interpreting this correctly, the only effect GroupBy has is to
sort the records. This isn't a model feature, but a client side
operation. eg: give me all your Source records in this region, sorted
by source_id. The last clause is a client request which has nothing to
do with the Source model.
Bottom line: if this is all correct, and GroupBy is ~= Sort, then it
is a feature outside the scope of "Mapping the data to data models"
which the Annotation is supposed to support.
|
On Tue, Apr 06, 2021 at 08:11:40PM -0700, Mark Cresitello-Dittmar wrote:
Bottom line: if this is all correct, and GroupBy is ~= Sort, then
it is a feature outside the scope of "Mapping the data to data
models" which the Annotation is supposed to support.
+1 for "outside the scope" for me (even if it's really not sorting).
If we start down this path, we'll eventually put in all of
relational algebra into the annotation scheme. We already have ADQL
for that, let's not re-invent SQL-in-XML (a.k.a ADQL 1.0, which
fortunately didn't really happen).
No, I am convinced that you simply will not be able to normalise all
tabular structures out there so that a hypothetical client will
always receive the same structure after "decoding" the annotation
(ignoring for the moment that I claim very few clients will even want
that).
Which, of course, is another point for adopting small, flexible,
independent, per-domain DMs and annotations...
|
You might be worried to see such datasets (ZTF) circulating, but you have to admit that they do exist. This is one of the examples we committed to work on 2 years ago in the frame of the TDIG. The role of GROUP_BY is just to tell the parser how to extract per-source data from a table containing data related to multiple sources. In this case, using a model + annotation makes it possible to do something which is not possible otherwise. |
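To make the sort-versus-group distinction discussed above concrete, here is a minimal Python sketch. It is not the GROUP_BY implementation of either proposal: the row layout, the band names and the function names are all hypothetical, chosen only to show that sorting returns one flat list while grouping returns one collection of points per (source, filter) pair.

```python
from collections import defaultdict

# Hypothetical rows of a mixed table: (source_id, filter, time, magnitude).
ROWS = [
    ("a", "zg", 1.0, 17.2),
    ("b", "zg", 1.0, 18.9),
    ("a", "zg", 2.0, 17.4),
    ("b", "zr", 2.5, 18.1),
    ("a", "zr", 3.0, 16.8),
]


def sort_rows(rows):
    """Sorting only reorders the rows: the result is still one flat list."""
    return sorted(rows, key=lambda row: (row[0], row[1]))


def group_rows(rows):
    """Grouping builds one list of (time, magnitude) points per (source, filter)."""
    curves = defaultdict(list)
    for source_id, band, time, mag in rows:
        curves[(source_id, band)].append((time, mag))
    return dict(curves)


curves = group_rows(ROWS)
```

With these sample rows, `sort_rows` still returns five records, whereas `group_rows` returns four separate point lists, one per source and filter. Whether this step belongs in the annotation or in the client is exactly the point under debate in this thread.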
I confirm: it is common in VizieR to have, in the same table, different objects which have each been observed several times. Examples:
|
Why does VODML Mapping make a distinction between COLUMN, CONSTANT and LITERAL when ModelInstanceInVot doesn't?
Let's consider a table where all data are related to the same source, identified by source_id. There are 3 possible configurations:
1- The value is missing from the VOTable.
2- The value is given by a PARAM:
<PARAM name="source_id" value="MY_SOURCE" />
3- The value is given by a FIELD:
.....
<FIELD name="source_id" .... />
....
<TR><TD>"MY_SOURCE" </TD>....</TR>
<TR><TD>"MY_SOURCE" </TD>....</TR>
...
The 3 options are valid and can be met.
VODML Mapping proposes a different statement for each of these situations.
1- Missing value
<ATTRIBUTE dmrole="model:Source.id">
<LITERAL value="MY_SOURCE" dmtype="ivoa:string"/>
</ATTRIBUTE>
2- Param value
<ATTRIBUTE dmrole="model:Source.id">
<CONSTANT ref="source_id" dmtype="ivoa:string"/>
</ATTRIBUTE>
3- Field value
<ATTRIBUTE dmrole="model:Source.id">
<COLUMN ref="source_id" dmtype="ivoa:string"/>
</ATTRIBUTE>
ModelInstanceInVot proposes one single statement for all of these situations.
<ATTRIBUTE dmtype="ivoa:string" ref="source_id" value="MY_SOURCE"/>
The processing rules are given by the spec:
This way of proceeding makes the annotation components easier to use, and hence facilitates the annotation process. |
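As an illustration of how a client might process the single-statement form, here is a minimal Python sketch. The precedence it applies (FIELD first, then PARAM, then the inline @value) is an assumption made for this example only, since the actual processing rules are defined by the spec and are not reproduced in this thread. The sample table, the `instrument` PARAM and the `resolve` helper are hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified table: one PARAM and one FIELD.
VOTABLE = """
<TABLE>
  <PARAM name="instrument" value="MY_INSTRUMENT"/>
  <FIELD name="source_id"/>
  <DATA><TABLEDATA>
    <TR><TD>MY_SOURCE</TD></TR>
    <TR><TD>OTHER_SOURCE</TD></TR>
  </TABLEDATA></DATA>
</TABLE>
""".strip()


def resolve(attribute, table, row):
    """Return the value of a unified ATTRIBUTE for one table row.

    Assumed precedence (not taken from the spec): a FIELD matching @ref,
    then a PARAM matching @ref, then the inline @value.
    """
    ref = attribute.get("ref")
    if ref is not None:
        field_names = [field.get("name") for field in table.findall("FIELD")]
        if ref in field_names:
            return row.findall("TD")[field_names.index(ref)].text
        for param in table.findall("PARAM"):
            if param.get("name") == ref:
                return param.get("value")
    return attribute.get("value")


table = ET.fromstring(VOTABLE)
rows = table.findall("DATA/TABLEDATA/TR")
by_field = ET.fromstring('<ATTRIBUTE dmtype="ivoa:string" ref="source_id" value="MY_SOURCE"/>')
by_param = ET.fromstring('<ATTRIBUTE dmtype="ivoa:string" ref="instrument" value="FALLBACK"/>')
by_value = ET.fromstring('<ATTRIBUTE dmtype="ivoa:string" ref="unknown" value="MY_SOURCE"/>')
```

The same `resolve` call covers all three configurations, which is the convenience argued for above. The cost is that the client has to apply a lookup order, whereas the LITERAL|CONSTANT|COLUMN forms state the origin of the value explicitly.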
There is certainly some consolidation which can happen with these elements. Correction: VODML Mapping uses
It distinguishes them, so that there is only 1 action for each case.. One point to consider:
|
[FIELD, PARAM, LITERAL] vs [COLUMN, CONSTANT, LITERAL] fixed in my post In my perspective, the scope of mapping XML schema is to validate the syntax, nothing more. With The Considering this, the gain of |
On Fri, Apr 23, 2021 at 11:01:43AM -0700, Mark Cresitello-Dittmar wrote:
> Why does _VODML Mapping_ make a distinction between FIELD, PARAM
> and LITERAL and _ModemInstanceInVot_ doesn't.
>
There is certainly some consolidation which can happen with these elements.
Correction: VODML Mapping uses
* LITERAL for in-line value
* CONSTANT for reference to PARAM
* COLUMN for reference to FIELD
It distinguishes them, so that there is only 1 action for each case..
Whether this is more or less convenient than having 1 annotation
node is the debate.
Simple test: Will it require less code? And are there fewer traps
that code needs to ward against?
I think I can give you the answer in all cases: Yes, do away with
them. ATTRIBUTE (and perhaps ITEM) with either @ref or @value is
plenty enough and a lot more straightforward. If you're just
plunging into this discussion, see
https://github.com/msdemlei/astropy.
One point to consider:
* GLOBALS section is for elements which should generate ONLY 1
INSTANCE; so they can be assigned an ID and be referenced.
GLOBALS, by the way, will, I bet, fail the less-code test as well.
Of course, items in current TEMPLATES (that will fail the test well,
INSTANCE-s can just live in VO-DML) can have IDs as well and hence be
referenced, independently of whether or not they are constant for all
rows.
* It should be a problem for any attribute under here to
resolve to a FIELD
Yes. One more reason to do away with GLOBALS.
* so, the Mapping schema does not allow COLUMN under GLOBALS.
But clients will have to deal with instances with COLUMNs anyway.
You're not helping them by having a magic place where they might be
allowed to forget it.
* TEMPLATES section is for elements where there is 1 instance per table row.
* here all 3 forms can be used LITERAL|CONSTANT|COLUMN
* for LITERAL and CONSTANT, the value is repeated for each instance.
* this means providers can have compact serialization for
constant information
I think I don't get this point. Could you give an example?
|
On Mon, Apr 26, 2021 at 12:32:03AM -0700, Laurent MICHEL wrote:
This rule is not written in the XML schema but in the spec
(or it should be).
+1 (or how many points do I have?)
|
We seem to have general consensus on consolidation of
LITERAL|CONSTANT|COLUMN elements.
Whether or not they are absorbed into ATTRIBUTE is a TBD (for me anyway).
* in Mapping syntax, the element is separate from the role (ATTRIBUTE
identifies the role the element plays in the parent object).
so the instance itself is annotated the same, whether it is part of a
parent object or independent.
On Mon, Apr 26, 2021 at 4:29 AM msdemlei ***@***.***> wrote:
> One point to consider:
> * GLOBALS section is for elements which should generate ONLY 1
> INSTANCE; so they can be assigned an ID and be referenced.
GLOBALS, by the way, will, I bet, fail the less-code test as well.
Of course, items in current TEMPLATES (that will fail the test well,
INSTANCE-s can just live in VO-DML) can have IDs as well and hence be
referenced, independently of whether or not they are constant for all
rows.
ID in GLOBAL vs TEMPLATE:
I don't think the above is true.. perhaps @gerard Lemson can
chime in here.
Since an INSTANCE under TEMPLATE resolves to >1 instance, I don't think it
can be referenced since there is no way to identify which of the n-rows
instances is the target.
The exception may be if the INSTANCE is composed entirely of LITERAL or
CONSTANT but even then I'd say it probably shouldn't because each iteration
should generate an independent instance.
> * TEMPLATES section is for elements where there is 1 instance per table
row.
> * here all 3 forms can be used LITERAL|CONSTANT|COLUMN
> * for LITERAL and CONSTANT, the value is repeated for each instance.
> * this means providers can have compact serialization for
> constant information
I think I don't get this point. Could you give an example?
Simply using CONSTANT|LITERAL under TEMPLATE to map constant or missing
data in a table.
(ie: serialized as key/param instead of column since the value is constant.)
One place this has come up in these examples is under Standard Properties:
* uses LITERAL to add "label" and "ucd" values to each associated
PARAMETER
Another is to assign "dependent" = TRUE|FALSE to the cube observables in
the Time Series case.
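The "compact serialization" point can be sketched in a few lines of Python. This is only an illustration under simplified assumptions: the dictionaries below are hypothetical stand-ins for VOTable PARAMs, FIELDs and a TEMPLATES instance, not the actual Mapping syntax or any existing parser.

```python
# Hypothetical stand-ins for the VOTable content.
PARAMS = {"band": "zg"}              # PARAM values by name
FIELDS = ["time", "mag"]             # FIELD names, in column order
ROWS = [(1.0, 17.2), (2.0, 17.4)]    # table rows

# Hypothetical template: role -> (element form, argument).
TEMPLATE = {
    "ucd": ("LITERAL", "phot.mag"),  # in-line value, absent from the table
    "filter": ("CONSTANT", "band"),  # reference to a PARAM
    "time": ("COLUMN", "time"),      # reference to a FIELD
    "value": ("COLUMN", "mag"),      # reference to a FIELD
}


def instantiate(template, row):
    """Build one instance for one row; LITERAL and CONSTANT values repeat per row."""
    instance = {}
    for role, (kind, arg) in template.items():
        if kind == "LITERAL":
            instance[role] = arg
        elif kind == "CONSTANT":
            instance[role] = PARAMS[arg]
        elif kind == "COLUMN":
            instance[role] = row[FIELDS.index(arg)]
        else:
            raise ValueError(f"unknown element form: {kind}")
    return instance


instances = [instantiate(TEMPLATE, row) for row in ROWS]
```

Each row yields an independent instance, and the `ucd` and `filter` values are repeated in every one of them without being stored as table columns. That is the sense in which the provider can keep constant information in a PARAM or in the annotation itself.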
|
As pointed with good reasons by @msdemlei , repeating |
On 05/05/2021 at 10:38, Laurent MICHEL wrote:
As pointed with good reasons by @msdemlei
<https://github.com/msdemlei> , repeating |FIELD| elements
Do you mean attributes ? elements ? or both ?
the difference is that xml elements have xml ids while attributes do not
have !
into the annotation is useless and possibly confusing.
This has been done however in the MANGO serializations for ***@***.***| and
***@***.***| because there was no other way to do it.
To avoid this, the annotation syntax must be able to refer to these
|FIELD| elements.
This is major enhancement to bring to /ModelInstanceVot/.
This could look like this |<SC_FIELD ucd="column_ID"/>| or |<SC_FIELD
desc="column_ID"/>|
—
Ahhh, in SC_FIELD ?
…
|
I mean getting attributes ( |
I'm not sure I follow..
The VODML Mapping syntax certainly does not support this; it is similar to the problem where you cannot annotate individual components of a VOTable array. I think that annotating into the sub-structure of the VOTable elements was considered out of scope. How would this integrate with the ATTRIBUTE element, which above you declare has the benefit of having only a single form?
would this become
To indicate that the value comes from the 'ucd' element of the FIELD with ID "_pos_ra". We've talked about consolidating the Mapping COLUMN|CONSTANT|LITERAL elements to a single form, but I'm less convinced about absorbing these into the element which assigns the 'role'. I like the fact that with the Mapping syntax, the annotation for |
The problem is well laid out, but the answer needs more thinking. |
Very late in the discussion, but I do not understand the original comment, and I don't think it describes the VO-DML mapping proposal correctly. Even if a change were desired, including it in the original approach would constitute a minor update. No reason to create a different proposal. |
Both proposals (VODML Mapping, ModelInstanceInVot) have similar structures. There is nevertheless a major difference that is justified here.
VODML Mapping
ModelInstanceInVot
The advantage of the ModelInstanceInVot mapping structure is even more obvious for multi-table VOTables.
ModelInstanceInVot