offset information for segments (from, to) -- what's the best move? #16
You're giving a lot of elements that may help shape an answer. My feeling is that we need a specific class (not abusing att.citing, for reasons we could list in this thread), say att.referring, with the following properties: …
Thanks. Do we have precedents for attributes with two (or more) datatypes? I had assumed until now that, as a matter of principle, an attribute has a single datatype. I'm just wondering how revolutionary such a thing would be for the Council.
I can imagine an objection: "what if someone does …"
Concerning your question: yes, we do have precedents; see http://www.tei-c.org/release/doc/tei-p5-doc/fr/html/ref-teidata.probCert.html (used for the value of http://www.tei-c.org/release/doc/tei-p5-doc/fr/html/ref-att.global.responsibility.html).
As to your objection, this is indeed the role of …
Super, thanks for this info.
Yes, go!
I've started by setting up a separate project for this in the LingSIG space: https://github.com/orgs/LingSIG/projects/2
The links do not work!
Oh, that's bad news. They work for me, which may mean that they are accessible only to members of the "organization" (I've sent you an invite). I didn't realise that access may be restricted within public GitHub organizations. That's a bit worrying.
Ahh, teidata.probCert is not exactly a precedent: it's a single datatype that has two internal variants. Still, let's experiment and see.
Similarly with data.numeric (one datatype to bind them...):
<alternate>
  <dataRef name="double"/>
  <dataRef name="token" restriction="(\-?[\d]+/\-?[\d]+)"/>
  <dataRef name="decimal"/>
</alternate>
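Following the same alternate-based pattern, a dual datatype for the offsets might be sketched as a separate dataSpec that att.referring then references for both @from and @to. This is a hypothetical sketch, not the spec committed later in this thread: the identifier teidata.offsetOrPointer is invented for illustration, and using teidata.pointer for the URI variant is an assumption.

```xml
<!-- Hypothetical datatype: either a character offset or a pointer -->
<dataSpec ident="teidata.offsetOrPointer" mode="add">
  <content>
    <alternate>
      <!-- integer offset; the counting convention is documented elsewhere -->
      <dataRef name="nonNegativeInteger"/>
      <!-- URI-based pointing (assumed to reuse teidata.pointer) -->
      <dataRef key="teidata.pointer"/>
    </alternate>
  </content>
</dataSpec>

<!-- Hypothetical class using the datatype for both attributes -->
<classSpec ident="att.referring" type="atts" mode="add">
  <attList>
    <attDef ident="from" usage="opt">
      <datatype><dataRef key="teidata.offsetOrPointer"/></datatype>
    </attDef>
    <attDef ident="to" usage="opt">
      <datatype><dataRef key="teidata.offsetOrPointer"/></datatype>
    </attDef>
  </attList>
</classSpec>
```

Note that a string such as "7" is also a valid URI reference, so this alternation alone cannot tell the two kinds of value apart; that is the job left to Schematron (or to a mode attribute).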
I have now created the relevant specs but got stuck at the Schematron. I already use a rule that forces both attributes to be either both integers or both URIs. At this point, I would like to open this for discussion. I'll reference the relevant diff in the next note.
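A uniformity rule of the kind described here could look roughly like the following. It is a hypothetical sketch, not the rule from the diff: the constraint identifier is invented, and it assumes the sch and xsd prefixes are bound in the ODD and that the constraintSpec sits inside the att.referring classSpec.

```xml
<!-- Hypothetical sketch: place inside the att.referring classSpec;
     assumes the sch and xsd prefixes are bound in the ODD -->
<constraintSpec scheme="schematron" ident="uniform_from_to" mode="add">
  <constraint>
    <sch:rule context="*[local-name() = ('span','seg')][@from and @to]">
      <!-- passes when both values are integers, or when neither is -->
      <sch:assert test="(@from castable as xsd:nonNegativeInteger)
                        = (@to castable as xsd:nonNegativeInteger)">@from and @to
        must either both be integer offsets or both be pointers</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```

Comparing the two boolean results with "=" keeps the rule symmetric; it does not say which of the two forms is preferred.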
Implementing changes proposed and discussed in laurentromary/stdfSpec#16
Here goes: LingSIG/TEI@c30ab72. Two specs added: …
This is a better link, to a pull request that combines the commits (I needed to fix things a bit): …
(Just a note: it works as expected; the Schematron is indeed too tight, and @referringMode does seem spurious.)
Shouldn't we drop the technical check and treat @referringMode as a documentation mechanism (somewhat like @unit, metaphorically speaking)?
And I was so happy with the simplistic view that there are just integer offsets and ID-based pointing... sigh. OK then, more power to patterns, less mess in the Schematron layer: let's define patterns of …
(the last one is meant as a catch-all, for other uses). I would model that as an alternation of patterns, driven by the value of …
There should be a way to fix one value of @referringMode, to minimize verbosity in the actual markup; maybe simply in the project-specific ODD, so that it stays outside the proposal. Another thing that has to be documented, presumably in the encodingDesc, is the initial value of character-based indexes (it can be 0 or 1). Does the above make sense?
PS. I can do that in RNG, but I'm afraid I see no way to implement this in ODD, which does not seem to offer a way to express a pattern of attributes...
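In RELAX NG, an alternation of attribute patterns driven by the mode attribute might be sketched as below. This is a hypothetical fragment for inclusion in a grammar: the define name is invented, the two mode values ("icp" and "id") are the ones that appear in this thread, and the datatypes chosen for each branch (NCName for identifiers in particular) are assumptions.

```xml
<!-- Hypothetical RELAX NG fragment (XML syntax): which @from/@to values
     are allowed depends on @referringMode -->
<define name="att.referring.attributes"
        xmlns="http://relaxng.org/ns/structure/1.0"
        datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <choice>
    <!-- default mode: @referringMode absent or "icp"; integer offsets -->
    <group>
      <optional>
        <attribute name="referringMode"><value>icp</value></attribute>
      </optional>
      <attribute name="from"><data type="nonNegativeInteger"/></attribute>
      <attribute name="to"><data type="nonNegativeInteger"/></attribute>
    </group>
    <!-- ID-based pointing: values assumed to name @xml:id values -->
    <group>
      <attribute name="referringMode"><value>id</value></attribute>
      <attribute name="from"><data type="NCName"/></attribute>
      <attribute name="to"><data type="NCName"/></attribute>
    </group>
  </choice>
</define>
```

Since both branches require @from and @to, a referencing element would normally wrap this define in an optional pattern; a catch-all branch would be one more group in the choice.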
All in all, this is exactly how I saw it. We need to find a clever way to implement it.
Yep. The way I can see for now is not very clever, but I've asked the gurus: …
Ah, there is a theoretical way to express this in pure ODD, but depending on one's view, the fact that it doesn't work is either a feature or a bug: TEIC/Stylesheets#144
I've implemented Syd's awesome suggestion; the result is now a bit too lax, in the sense that it allows for … One more thing: I put in "second" as a tip of the hat to our speech-transcription colleagues, but there should surely be more values. I'll have a look at Thomas Schmidt's article to see what other values @referringMode should have. I am not sure whether the anonymous "interval" should be there.
This is really interesting! Concerning "second", the format is determined by ISO 8601 (or something close to it). I guess we should have "temporal" there.
I have left "second" out after all, because speech transcription assumes an indexed timeline anyway. Because URI is so loose (it's hard even to create an ill-formed URI), I have added the value "id" to … One burning question is: should I add …
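The current state of the mode values could be recorded in a value list roughly like this. It is a hypothetical sketch: only "icp" (the default used in the customization in this thread) and "id" come from the discussion; making the list semi-open, so that values not yet agreed on remain possible, is an assumption.

```xml
<!-- Hypothetical sketch of the @referringMode declaration -->
<attDef ident="referringMode" usage="opt">
  <datatype><dataRef key="teidata.enumerated"/></datatype>
  <!-- semi-open: listed values are suggested, others remain possible -->
  <valList type="semi">
    <!-- integer offsets; the default value in this thread's ODD -->
    <valItem ident="icp"/>
    <!-- pointing by identifier rather than by free-form URI -->
    <valItem ident="id"/>
  </valList>
</attDef>
```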
I think the code-related portion is done. What remains is to weave some narration into the IA chapter (not easy, because it has a really nice flow exactly where we should somehow interrupt it), and to decide whether this ticket should only modify … This is what I put into my ODD to make this work for …:
<elementSpec ident="seg" module="linking" mode="change">
  <classes mode="change">
    <!-- make seg a member of the new attribute class -->
    <memberOf key="att.referring"/>
  </classes>
</elementSpec>
<classSpec ident="att.referring" mode="change" type="atts" module="tei">
  <!-- when @referringMode is absent (the default applies),
       @from and @to must both be non-negative integers -->
  <constraintSpec scheme="schematron" ident="default_mode" mode="replace">
    <constraint>
      <sch:rule
        context="*[local-name() = ('span','seg')][not(@referringMode) and @from and @to]">
        <sch:assert test="@from castable as xsd:nonNegativeInteger">The
          default form of @from is a non-negative integer</sch:assert>
        <sch:assert test="@to castable as xsd:nonNegativeInteger">The
          default form of @to is a non-negative integer</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  <attList>
    <attDef ident="referringMode" usage="opt" mode="change">
      <!-- project-specific default value -->
      <defaultVal>icp</defaultVal>
    </attDef>
  </attList>
</classSpec>
This is only complex because there appears to be no way to communicate the current state of the ODD to Schematron. Otherwise, the constraint could be part of att.referring itself, and we would only need to modify the default value in the ODD.
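To illustrate what the "default_mode" rule in the customization above does and does not check, here are a few hypothetical instance fragments; the values are invented, and the comments describe only the Schematron rule, not other schema constraints.

```xml
<!-- Hypothetical instance fragments, assuming the customization above -->

<!-- no @referringMode and both values are non-negative integers:
     the "default_mode" rule is satisfied -->
<seg from="0" to="5"/>

<!-- no @referringMode, but @to is not an integer: the second assertion
     fires ("The default form of @to is a non-negative integer") -->
<seg from="0" to="#w5"/>

<!-- an explicit @referringMode takes the element out of the rule's context,
     so this rule does not check these values at all -->
<seg referringMode="id" from="w1" to="w5"/>
```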
Instead of adding …
This is probably a good move!
This is a request for advice.
I need to provide offset information for <seg>, ISO-LAF-style (numerical, starting from 0). I reject @corresp with the quasi-XPointer, which I'm afraid we are never going to see implemented, and which is just scary otherwise. I want @from and @to so that the markup is understandable not only to high-end parsing tools. There are, minimally, two options for this:
1. … @mode="add"), as <dataRef name="nonNegativeInteger"/>
2. … <seg> to the class att.citing, and fix the @unit attribute to the value "character" (and document the convention of starting from 0, but that needs to be done either way)
Question: which of these moves feels more in line with our overall goal here, and which of them is more likely to be accepted by the Council when we get to submitting the relevant tickets? The att.citing strategy feels more universal, while the former creates yet another {@from, @to} pair, which the Council may be unhappy with.
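The two options in the question might be sketched as the following alternative ODD customizations; they are meant as alternatives, not to be combined in one ODD. Both are hypothetical: in particular, fixing @unit through a closed valList is only one possible way of restricting it to "character", and option 2 leaves @from and @to with whatever datatype att.citing assigns them.

```xml
<!-- Option 1 (hypothetical): a dedicated @from/@to pair on seg -->
<elementSpec ident="seg" module="linking" mode="change">
  <attList>
    <attDef ident="from" mode="add" usage="opt">
      <datatype><dataRef name="nonNegativeInteger"/></datatype>
    </attDef>
    <attDef ident="to" mode="add" usage="opt">
      <datatype><dataRef name="nonNegativeInteger"/></datatype>
    </attDef>
  </attList>
</elementSpec>

<!-- Option 2 (hypothetical): seg joins att.citing, @unit fixed to "character" -->
<elementSpec ident="seg" module="linking" mode="change">
  <classes mode="change">
    <memberOf key="att.citing" mode="add"/>
  </classes>
  <attList>
    <attDef ident="unit" mode="change">
      <valList type="closed" mode="replace">
        <valItem ident="character"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

With option 2, an instance would read <seg unit="character" from="0" to="5"/>; with option 1, simply <seg from="0" to="5"/>.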