Flag diacritics #14

balintdom · 2017-12-08T17:42:13Z

Hello, I can't understand what flag diacritics do. Could you give me a simple working example, and a step-by-step explanation about what they are doing? Unfortunately this time, google is not really my friend...

matekatona · 2017-12-08T17:48:39Z

the example near the end of this page helped me a lot: https://fomafst.github.io/morphtut.html

DavidNemeskey · 2017-12-09T21:07:58Z

It is best to think of flag diacritics as variable( setting/reading instruction)s in a programming language. FSAs/FSTs are memory-less in the sense that the current state and the current input determines the new state (and the output); previous inputs and states have no influence. This can be a problem, when you need to remember something.

An example would be the language "ba+b|ca+c". Here you must remember what the first symbol was, and only accept the string if it ends with the same symbol. Ideally, you would want to do something like:

Lexicon Root
b [input = b] As ;  ! Store 'b' as the variable 'input'
c [input = c] As ;  ! Store 'c' as the variable 'input'

Lexicon As
a As ;
a End ;

Lexicon End
[input] # ;  ! Load the content of the variable 'input' and expect that again on the (input) tape

Only one problem: FSAs don't have a memory so we don't have any variables we could "save" the input to / "load" it from. So one valid lexc solution would be:

Lexicon Root
b As_after_b ;
c As_after_c ;

Lexicon As_after_b
a As_after_b ;
a End_with_b ;

Lexicon As_after_c
a As_after_c ;
a End_with_c ;

Lexicon End_with_b
b # ;

Lexicon End_with_c
c # ;

You can see the problem already: since you don't have any memory, you have to encode the information that you read b or c into the state space; or in lexc, into the continuations that you use. In this case, you had to split End into the two End_with*_ lexica so that you can expect b in one and c in the other. But that's not enough, you also had to split As into two as well, because that is the only way you can encode what you read previously. And if you have not only b and c, but e.g. [a-z], then you would need to split the "ideal" lexica As and End into 26 parts...

So enter flag diacritics. Just think of them as operations on a variable:

P is save(var, value), or var = value
R is if var == value

! This is very important, never forget to define your flags as multi character symbols!
Multichar_Symbols @P.INPUT.B@ @P.INPUT.C@ @R.INPUT.B@ @R.INPUT.C@

Lexicon Root
[email protected]@ As ;  ! Store 'b' as the variable 'input'
[email protected]@ As ;  ! Store 'c' as the variable 'input'

Lexicon As
a As ;
a End ;

Lexicon End
[email protected]@ # ;  ! if INPUT == B
[email protected]@ # ;  ! if INPUT == C

You can see it is almost as simple as the ideal solution. One important difference is that the B in @P.INPUT.b@ (etc.) has nothing to do with the b we read on the tape, because the values of the flag variables live in a different namespace. So we could have called B X and C Y, and it would have worked the same.

In this particular case btw. we could have used @U...@ (unification) instead of both @P...@ and @R...@ -- U on a variable is P the first time and R afterwards.

Hope this helps.

balintdom · 2017-12-15T11:26:58Z

Yes, it is really helpful. Thank you!

DavidNemeskey added the question label Dec 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flag diacritics #14

Flag diacritics #14

balintdom commented Dec 8, 2017

matekatona commented Dec 8, 2017

DavidNemeskey commented Dec 9, 2017 •

edited

Loading

balintdom commented Dec 15, 2017

Flag diacritics #14

Flag diacritics #14

Comments

balintdom commented Dec 8, 2017

matekatona commented Dec 8, 2017

DavidNemeskey commented Dec 9, 2017 • edited Loading

balintdom commented Dec 15, 2017

DavidNemeskey commented Dec 9, 2017 •

edited

Loading