Flag diacritics #14

Open
balintdom opened this issue Dec 8, 2017 · 3 comments

@balintdom

Hello, I can't understand what flag diacritics do. Could you give me a simple working example and a step-by-step explanation of what they are doing? Unfortunately, this time Google is not really my friend...

@matekatona

the example near the end of this page helped me a lot: https://fomafst.github.io/morphtut.html

@DavidNemeskey
Collaborator

DavidNemeskey commented Dec 9, 2017

It is best to think of flag diacritics as the variable-setting and variable-reading instructions of a programming language. FSAs/FSTs are memoryless in the sense that the current state and the current input alone determine the next state (and the output); previous inputs and states have no influence. This becomes a problem when you need to remember something.

An example would be the language "ba+b|ca+c". Here you must remember what the first symbol was, and only accept the string if it ends with the same symbol. Ideally, you would want to do something like:

Lexicon Root
b [input = b] As ;  ! Store 'b' as the variable 'input'
c [input = c] As ;  ! Store 'c' as the variable 'input'

Lexicon As
a As ;
a End ;

Lexicon End
[input] # ;  ! Load the content of the variable 'input' and expect that again on the (input) tape

Only one problem: FSAs don't have a memory so we don't have any variables we could "save" the input to / "load" it from. So one valid lexc solution would be:

Lexicon Root
b As_after_b ;
c As_after_c ;

Lexicon As_after_b
a As_after_b ;
a End_with_b ;

Lexicon As_after_c
a As_after_c ;
a End_with_c ;

Lexicon End_with_b
b # ;

Lexicon End_with_c
c # ;

You can see the problem already: since you don't have any memory, you have to encode the information that you read b or c into the state space; or, in lexc, into the continuation lexica that you use. In this case, you had to split End into the two End_with_* lexica so that you can expect b in one and c in the other. But that's not enough: you also had to split As in two, because that is the only way to encode what you read previously. And if you have not only b and c but, e.g., [a-z], then you would need to split each of the "ideal" lexica As and End into 26 parts...
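To make the blow-up concrete, here is a minimal Python sketch that writes the split lexica out as an explicit transition table and counts the states. It is only an illustration: the state names mirror the lexica above, `#` stands for the final state, and `build_transitions`, `accepts` and `count_states` are hypothetical helpers, not part of lexc or foma.

```python
import string
from collections import defaultdict


def build_transitions(first_symbols):
    """Transition table for the language  x a+ x  for every x in first_symbols.

    The remembered symbol x can only live in the state name, so every x
    contributes its own As_after_x and End_with_x states.
    """
    table = defaultdict(set)
    for x in first_symbols:
        table[("Root", x)].add(f"As_after_{x}")
        table[(f"As_after_{x}", "a")].add(f"As_after_{x}")   # a As_after_x ;
        table[(f"As_after_{x}", "a")].add(f"End_with_{x}")   # a End_with_x ;
        table[(f"End_with_{x}", x)].add("#")                 # x # ;
    return table


def accepts(table, word):
    """Simulate the (nondeterministic) automaton on word."""
    current = {"Root"}
    for symbol in word:
        current = {nxt
                   for state in current
                   for nxt in table.get((state, symbol), ())}
    return "#" in current


def count_states(table):
    """Count all states that occur in the table, including Root and #."""
    states = {"Root"}
    for (source, _symbol), targets in table.items():
        states.add(source)
        states.update(targets)
    return len(states)


bc = build_transitions("bc")
print(accepts(bc, "baab"))   # True
print(accepts(bc, "baac"))   # False: started with b, ended with c
print(count_states(bc))      # 6
print(count_states(build_transitions(string.ascii_lowercase)))  # 54
```

With two possible first symbols there are 6 states; with 26 there are 54, even though the "ideal" grammar would need the same three lexica in both cases.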

So enter flag diacritics. Just think of them as operations on a variable:

  • P is save(var, value), or var = value
  • R is require(var, value): the path is only accepted if var == value

! This is very important, never forget to define your flags as multi character symbols!
Multichar_Symbols @P.INPUT.B@ @P.INPUT.C@ @R.INPUT.B@ @R.INPUT.C@

Lexicon Root
b@P.INPUT.B@ As ;  ! Store 'b' as the variable 'input'
c@P.INPUT.C@ As ;  ! Store 'c' as the variable 'input'

Lexicon As
a As ;
a End ;

Lexicon End
b@R.INPUT.B@ # ;  ! if INPUT == B
c@R.INPUT.C@ # ;  ! if INPUT == C

You can see it is almost as simple as the ideal solution. One important difference is that the B in @P.INPUT.B@ (etc.) has nothing to do with the b we read on the tape, because the values of flag variables live in a different namespace. So we could have named the values X and Y instead of B and C, and it would have worked the same.

By the way, in this particular case we could have used @U...@ (unification) instead of both @P...@ and @R...@: U on a variable acts as P if the variable is still unset, and as R afterwards.
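If it helps to see the step-by-step behaviour, the following Python sketch models what happens to one path through the lexicon. It is a simplified model written for this explanation, not how foma is implemented: it handles only the P, R and U flags, and `check_path` is a hypothetical helper that takes the symbols along a path and returns the string without the flags, or None if a flag check fails.

```python
import re

# A flag diacritic looks like @OP.VARIABLE.VALUE@
FLAG = re.compile(r"@([PRU])\.([^.@]+)\.([^.@]+)@")


def check_path(symbols):
    """Return the string spelled by the path if every flag is satisfied, else None."""
    variables = {}   # the "memory" that a plain FSA does not have
    output = []
    for symbol in symbols:
        match = FLAG.fullmatch(symbol)
        if match is None:
            output.append(symbol)          # ordinary symbol: goes on the tape
            continue
        op, name, value = match.groups()
        if op == "P":                      # var = value
            variables[name] = value
        elif op == "R":                    # fail unless var == value
            if variables.get(name) != value:
                return None
        elif op == "U":                    # set if unset, otherwise compare
            if name not in variables:
                variables[name] = value
            elif variables[name] != value:
                return None
    return "".join(output)


# Root -> As -> As -> End, taking the b entries: flags agree
print(check_path(["b", "@P.INPUT.B@", "a", "a", "b", "@R.INPUT.B@"]))  # baab
# Started with b, tried to end with c: the R flag rejects the path
print(check_path(["b", "@P.INPUT.B@", "a", "c", "@R.INPUT.C@"]))       # None
# The same grammar written with U flags only
print(check_path(["c", "@U.INPUT.C@", "a", "c", "@U.INPUT.C@"]))       # cac
print(check_path(["c", "@U.INPUT.C@", "a", "b", "@U.INPUT.B@"]))       # None
# Value names are arbitrary labels: X works as well as B
print(check_path(["b", "@P.INPUT.X@", "a", "b", "@R.INPUT.X@"]))       # bab
```

Note that the flags never show up in the resulting string; they only decide whether a path counts as valid.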

Hope this helps.

@balintdom
Author

Yes, it is really helpful. Thank you!
