Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GrammaticalExpression evaluate() and FOL statements #52

Closed
nathimel opened this issue Dec 30, 2024 · 5 comments
Closed

GrammaticalExpression evaluate() and FOL statements #52

nathimel opened this issue Dec 30, 2024 · 5 comments

Comments

@nathimel
Copy link
Collaborator

In GrammaticalExpression.evaluate, when we construct self.meaning, we only call the self.func on each referent in the universe, and pass no other args: https://github.com/CLMBRs/ultk/blob/main/src/ultk/language/grammar.py#L154

I've been thinking about how to handle the kinship domain, from Kemp and Regier (2012), which is expressed in FOL expressions:

Screenshot 2024-12-30 at 9 33 40 AM

Maybe I haven't fully grocked Meaning as a mapping yet, but it's not clear to me how we can interpret (evaluate) FOL expressions as they're encoded in this figure without referencing the Universe. E.g., it seems the way to implement the func for 'sister' is:

def sister(x: Referent, y: Referent, u: Universe) -> bool:
    return any( daughter(x, z) and z.is_parent(y) for z in u)

If so, then I think the GrammaticalExpression.evaluate function might need to be changed so that when it's constructing a meaning on line 154 it calls the func on not just each referent but also the Universe. Alternatively store a Universe in each GrammaticalExpression as its self.universe, but that seems like it could cause inefficiency (?)

I took a look at the learn_quant PR branch, and it wasn't clear to me I should the quantifier grammar out of the box for this, but maybe I'm wrong?https://github.com/CLMBRs/ultk/blob/9e6ef27e56f466e5edca630ee3f291bee02164c6/src/examples/learn_quant/grammar.yml

@Ashvin-Ranjan
Copy link
Contributor

I was thinking about this problem and ultimately came to a solution, albeit perhaps a messy one, at https://github.com/Ashvin-Ranjan/ultk/tree/main/src/examples/family.

The way it works is by having referents contain 4 attributes: their name, their gender, their mother, and their father. The mother and father are both also Referents (once the relations reach outside of the scope of the data they just become None). Then there are the following rules:

  • Logical
    • _and: boolean and
    • _or: boolean or
    • _not: boolean not
  • Gender
    • _female: returns true if Referent is female
    • _male: returns true if Referent is male
  • Parents
    • _m: returns Referent's mother
    • _f: returns Referent's father
  • Comparison
    • _ref_c: checks if two Referents are equal
  • Primitives
    • S: returns the Referent being mapped
    • O: returns the origin (the referent being compared to)

With this system the expression for checking if a referent is the origin's mom would be: _ref_c(_m(O), S) and sister would be _and(_and(_ref_c(_m(O), _m(S)), _ref_c(_f(O), _f(S))), _female(S)) (though these might be able to be simplified).

This is a very hacked together way of doing it but the basic tests in https://github.com/Ashvin-Ranjan/ultk/blob/main/src/examples/family/scripts/test.py show that it seems to work.

@nathimel
Copy link
Collaborator Author

Thank you for sharing! Just had a quick look, can this grammar find expressions for kinship terms via generate_unique_expressions()?

I've been working on a version too, and it kind of bothers me that we have to bend over backwards a bit to design such an intuitive grammar (as described below in KR2012) using ULTK; I suspect this means there is some designing to be improved, but after significant thought and experimentation I'm not sure where

Screenshot 2024-12-31 at 2 20 54 PM

@Ashvin-Ranjan
Copy link
Contributor

Ashvin-Ranjan commented Jan 1, 2025

I do agree things need to be changed, but I am not completely sure what. I am wondering if support for something like $\exists z\ A(x, z)$ is really needed especially with the amount of data that a Referent is able to hold. Though it would be very nice to have an easier and more elegant way to design such a grammar in ULTK, since the solution I have currently feels a bit rough at times.

I was also able to work together some code to generate expressions and they seem to match natural languages. I put a quick test for English together at https://github.com/Ashvin-Ranjan/ultk/blob/main/src/examples/family/outputs/natural_languages.yml.

I ended up expanding this to include priors from KR2012 and create a final plot, though I believe I made a mistake somewhere since English appears to be far away from optimal on the graph.

@nathimel
Copy link
Collaborator Author

nathimel commented Jan 2, 2025

I stand corrected re: changing the grammar. I was able to reimplement KR2012's grammar as described without changing ULTK at all (except for a minor type check in Rule.from_callable that was already anticipated): https://github.com/CLMBRs/ultk/blob/kinship/src/examples/kinship/grammar/full.py. I learned a lot about ULTK grammar in the process 🙂

Some of those lessons, which might be obvious, but for posterity:

  • I got confused about several independent things for this domain:
    • how to elegantly construct Referents and a grammar so that we have all the relational data/features we need. Ultimately I chose to make Referents minimal, and just import a data structure containing the relational features rather than encoding them into the Referents, but that's a minor design decision
    • how to handle binary or more generally n-ary primitive predicates. I chose to pass around curried functions. Equivalently (but maybe more slowly) we could have passed around frozensets.
      • some more details: Terminal rules must have a single Referent on the RHS. Intuitively, I can imagine n-ary predicates, including 'parent(, )' being terminal, but then we'd have to complicate the logic in GrammaticalExpression.evaluate(), which builds a mapping from Referent to something else. In the grammar I wrote above, I implemented terminal/primitive predicates as functions that that take dummy Referents and return functions, and then added some slightly complicated binding/applying logic to make sure that the Meaning of well-formed expressions is a mapping from Referent to bool. This is obviously awkward, but feels close enough to KR2012's grammar.
    • whether FOL is needed.
      • First, KR2012 aren't using sentences in FOL. Conceptually, their final expressions correspond to open formulae with exactly one free variable, effectively unary predicates, which is what allows them to denote a set of referents in the universe (rather than, e.g., a set of pairs in the universe, which you would expect of a FOL formula like 'parent(x, y)', or a truth value, which is what you would expect a FOL sentence to evaluate to, wrt a structure etc.)
      • So, conceptually, the grammar I wrote down works so that even though binary expressions can occur inside final expressions, final expressions are only parsed/generated if they are are unary predicates.
      • In any case, i'm confident that ULTK grammar can fully support the syntax of first order logic. The key would be to let Referents be models, and then a sentence of FOL is True or False in that model. That's not news to those working here but again I'll just note this for posterity.

@Ashvin-Ranjan
I don't really see any substantive problems with the grammar you've written yet. The plot looks great, and I'm glad it seems like it didn't take too long to get this up and running -- a good sign that ULTK is on its way! 🙂 This issue isn't the place to iterate on fully replicating KR2012, though I'm happy to do that elsewhere.

So I'll close this issue, as I'm convinced there isn't one!

Happy new year 🥳

@nathimel nathimel closed this as completed Jan 2, 2025
@shanest
Copy link
Collaborator

shanest commented Jan 2, 2025

Thanks, both! Really great discussion, and glad to hear y'all have found two solutions that both seem very natural. I'll ask one or both of you for a summary in conversation sometime in January, but also wanted to quickly mention two related things that came up when I was first working on GrammaticalExpression.

(1) In .__call__: for now, I only pass the evaluated children to self.func, as seen on this line. It could be natural to also let *args be passed in addition to the evaluated children, to get "more information" at intermediate nodes. I think I didn't yet have a reason to do that, so left it out.

(2) In an application that some CLMBRs are working on (to the RASP-L language for transformers), they also used a primitive id that just returned the input basically. It seems like something like that could be useful in this application? This might basically be what Ash did in his version with the S and O 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants