Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

line numbers in the AST re #732 #750

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

ffrank
Copy link
Contributor

@ffrank ffrank commented Mar 23, 2024

I wrote a suggestion of how to store the mcl code coordinates of notable AST nodes in the parser. (It's not done for each last node.)

The approach is possibly flawed; the amount of copy/paste makes me sad.

This was still the easier part though. Next I was trying to figure out how to generate an error message during type unification, that indicates the == operator in the following snippet:

$a = 1
$b = "one"

if $a == $b {
        $c = 3
}

But the unification approach that collects all invariants, and then loops over them, makes this very hard, maybe impossible.

Will we need to keep references to the AST nodes in the invariants? So that we can trace where problematic invariants originate?

@purpleidea
Copy link
Owner

the amount of copy/paste makes me sad.

I expect some copy-paste, that's to be expected.

during type unification

Don't worry about this at the moment. I have some structural changes which are coming and I don't know if this will break anything that happens there yet, so TBD.

It's not done for each last node.)

Just a POC is fine!

@purpleidea
Copy link
Owner

PS: Note there is also the %error stuff in parser-- that works well enough. Line numbers in the code is the goal =D

Copy link
Owner

@purpleidea purpleidea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some random quick thoughts.

lang/parser/parser.y Outdated Show resolved Hide resolved
lang/parser/parser.y Show resolved Hide resolved
lang/ast/structs.go Outdated Show resolved Hide resolved
@ffrank
Copy link
Contributor Author

ffrank commented Mar 24, 2024

In the latest commit 913c100 the code essentially works. It will store start and end positions in nodes types that have received the TextArea embed.

This does not survive interpolation though, unless Interpolate is adapted as demonstrated for StmtBind and StmtIf here.

I even have an idea for a type unifier that could allow us to drill down to the offensive expressions.

EDIT: after some shy attempts to implement a prototype, I want to say this idea did not pan out. Refactored unification sounds much more promising 👇🏻

@purpleidea
Copy link
Owner

FYI: I'm currently refactoring the type unification stuff, so after that's done it will be great to revisit this and get it merged in some form. Sneak peak: The new unification has a pointer to the Expr in question so as long as the Expr has the ROW/COL information, then we'll be able to show that without doing any additional plumbing.

@ffrank
Copy link
Contributor Author

ffrank commented Apr 13, 2024

I'm not moving this in any direction until that change lands then.

@purpleidea
Copy link
Owner

Yeah, sorry for the delay. The start of that is here: https://github.com/purpleidea/mgmt/tree/feat/unification in case you're knowledgeable in this area and want to help. I'm kind of a terrible algorithmist and a few things are up in the air, but if you or anyone is an expert in this area, please chime in =D

@purpleidea
Copy link
Owner

Boy has time flown by! I guess you know all the things I was busy with. In any case, unification has merged, so if you'd like to rebase this, that would be awesome!

It might also be very easy for you to plug things into unification error messages too (or I can help with that)

LMK thanks!

@ffrank
Copy link
Contributor Author

ffrank commented Sep 8, 2024

Hey, nice work on the resolver, feels good so far.

I can build this branch and ./mgmt run lang error.mcl where error.mcl has

# this shall fail
$value = 1 + "a"
test "t1" { anotherstr => $value, }

The error message will be this:

cli parse error: could not unify types: unify error with: topLevel(func() { <built-in:_operator> }): type error: int != str

The trouble is that the error is reported at the definition of the + operator rather than the invocation.

Note that I did change the String function for FuncCalls to give the location. But it is not used at the moment. I have a commit in here that makes mgmt dump a string representation of the AST and it includes

call:_operator(str("+"), int(1), str("a")) @ (1 1)

(Never mind that this position is also wrong, but right now it seems more important that the resolver can report it at all.)

@ffrank
Copy link
Contributor Author

ffrank commented Sep 8, 2024

(Wrong text position now also fixed)

@purpleidea
Copy link
Owner

resolver

What's a resolver?

The trouble is that the error is reported at the definition of the + operator rather than the invocation.

All type unification errors that are found during unification are reported at the same place... We can likely improve errors a bit though.

You asked me to comment and look at this here, I need a bit more info what you're asking, sorry!

@ffrank
Copy link
Contributor Author

ffrank commented Sep 10, 2024

Okay trying to clear it up. I meant solver of course, not resolver.

The current issue is that we get this error:

topLevel(func() { <built-in:_operator> }): type error: int != str

That is correct, but more helpful to the user will be the information that the call of this operator is 1 + "a" in line 2 column 7.

In essence, I see a need to show a "call stack". Of course, this is at type checking time, so nothing is called (is it?)

Maybe what's rather needed is a few levels of AST parents. Rather than just saying "this unnamed internal operator was given bad operands", we should include that "its parent is this expression 1 + "a" in this bind statement $value = ...". (The immediate parent will be enough in this example, but not sure if more complex code will bring a need for more.)

@purpleidea
Copy link
Owner

Okay trying to clear it up. I meant solver of course, not resolver.

Duh now! I don't know why my mind missed that, lol

That is correct, but more helpful to the user will be the information that the call of this operator is 1 + "a" in line 2 column 7.

Yeah I agree, we probably want to hide the "ExprTopLevel" and "ExprSingleton" expressions. To do so, you'd want to wrap this x.Expr call:

return nil, errwrap.Wrapf(err, "unify error with: %s", x.Expr)

With trueCallee from here:

// trueCallee is a helper function because ExprTopLevel and ExprSingleton are

I expect this will cause an import cycle, so I'd have to ponder what the best move is... I'd like to avoid needing to add a new method on Expr to get the "TrueCallee" for everyone.

Does that help?

@ffrank
Copy link
Contributor Author

ffrank commented Sep 10, 2024

Yes the cycle prevented doing it from the solver. Spot on.

I pushed a silly little hack to do this recursion from the GAPI scope, but it does not work. This is the output now:

cli parse error: could not unify types [[ in expression: func() { <built-in:_operator> } ]]: unify error with: topLevel(func() { <built-in:_operator> }): type error: int != str

So the result from trueCallee still does not show information.

This makes sense to me. As said, it's true that it's the + operator that ultimately is affected. But we don't care. We are interested in its AST parent(s).

@purpleidea
Copy link
Owner

You want to unwrap the x.Expr part...

If you see this as your error:

topLevel(func() { built-in:_operator }): type error: int != str

Then TrueCallee(_) would return:

func() { built-in:_operator }: type error: int != str

If that's not what you want, I am lost. But I don't see how this error is not correct. What do you expect it to print?

@ffrank
Copy link
Contributor Author

ffrank commented Sep 11, 2024

I may be barking up the wrong tree, going on about the AST.

But the issue (in terms of AST nodes) is that this type error is reported on the ExprFunc (i.e. the declaration) of the + operator, which will never be helfpul. I need it reported on the ExprCall representing (in this case) 1 + "a".

Remember, ultimately we want to report the line and column number of the code that caused the unification issue.

@purpleidea
Copy link
Owner

the issue (in terms of AST nodes) is that this type error is reported on the ExprFunc (i.e. the declaration) of the + operator, which will never be helfpul. I need it reported on the ExprCall representing

I agree. I don't know the correct way to achieve this at the moment, but I will dig into it if I can.

Remember, ultimately we want to report the line and column number of the code that caused the unification issue.

If it reports the definition site instead of the call site for now, that's okay =D

@ffrank
Copy link
Contributor Author

ffrank commented Sep 12, 2024

If it reports the definition site instead of the call site for now, that's okay =D

I don't believe I agree, at least until I see a typing issue that is not reported on this kind of builtin element. Why have line numbers when we cannot report errors that use them?

I had the idea to include the pointer to the entire AST node in the error, rather than just the Expr within. Then, we could run a graph search through the AST, so that we can basically "zoom in" on the problematic expression(s).

The problem is that I cannot very easily run a search through the AST. That's why I created #773 in order to have some discussion around that. (It might be good to talk about this in person a little.)

@ffrank
Copy link
Contributor Author

ffrank commented Sep 22, 2024

ping @purpleidea

@purpleidea
Copy link
Owner

Just had a look. TBQH, I'm not the expert here, so I don't know how to proceed, sorry.

@purpleidea
Copy link
Owner

Maybe if we did a pair programming we might be able to come up with some ideas? LMK

@ffrank
Copy link
Contributor Author

ffrank commented Oct 21, 2024

I'm traveling atm but maybe we can make it happen in November.

@purpleidea
Copy link
Owner

Sure LMK when ready.

@ffrank
Copy link
Contributor Author

ffrank commented Dec 26, 2024

First complete PoC implementation.

$ ./mgmt run lang examples/lang-errors/simple_types.mcl 
This is: mgmt, version: 0.0.26-206-g1b0b92f2
Copyright (C) 2013-2024+ James Shubin and the project contributors
Written by James Shubin <[email protected]> and the project contributors
14:53:14 main: start: 1735221194038676643
14:53:14 cli: lang: lexing/parsing...
14:53:14 cli: lang: import: /home/felix/mgmt/examples/lang-errors/
14:53:14 cli: lang: running type unification...
14:53:14 cli: lang: possible type issue found at line 4 column 3
14:53:14 main: goodbye!
cli parse error: could not unify types: unify error with: topLevel(func() { <built-in:_operator> }): type error: str != int

(note the final log message before the goodbye)

This breaks some very brittle unit tests. I did not touch those yet, it might be worth refactoring them rather than doing the trivial fix.

One big missing piece is that we cannot easily determine file names (i.e., from which file was a given parser token read), so in a complex code base with multiple mcl files, the message will still be quite ambiguous.

Copy link
Owner

@purpleidea purpleidea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ffrank HNY!

This looks really promising. Have a look at some of my comments if you can, I had a few questions, and a likely answer for the filename bit.

But I think I could merge this fairly soon...

I do have a few nit's but I can fix those up unless you have any objections to that.

Cheers!

@@ -221,7 +221,7 @@ stmt:
// `func name(<arg>, <arg>) { <expr> }`
| FUNC_IDENTIFIER IDENTIFIER OPEN_PAREN args CLOSE_PAREN OPEN_CURLY expr CLOSE_CURLY
{
posLast(yylex, yyDollar) // our pos
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: Why are some of the changes for posLast => locate done in the same space, and others done after the $$... definition bit? Does it matter? (See other comment for an example.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the inconsistencies were just a result of early prototyping. I cleaned this up now, the new lines should always just replace the old ones (unless I am forgetting something, but don't believe so).

$$.stmt = $1.stmt
locate(yylex, $1, yyDollar[len(yyDollar)-1], $$.stmt)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example of one that moves to after the $$. = bit...

}
| IF expr OPEN_CURLY prog CLOSE_CURLY ELSE OPEN_CURLY prog CLOSE_CURLY
{
posLast(yylex, yyDollar) // our pos
locate(yylex, $1, yyDollar[len(yyDollar)-1], $$.stmt)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example where it stays in the same place.

@@ -393,3 +393,23 @@ func lambdaScopeFeedback(scope *interfaces.Scope, logf func(format string, v ...
logf("$%s(...)", name)
}
}

func AreaParentOf(needle, haystack interfaces.Node) LocalNode {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some explainer comment doc would be very helpful here please.

// Expect is one of the two types to unify.
Expect *types.Type

// Actual is one of the two types to unify.
Actual *types.Type

// An error string to pass along with this
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cute idea to make this implement the error interface... But why do we want to do this? Do we want to change the signature somewhere to just include a Node or error context struct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this one is a really silly hack to allow me to pass the node back alongside the error. Better to add a new struct that will hold the error message and a reference to the node, and return that, yes?

cause := unifyErr.(*interfaces.UnificationInvariant).Node
parent := ast.AreaParentOf(cause, iast)
line, col := parent.GetPosition()
logf("possible type issue found at line %d column %d", line, col)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On each Node, we can get the filename from obj.data.Metadata.Main IIRC. And the path of that from obj.data.Base again IIRC.

If that doesn't work, I can surely fix it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a new commit using these hints. Still not able to pinpoint the original file. It can do this for now:

$ ./mgmt run lang examples/lang-errors/including.mcl 
...
01:08:54 cli: lang: possible type issue found at line 4 column 30
01:08:54 cli: lang: it's in one of these files: [/home/felix/mgmt/examples/lang-errors/includable.mcl /home/felix/mgmt/examples/lang-errors/including.mcl]
01:08:54 main: goodbye!
cli parse error: could not unify types: unify error with: topLevel(func() { <built-in:_operator> }): type error: int != str

@ffrank
Copy link
Contributor Author

ffrank commented Jan 5, 2025

I will clean it up in the next few days. This was mainly a proof of concept implementation. Since you agree with the approach, I will rework it into passable code.

Thanks for the hints and pointers.

@ffrank
Copy link
Contributor Author

ffrank commented Jan 19, 2025

Do you have any feedback on the issue of linking AST nodes to source file names?

The data structures that you pointed me to appear to be too unspecific. This is the best I can produce right now:

01:08:54 cli: lang: possible type issue found at line 4 column 30
01:08:54 cli: lang: it's in one of these files: [/home/felix/mgmt/examples/lang-errors/includable.mcl /home/felix/mgmt/examples/lang-errors/including.mcl]
01:08:54 main: goodbye!

I suspect we need to do more work in order to make this information available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants