Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ANTLR with hand-rolled parser #917

Open
wants to merge 51 commits into
base: main
Choose a base branch
from

Conversation

stackoverflow
Copy link
Contributor

@stackoverflow stackoverflow commented Jan 29, 2025

Fixes #906
Fixes #888
Fixes #927
Fixes #931
Fixes #932

Incompatible changes:

  • Object entries in the same line now require a semicolon (properties and elements are unaffected):
// not valid anymore
foo { ["1"] = 1 ["2"] = 2 }

// requires ;
foo { ["1"] = 1; ["2"] = 2 }

Reason: not requiring semicolons or newlines requires too much backtracking in the parser, degrading performance. Also both our IntelliJ plugin and the LSP don't allow this syntax and show an error.

Our current ANTLR parser is quite slow, and, depending on the codebase, can be the slowest part of running Pkl.

Some results from parsing all snippet tests in pkl-core showing ~100x performance improvement:
ANTLR elapsed: 7122ms (7.1s)
New parser elapsed: 78.73ms

This is still a draft, many tests are still failing and repl parsing is still using the old parser.


@Override
public ExpressionNode visitAmendsExpr(Amends expr) {
// parentExpr is always New, Amends or Parenthesized. The parser makes sure of it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this comment, but it would be useful to link through to the method where this behaviour is expected in the parser.

@sgammon

This comment was marked as resolved.

Copy link
Contributor

@bioball bioball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass! Still have much more to review.

children.addAll(pars);
children.addAll(members);
return children;
}
Copy link
Contributor

@bioball bioball Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this allocation free?

One option is to just make children a field on the class.

Alternatively, maybe this should instead be visitChildren. In that case, this becomes:

<T> @Nullable T visitChildren(ParserVisitor<T> visitor) {
  for (var parameter : parameters) {
    parameter.accept(visitor);
  }
  for (var member : members) {
    member.accept(visitor);
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the children() method there but refactored the visitor not call it anymore. So no allocations are being made in our visitors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay; I think that's fine. But maybe let's add a comment to Node#children() that it allocates memory and to prefer using a visitor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just refactored the ast to store everything in the children's list. Calling children() or using the visitor doesn't allocate anything.

pkl-core/src/main/java/org/pkl/core/parser/Comment.java Outdated Show resolved Hide resolved
pkl-core/src/main/java/org/pkl/core/parser/cst/Ident.java Outdated Show resolved Hide resolved
pkl-core/src/main/java/org/pkl/core/parser/cst/Expr.java Outdated Show resolved Hide resolved
pkl-core/src/main/java/org/pkl/core/parser/cst/Expr.java Outdated Show resolved Hide resolved
pkl-core/src/main/java/org/pkl/core/parser/cst/Expr.java Outdated Show resolved Hide resolved
pkl-core/src/main/java/org/pkl/core/parser/cst/Expr.java Outdated Show resolved Hide resolved
pkl-core/src/main/java/org/pkl/core/parser/Lexer.java Outdated Show resolved Hide resolved
@stackoverflow stackoverflow marked this pull request as ready for review February 6, 2025 09:39
Copy link
Contributor

@bioball bioball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another pass!

pkl-core/src/main/java/org/pkl/core/parser/cst/Expr.java Outdated Show resolved Hide resolved
pkl-core/src/main/java/org/pkl/core/parser/cst/Expr.java Outdated Show resolved Hide resolved
Comment on lines 487 to 493
public Span getStartDelimiterSpan() {
return startDelimiterSpan;
}

public Span getEndDelimiterSpan() {
return endDelimiterSpan;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are ununsed. Remove?

Alternatively, maybe add @SuppressWarnings("unused")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning is gone for me after the refactor.


public Span getStartDelimiterSpan() {
return startDelimiterSpan;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused. Remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

import org.pkl.core.util.Nullable;

public final class ObjectBody implements Node {
private final List<Parameter> parameters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be consistent and either get rid of org.pkl.core.parser.cst.ParameterList, or make this also a ParameterList.

If we choose to get rid of ParameterList, we should also get rid of TypeParameterList, and ArgumentList.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The *List classes keep around the span of their delimiters which we use for error reporting.
This cannot be a ParameterList because this is not surrounded by parentheses.

Unexpected character `{0}`. Did you mean {1}?

invalidCharacter=\
Invalid identifier `{0}`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing snippet test

block comments also
*/
/// doc continuation
foo = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're parsing this incorrectly; change this test to:

import "pkl:reflect"

/// doc comment
// line comments can appear between doc comments
/*
 block comments also
*/
/// doc continuation
foo = 1

theComment = reflect.Class(module.getClass()).properties["foo"].docComment

Output:

foo = 1
theComment = """
  doc comment
  // line comments can appear between doc comments
  /*
   block comments also
  */
  doc continuation
  """

@@ -0,0 +1,6 @@
–– Pkl Error ––
Invalid token at position. Expected class, typealias, method or property.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it would be good to customize error messages for doc comments.

Something like:

Dangling documentation comment.

x | /// doc comment continuation
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Documentation comments must be attached to modules, classes, typealiases, methods, or properties.

import org.pkl.core.parser.antlr.PklParser

@Execution(ExecutionMode.CONCURRENT)
interface ParserComparisonTestInterface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create an interface? These methods can just go directly into ParserComparisonTest.

interface ParserComparisonTestInterface {
@Test
@Execution(ExecutionMode.CONCURRENT)
fun compareSnippetTests() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this would be a good candidate for @ParameterizedTest.

But, this is temporary code anyways, so, it's not a big deal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants