README.md

Parser generator

Other parser generators for the web not only weren't written here, they also lack a set of features we really need:

Type-safety: the API of the generated parser should be fully typed, without using the any type.
AST from grammar: converting untyped trees to an AST is unsafe and boring.
^TBD CST: a pretty-printer has to keep comments /**/, underscores in numbers 1_234, and other details that are not represented anywhere in the AST.
Named lexemes: good error messages shouldn't report an identifier as "a-z, A-Z, 0-9, or _".
^TBD Error recovery: programming languages should report more than one error at a time.
^TBD Incremental: a reparse shouldn't take time proportional to the size of the file.
High-order rules A<B>: duplicated code increases the chance of making a mistake, and high-order rules are required for deduplication.
^TBD No stack overflow on large expressions: deeply nested constructs might lead to stack overflow.
Space skipping: manually annotating grammar with spaces is error-prone and boring.
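
To make the "named lexemes" point concrete, here is a minimal sketch of how an error message could prefer a display name over the raw character classes. The `Lexeme` type and both functions are hypothetical illustrations and not part of pgen's API.

```typescript
// Hypothetical description of something the parser expected at a position.
type Lexeme = {
  displayName?: string; // human-readable name, if the rule has one
  chars: string[];      // raw character classes the rule is built from
};

// Prefer the display name; fall back to listing the raw character classes.
function displayOf(lexeme: Lexeme): string {
  return lexeme.displayName !== undefined
    ? lexeme.displayName
    : lexeme.chars.join(", ");
}

// Build an error message from everything that could have matched at `pos`.
function errorMessage(expected: Lexeme[], pos: number): string {
  return "Expected " + expected.map(displayOf).join(" or ") + " at offset " + pos;
}

const unnamed: Lexeme = { chars: ["a-z", "A-Z", "0-9", "_"] };
const named: Lexeme = { displayName: "identifier", chars: unnamed.chars };

const rawMessage = errorMessage([unnamed], 4);  // "Expected a-z, A-Z, 0-9, _ at offset 4"
const niceMessage = errorMessage([named], 4);   // "Expected identifier at offset 4"
```

The second message is the kind of report a user can act on without knowing how the identifier rule is spelled in the grammar.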

pgen mostly follows the grammar of peggy, with a few notable differences:

Capitalized rules Foo = ... create AST nodes with { $: 'Foo' }.
Rules have to end with a semicolon ;.
Inline semantic actions { return 42; } are not supported. We can't infer the types of the AST when the grammar contains inlined JavaScript code, because JS is untyped.
High-order rules A<B> = ... were added.
Space skipping was added. It uses the space rule.
Lexification operator # was added.
Character classes do not support modifiers such as [a-z]i.
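
Without inline semantic actions, the work they would do moves into ordinary typed code that runs over the AST after parsing. The sketch below assumes a small hypothetical grammar and hand-written node types. The types pgen actually generates may differ, and only the `$` tag is taken from the description above.

```typescript
// Hypothetical grammar fragment, in the syntax this README describes:
//   digits = [0-9]+;
//   Num = value:$digits;
//   Add = left:Num "+" right:Num;
//
// Hand-written stand-ins for the node types such rules could produce.
type Num = { $: "Num"; value: string };
type Add = { $: "Add"; left: Num; right: Num };
type Expr = Num | Add;

// Compile-time guard: reachable only if a node kind is not handled.
function assertNever(node: never): never {
  throw new Error("Unexpected node: " + JSON.stringify(node));
}

// What would have been an inline action becomes a typed function over the AST.
function evaluate(node: Expr): number {
  switch (node.$) {
    case "Num":
      return Number(node.value);
    case "Add":
      return evaluate(node.left) + evaluate(node.right);
    default:
      return assertNever(node);
  }
}

const tree: Expr = {
  $: "Add",
  left: { $: "Num", value: "40" },
  right: { $: "Num", value: "2" },
};
const result = evaluate(tree); // 42
```

Because `$` is a literal type, the switch narrows each branch to the matching node, and adding a new node kind to `Expr` makes the `default` branch fail to compile until it is handled.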

Non-AST rule definition rule = ...;
AST rule definition Rule = ...;. Returns an object { $: 'Rule', loc: Loc }, with the rest of the fields defined by the named clauses on the right-hand side.
Display override for error messaging Id "identifier" = ...;
High-order rule definition inter<A, B> = ...; and call inter<expression, ",">.
Left-biased choice "A" / "B". Clauses are tried from left to right, and the first one that matches wins.
Sequence foo bar baz. All clauses should match in sequence.
Named clauses "if" "(" expr:expression ")" stmts:statements. The sequence operator generates an object, and named clauses become its fields { expr: ..., stmts: ... }.
Picked clause "if" "(" @expression ")". The sequence operator returns only the value of the picked clause.
Single clause sequence a = b. Works as a = @b.
Negative lookahead !x. Fails if x matches. Doesn't consume input.
Positive lookahead &x. Passes if x matches. Doesn't consume input.
Stringification $x. Ignores the AST computed by x and returns the string that x matched.
Lexification #x. Does not skip spaces inside x. If x calls other rules, spaces are not skipped there either.
Repeat x*.
Repeat at least once x+.
Optional x?.
String "abc".
Character class [a-z_]. Supports ranges a-z. Supports negation [^a-z].
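
The semantics of several of these operators can be sketched with a few hand-written combinators. This is an illustration only, and not how pgen is implemented. The names `str`, `cls`, `choice`, `many`, `not`, `and` and `stringify` are hypothetical, and space skipping is left out.

```typescript
// A parser takes the input and a position. It returns a value together with
// the new position, or null on failure.
type Result<T> = { value: T; pos: number } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// String "abc": matches the literal text.
const str = (literal: string): Parser<string> => (input, pos) =>
  input.slice(pos, pos + literal.length) === literal
    ? { value: literal, pos: pos + literal.length }
    : null;

// Character class, given here as a single-character regular expression.
const cls = (re: RegExp): Parser<string> => (input, pos) =>
  pos < input.length && re.test(input.charAt(pos))
    ? { value: input.charAt(pos), pos: pos + 1 }
    : null;

// Left-biased choice A / B: the first alternative that matches wins.
const choice = <T>(...alts: Parser<T>[]): Parser<T> => (input, pos) => {
  for (const alt of alts) {
    const r = alt(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// Repeat x*: collects matches until x fails or stops making progress.
const many = <T>(p: Parser<T>): Parser<T[]> => (input, pos) => {
  const values: T[] = [];
  for (;;) {
    const r = p(input, pos);
    if (r === null || r.pos === pos) break;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// Negative lookahead !x: succeeds only if x fails; consumes nothing.
const not = <T>(p: Parser<T>): Parser<undefined> => (input, pos) =>
  p(input, pos) === null ? { value: undefined, pos } : null;

// Positive lookahead &x: succeeds only if x matches; consumes nothing.
const and = <T>(p: Parser<T>): Parser<undefined> => (input, pos) =>
  p(input, pos) !== null ? { value: undefined, pos } : null;

// Stringification $x: drops x's value and returns the matched text.
const stringify = <T>(p: Parser<T>): Parser<string> => (input, pos) => {
  const r = p(input, pos);
  return r === null ? null : { value: input.slice(pos, r.pos), pos: r.pos };
};

// "a" / "ab" on "ab": the first alternative matches, so "ab" is never tried.
const first = choice(str("a"), str("ab"))("ab", 0);
// $([0-9_]*) on "1_234;": returns the matched text rather than a list of characters.
const digits = stringify(many(cls(/[0-9_]/)))("1_234;", 0);
// &"if" succeeds without consuming input; !"if" fails on the same input.
const peek = and(str("if"))("if (x)", 0);
const notIf = not(str("if"))("if (x)", 0);
```

Here `first` holds the value "a" at position 1, `digits` holds "1_234" at position 5, `peek` succeeds at position 0, and `notIf` is null.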