Not only were other parser generators for the web not written here, they also lack a set of features we really need:
- Type-safety: the API of the generated parser should be typed, without `any`.
- AST from grammar: converting untyped trees to an AST is unsafe and boring.
- CST: a pretty-printer has to keep comments `/**/`, underscores in numbers `1_234`, and other features that are not represented anywhere in the AST.
- Named lexemes: good error messages shouldn't report an identifier as "a-z, A-Z, 0-9, or _".
- TBD Error recovery: parsers for programming languages should report more than one error at a time.
- TBD Incremental: reparsing shouldn't take time proportional to the size of the file.
- High-order rules `A<B>`: duplicated code increases the chance of making a mistake, and high-order rules are needed to avoid duplication.
- TBD No stack overflow on large expressions: deeply nested constructions might lead to stack overflow.
- Space skipping: manually annotating the grammar with spaces is error-prone and boring.
pgen mostly follows the grammar syntax of peggy, with a few notable differences:
- Capitalized rules `Foo = ...` create AST nodes with `{ $: 'Foo' }`.
- Rules have to end with a semicolon `;`.
- Inline semantic actions `{ return 42; }` are not supported. We can't infer the types of the AST when it contains inlined JavaScript code, because JS is untyped.
- High-order rules `A<B> = ...` were added.
- Space skipping was added. It uses the `space` rule.
- The lexification operator `#` was added.
- Character classes do not support modifiers such as `[a-z]i`.
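Because capitalized rules tag every node with a `$` field, and no inline JavaScript gets in the way of type inference, a consumer can treat the resulting AST as a TypeScript discriminated union. The sketch below hand-writes node types for hypothetical rules `Num`, `Add` and `Neg`; the field names and the `Loc` shape are assumptions, not pgen's actual generated output.

```typescript
// Hypothetical node types for capitalized rules Num, Add and Neg.
// Field names and the Loc shape are assumptions, not pgen's actual output.
type Loc = { start: number; end: number };

type Num = { $: "Num"; loc: Loc; value: string };
type Add = { $: "Add"; loc: Loc; left: Expr; right: Expr };
type Neg = { $: "Neg"; loc: Loc; arg: Expr };
type Expr = Num | Add | Neg;

// The `$` tag lets TypeScript narrow each case without casts.
function evaluate(e: Expr): number {
  switch (e.$) {
    case "Num":
      return Number(e.value);
    case "Add":
      return evaluate(e.left) + evaluate(e.right);
    case "Neg":
      return -evaluate(e.arg);
    default: {
      // Compile-time exhaustiveness check: this stops type-checking if a
      // new node kind is added to Expr but not handled above.
      const unhandled: never = e;
      return unhandled;
    }
  }
}

// Usage: a tree for "2 + -3"
const loc: Loc = { start: 0, end: 6 };
const tree: Expr = {
  $: "Add",
  loc,
  left: { $: "Num", loc, value: "2" },
  right: { $: "Neg", loc, arg: { $: "Num", loc, value: "3" } },
};
console.log(evaluate(tree)); // -1
```

With untyped trees, the same `switch` would need casts and could silently miss a node kind; the `never` check turns that into a compile error.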
Grammar syntax:
- Non-AST rule definition `rule = ...;`
- AST rule definition `Rule = ...`. Returns an object `{ $: 'Rule', loc: Loc }` whose remaining fields come from the named clauses on the right-hand side.
- Display name override for error messages `Id "identifier" = ...;`. Errors refer to the rule as "identifier" instead of listing the characters it accepts.
- High-order rule definition `inter<A, B> = ...;` and call `inter<expression, ",">`.
- Left-biased choice `"A" / "B"`. Matches the first clause that succeeds.
- Sequence `foo bar baz`. All clauses have to match in order.
- Named clauses `"if" "(" expr:expression ")" stmts:statements`. The sequence operator produces an object, and named clauses become its fields `{ expr: ..., stmts: ... }`.
- Picked clause `"if" "(" @expression ")"`. The sequence operator returns only the value of the picked clause.
- Single-clause sequence `a = b`. Works as `a = @b`.
- Negative lookahead `!x`. Fails if `x` matches. Doesn't consume input.
- Positive lookahead `&x`. Succeeds if `x` matches. Doesn't consume input.
- Stringification `$x`. Ignores the AST computed by `x` and returns the string that `x` matched.
- Lexification `#x`. Does not skip spaces inside of `x`. If `x` calls other rules, it doesn't skip spaces there either.
- Repeat `x*`.
- Repeat at least once `x+`.
- Optional `x?`.
- String `"abc"`.
- Character class `[a-z_]`. Supports ranges `a-z`. Supports negation `[^a-z]`.
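The operator semantics above can be sketched as tiny parser combinators. This is a hypothetical, hand-written illustration, not pgen's generated code or API: spaces are not skipped here, `[name, parser]` pairs stand in for named clauses, the zero-progress guard in `star` is an assumption, and `inter` assumes the high-order rule means "`A` items separated by `B`".

```typescript
// Hypothetical combinators mirroring pgen's operators; not pgen's API.
// A parser returns the value and the next position, or null on failure.
type Result<T> = { value: T; pos: number } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// String "abc"
const str = (s: string): Parser<string> => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

// Character class [a-z_], given as a predicate on one character
const cls = (test: (c: string) => boolean): Parser<string> => (input, pos) =>
  pos < input.length && test(input[pos]) ? { value: input[pos], pos: pos + 1 } : null;

// Left-biased choice "A" / "B": the first alternative that succeeds wins
const choice = <T>(...alts: Parser<T>[]): Parser<T> => (input, pos) => {
  for (const alt of alts) {
    const r = alt(input, pos);
    if (r) return r;
  }
  return null;
};

// Sequence with named clauses: [name, parser] pairs must match in order;
// named results become fields, clauses named null are dropped
const seq = (...items: [string | null, Parser<unknown>][]): Parser<Record<string, unknown>> =>
  (input, pos) => {
    const value: Record<string, unknown> = {};
    for (const [name, p] of items) {
      const r = p(input, pos);
      if (!r) return null;
      if (name !== null) value[name] = r.value;
      pos = r.pos;
    }
    return { value, pos };
  };

// Repeat x*: greedy; stops when x fails (or, as a guard, matches nothing)
const star = <T>(p: Parser<T>): Parser<T[]> => (input, pos) => {
  const values: T[] = [];
  for (;;) {
    const r = p(input, pos);
    if (!r || r.pos === pos) return { value: values, pos };
    values.push(r.value);
    pos = r.pos;
  }
};

// Repeat at least once x+
const plus = <T>(p: Parser<T>): Parser<T[]> => (input, pos) => {
  const r = star(p)(input, pos);
  return r && r.value.length > 0 ? r : null;
};

// Optional x?: yields null instead of failing
const opt = <T>(p: Parser<T>): Parser<T | null> => (input, pos) =>
  p(input, pos) ?? { value: null, pos };

// Lookaheads !x and &x: succeed or fail without consuming input
const not = (p: Parser<unknown>): Parser<null> => (input, pos) =>
  p(input, pos) ? null : { value: null, pos };
const and = (p: Parser<unknown>): Parser<null> => (input, pos) =>
  p(input, pos) ? { value: null, pos } : null;

// Stringification $x: ignore x's value, return the text x matched
const text = (p: Parser<unknown>): Parser<string> => (input, pos) => {
  const r = p(input, pos);
  return r && { value: input.slice(pos, r.pos), pos: r.pos };
};

// High-order rule inter<A, B>, assumed to mean: A (B A)*
const inter = <A>(item: Parser<A>, sep: Parser<unknown>): Parser<A[]> => (input, pos) => {
  const first = item(input, pos);
  if (!first) return null;
  const values = [first.value];
  pos = first.pos;
  for (;;) {
    const s = sep(input, pos);
    const next = s && item(input, s.pos);
    if (!next) return { value: values, pos };
    values.push(next.value);
    pos = next.pos;
  }
};

// Usage
const digit = cls((c) => c >= "0" && c <= "9");
const list = inter(text(plus(digit)), str(","));
console.log(list("1,22,333", 0)?.value); // [ '1', '22', '333' ]
console.log(choice(str("a"), str("ab"))("ab", 0)?.value); // a
```

The last line shows why ordering matters in a left-biased choice: `"a" / "ab"` never reaches `"ab"`. A keyword rule such as `"if" ![a-z]` is the typical use of negative lookahead, so that `iffy` is not read as `if`. High-order rules map naturally onto generic functions like `inter`, which is what makes them type-safe.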
Space skipping rules:
- Spaces are skipped after every terminal: `"string"`, `[a-z]`.
- Spaces are skipped after the lexification operator `#x`.
- Spaces are not skipped inside the lexification operator `#x`.
- Spaces are skipped at the start, before the rest of parsing happens.
- If the whole input was not consumed, an error is emitted.
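The space-skipping rules above can be sketched with a shared flag that lexification switches off. This is a hypothetical illustration, not pgen's generated code: it assumes a space is one of ` \t\r\n` (in pgen the grammar's `space` rule decides that), and `str`, `cls`, `seq`, `plus`, `lex` and `parse` are made-up names.

```typescript
// Hypothetical sketch of pgen's space-skipping rules; not pgen's API.
// Assumption: a "space" is one of " \t\r\n" (pgen uses the `space` rule).
type Ctx = { input: string; skip: boolean };
type Result = { value: string; pos: number } | null;
type Parser = (ctx: Ctx, pos: number) => Result;

const SPACE = /[ \t\r\n]/;

// Advance past spaces, unless we are inside a lexification #x.
function skipSpaces(ctx: Ctx, pos: number): number {
  if (!ctx.skip) return pos;
  while (pos < ctx.input.length && SPACE.test(ctx.input[pos])) pos++;
  return pos;
}

// Terminal "s": match it, then skip spaces after it.
const str = (s: string): Parser => (ctx, pos) =>
  ctx.input.startsWith(s, pos) ? { value: s, pos: skipSpaces(ctx, pos + s.length) } : null;

// Terminal [..]: match one character, then skip spaces after it.
const cls = (re: RegExp): Parser => (ctx, pos) =>
  pos < ctx.input.length && re.test(ctx.input[pos])
    ? { value: ctx.input[pos], pos: skipSpaces(ctx, pos + 1) }
    : null;

// Sequence; values are concatenated to keep the sketch small.
const seq = (...ps: Parser[]): Parser => (ctx, pos) => {
  let value = "";
  for (const p of ps) {
    const r = p(ctx, pos);
    if (!r) return null;
    value += r.value;
    pos = r.pos;
  }
  return { value, pos };
};

// x+ (the terminals above never match empty input, so the loop ends).
const plus = (p: Parser): Parser => (ctx, pos) => {
  let value = "";
  let count = 0;
  for (;;) {
    const r = p(ctx, pos);
    if (!r) break;
    value += r.value;
    pos = r.pos;
    count++;
  }
  return count > 0 ? { value, pos } : null;
};

// Lexification #x: no skipping inside x (including rules x calls, since
// they share ctx), then spaces are skipped once after x as a whole.
const lex = (p: Parser): Parser => (ctx, pos) => {
  const outer = ctx.skip;
  ctx.skip = false;
  const r = p(ctx, pos);
  ctx.skip = outer;
  return r && { value: r.value, pos: skipSpaces(ctx, r.pos) };
};

// Entry point: skip leading spaces, parse, require the whole input.
function parse(p: Parser, input: string): string {
  const ctx: Ctx = { input, skip: true };
  const r = p(ctx, skipSpaces(ctx, 0));
  if (!r) throw new Error("syntax error");
  if (r.pos !== input.length) throw new Error(`unexpected input at ${r.pos}`);
  return r.value;
}

// Usage
const digit = /[0-9]/;
const spaced = plus(cls(digit)); // spaces allowed between digits
const integer = lex(plus(cls(digit))); // digits must be adjacent
console.log(parse(spaced, "  1 2 3 ")); // 123
console.log(parse(seq(integer, str("+"), integer), " 12 + 34")); // 12+34
```

Without `#`, `spaced` happily accepts `1 2 3` as one number, which is why number and identifier rules are the natural place for lexification. Restoring the outer flag in `lex` keeps a nested `#` from re-enabling skipping inside an enclosing one.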
To build:

```sh
yarn build
```

To generate an AST parser:

```sh
./bin/pgen grammar.gg grammar.ts
```

To generate a CST parser:

```sh
./bin/pgen grammar.gg grammar.ts --cst
```