Commit 392fa0b

fix: docs
1 parent 34b676d commit 392fa0b

1 file changed

+54
-0
lines changed

packages/pgen/README.md

+54
@@ -0,0 +1,54 @@
1+
# Parser generator
2+
3+
Other parser generators for the web were not written here, and, more importantly, they lack a set of features we really need:
4+
5+
- **Type-safety**: API of generated parser should be typed without `any`
6+
- **AST from grammar**: converting untyped trees to AST is unsafe and boring
7+
- <sup>TBD</sup> **CST**: the pretty-printer has to keep comments `/**/`, underscores in numbers `1_234`, and other features that are not represented anywhere in the AST.
8+
- **Named lexemes**: good error messages shouldn't report an identifier as "a-z, A-Z, 0-9, or _".
9+
- <sup>TBD</sup> **Error recovery**: programming languages should report more than one error at a time.
10+
- <sup>TBD</sup> **Incremental**: a reparse shouldn't take time proportional to the size of the file.
11+
- **High-order rules `A<B>`**: duplicated code increases the chance of making a mistake, and high-order rules are required to avoid duplication.
12+
- <sup>TBD</sup> **No stack overflow on large expressions**: nested constructions might lead to stack overflow.
13+
- **Space skipping**: manually annotating grammar with spaces is error-prone and boring.
14+
15+
## Comparison to peggy
16+
17+
`pgen` mostly follows the grammar of [peggy](https://peggyjs.org/documentation.html#grammar-syntax-and-semantics), with a few notable differences.
18+
19+
- Capitalized rules `Foo = ...` create AST nodes with `{ $: 'Foo' }`.
20+
- Rules have to end with semicolon `;`.
21+
- Inline semantic actions `{ return 42; }` are not supported. We can't infer types of AST when there is some inlined JavaScript code, because JS is untyped.
22+
- High-order rules `A<B> = ...` were added.
23+
- Space skipping was added. It uses `space` rule.
24+
- Lexification operator `#` was added.
25+
- Character classes do not support modifiers `[a-z]i`.
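
To illustrate why the `$` tag on AST nodes matters for type-safety, here is a minimal TypeScript sketch of what such node types might look like for a hypothetical grammar with two capitalized rules, `Num` and `Add`. Only the `$` discriminator comes from the list above; the type names, the field names, and the `evaluate` helper are assumptions made for illustration and are not taken from `pgen`'s generated output.

```typescript
// Hypothetical AST for a grammar with capitalized rules `Num` and `Add`.
// Only the `$` discriminator is taken from the README; the rest is assumed.
type Num = { readonly $: 'Num'; readonly value: string };
type Add = { readonly $: 'Add'; readonly left: Expr; readonly right: Expr };
type Expr = Num | Add;

// Switching on `$` narrows the union, so no `any` and no casts are needed.
function evaluate(node: Expr): number {
  switch (node.$) {
    case 'Num':
      return Number(node.value);
    case 'Add':
      return evaluate(node.left) + evaluate(node.right);
  }
}

// A hand-built tree standing in for parser output.
const tree: Expr = {
  $: 'Add',
  left: { $: 'Num', value: '1' },
  right: { $: 'Num', value: '41' },
};
const result = evaluate(tree);
```

With inline JavaScript actions the node types would depend on arbitrary untyped code, which is the reason given above for not supporting them.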
26+
27+
## Syntax reference
28+
29+
- Non-AST rule definition `rule = ...;`
30+
- AST rule definition `Rule = ...;`. Returns an object `{ $: 'Rule', loc: Loc }`, with the rest of the fields defined by named clauses on the right-hand side.
31+
- Display override for error messaging `Id "identifier" = ...;`
32+
- High-order rule definition `inter<A, B> = ...;` and call `inter<expression, ",">`
33+
- Left-biased choice `"A" / "B"`. Will match the first matching clause.
34+
- Sequence `foo bar baz`. All clauses should match in sequence.
35+
- Named clauses `"if" "(" expr:expression ")" stmts:statements`. Sequence operator generates an object, and named clauses become its fields `{ expr: ..., stmts: ... }`.
36+
- Picked clause `"if" "(" @expression ")"`. Sequence operator returns only a single value of picked clause.
37+
- Single clause sequence `a = b`. Works as `a = @b`.
38+
- Negative lookahead `!x`. Fails if `x` matches. Doesn't consume input.
39+
- Positive lookahead `&x`. Passes if `x` matches. Doesn't consume input.
40+
- Stringification `$x`. Ignores the AST computed by `x` and returns the string that `x` matched.
41+
- Lexification `#x`. Does not skip spaces inside of `x`. If `x` calls some other rules, doesn't skip spaces there either.
42+
- Repeat `x*`.
43+
- Repeat at least once `x+`.
44+
- Optional `x?`.
45+
- String `"abc"`.
46+
- Character class `[a-z_]`. Supports ranges `a-z`. Supports negation `[^a-z]`.
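
The semantics of a few operators from this list can be sketched with hand-written parser combinators. This is only an illustrative model in TypeScript, not how `pgen` is implemented: `Result`, `Parser`, `str`, `choice`, `not`, `text`, and `seq` are hypothetical names, and space skipping is left out.

```typescript
// A parser takes the input and a position; it returns the new position and a
// value on success, or null on failure. All names here are hypothetical.
type Result<T> = { pos: number; value: T } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// String terminal `"abc"`.
const str = (s: string): Parser<string> => (input, pos) =>
  input.startsWith(s, pos) ? { pos: pos + s.length, value: s } : null;

// Left-biased choice `a / b`: `b` is tried only if `a` fails.
const choice = <A, B>(a: Parser<A>, b: Parser<B>): Parser<A | B> => (input, pos) =>
  a(input, pos) ?? b(input, pos);

// Negative lookahead `!x`: succeeds only if `x` fails, never consumes input.
const not = <T>(x: Parser<T>): Parser<undefined> => (input, pos) =>
  x(input, pos) === null ? { pos, value: undefined } : null;

// Stringification `$x`: drops the value of `x`, returns the matched text.
const text = <T>(x: Parser<T>): Parser<string> => (input, pos) => {
  const r = x(input, pos);
  return r === null ? null : { pos: r.pos, value: input.slice(pos, r.pos) };
};

// Sequence of two parsers, returning both values as a pair.
const seq = <A, B>(a: Parser<A>, b: Parser<B>): Parser<[A, B]> => (input, pos) => {
  const ra = a(input, pos);
  if (ra === null) return null;
  const rb = b(input, ra.pos);
  return rb === null ? null : { pos: rb.pos, value: [ra.value, rb.value] };
};

// `"in" / "int"` on "int": the first alternative wins, so only "in" is consumed.
const biased = choice(str('in'), str('int'))('int', 0);
// `!"x"` on "y": succeeds without moving the position.
const lookahead = not(str('x'))('y', 0);
// `$("a" "b")` on "ab": returns the matched substring instead of the pair.
const matched = text(seq(str('a'), str('b')))('ab', 0);
```

The `biased` case shows why the order of alternatives matters with left-biased choice: a longer alternative placed after its own prefix is never reached.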
47+
48+
## Implicit syntax
49+
50+
- Spaces are skipped after every terminal: `"string"`, `[a-z]`
51+
- Spaces are skipped after lexification operator `#x`
52+
- Spaces are not skipped inside lexification operator `#x`.
53+
- Spaces are skipped at the start of the input, before any other parsing happens.
54+
- If the whole input was not consumed, an error is emitted.
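
The implicit rules above can be modelled roughly as follows. This is a sketch under assumptions, not `pgen`'s generated code: `skipSpaces` stands in for the grammar's `space` rule and is assumed here to match plain whitespace, and `token` and `parseAll` are hypothetical helper names.

```typescript
type Result<T> = { pos: number; value: T } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// Stand-in for the grammar's `space` rule; assumed to be plain whitespace.
const skipSpaces = (input: string, pos: number): number => {
  while (pos < input.length && /\s/.test(input.charAt(pos))) pos++;
  return pos;
};

// A terminal that skips spaces after itself, as every terminal does implicitly.
const token = (s: string): Parser<string> => (input, pos) =>
  input.startsWith(s, pos)
    ? { pos: skipSpaces(input, pos + s.length), value: s }
    : null;

// Top-level driver: skip leading spaces, run the parser, require end of input.
function parseAll<T>(p: Parser<T>, input: string): T {
  const r = p(input, skipSpaces(input, 0));
  if (r === null) throw new Error('syntax error');
  if (r.pos !== input.length) {
    throw new Error(`unexpected input at offset ${r.pos}`);
  }
  return r.value;
}

// Leading and trailing spaces are both consumed, so this succeeds.
const ok = parseAll(token('let'), '  let  ');

// "x" is left over after `let` and its trailing space, so this throws.
let leftover = '';
try {
  parseAll(token('let'), 'let x');
} catch (e) {
  leftover = (e as Error).message;
}
```

Because spaces are skipped after terminals rather than before them, the one explicit skip at the start of the input is what allows leading whitespace.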
