Align the grammar documentation with Python's actual grammar #127833

Open
encukou opened this issue Dec 11, 2024 · 0 comments
Labels
docs Documentation in the Doc dir

Comments

@encukou
Member

encukou commented Dec 11, 2024

Documentation

The current documentation of Python syntax (the later chapters of the language reference) uses hand-maintained production lists, like this:

A)

compound_stmt ::=  if_stmt
                   | while_stmt
                   | for_stmt
                   | try_stmt
                   | with_stmt
                   | match_stmt
                   | funcdef
                   | classdef
                   | async_with_stmt
                   | async_for_stmt
                   | async_funcdef
suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement     ::=  stmt_list NEWLINE | compound_stmt
stmt_list     ::=  simple_stmt (";" simple_stmt)* [";"]

There is no mechanism to ensure that these are in sync with the actual grammar, and they inevitably do get out of sync.
See some of the “docs” issues mentioning “grammar”.

It's not easy to write an automatic tool to keep them in sync, because we do want to elide some details -- the parser rules, unnecessary lookaheads, cuts, etc. But it is possible: we wrote a proof of concept, which will need to be rewritten, tuned, and reviewed. Before introducing it, I'd like to go through all the docs, correct the existing documentation, bring it closer to what a tool could generate, and discuss what the ideal presentation would look like. That needs to be a manual process, and it will also need to touch the prose that's next to the grammar snippets.
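
To make "eliding details" concrete, here is a minimal sketch of what such a simplification step might look like for a single alternative from python.gram. This is not the proof of concept mentioned above; the function name `simplify_alternative` is hypothetical, and the sketch assumes that items are whitespace-separated, that lookahead operands contain no spaces, and that actions contain no nested braces.

```python
import re

# A trailing action such as "{ _PyAST_If(a, b, c, EXTRA) }".
# Assumption: actions contain no nested braces.
TRAILING_ACTION = re.compile(r"\s*\{[^{}]*\}\s*$")

# A variable binding prefix such as "a=" in "a=named_expression".
BINDING = re.compile(r"^[A-Za-z_]\w*=")


def simplify_alternative(alt: str) -> str:
    """Drop actions, cuts, lookaheads and variable names from one alternative."""
    alt = TRAILING_ACTION.sub("", alt)
    kept = []
    for item in alt.split():
        if item == "~":
            continue  # cut: only affects backtracking, not the syntax
        if item.startswith("&&"):
            item = item[2:]  # forced token: still consumes input, keep it
        elif item[0] in "&!":
            continue  # lookahead: consumes nothing
        kept.append(BINDING.sub("", item))
    return " ".join(kept)


print(simplify_alternative(
    "'if' a=named_expression ':' b=block c=elif_stmt { _PyAST_If(a, b, c, EXTRA) }"
))
```

A real tool would also have to decide which lookaheads are "unnecessary" and which carry meaning for the reader; this sketch simply drops all of them.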

As a first step, I propose an update to the tooling, which brings the presentation a bit closer to the python.gram syntax.

From the existing ReST source, we can get this:

B)

compound_stmt: if_stmt
               | while_stmt
               | for_stmt
               | try_stmt
               | with_stmt
               | match_stmt
               | funcdef
               | classdef
               | async_with_stmt
               | async_for_stmt
               | async_funcdef
suite:         stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement:     stmt_list NEWLINE | compound_stmt
stmt_list:     simple_stmt (";" simple_stmt)* [";"]

Since Sphinx hard-codes the productionlist formatting (the ::= symbol and the aligning), we'll need to override the productionlist directive to achieve this.
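
As a rough illustration of the A) to B) change, independent of Sphinx, the following sketch rewrites a rendered A)-style listing as plain text. It is only a model of the formatting difference, not the directive override itself; `to_colon_style` is a hypothetical name, and the sketch assumes every rule starts at column 0 with `name ::= body` and that all other non-blank lines are continuations.

```python
import re

# A rule header in the A) style: "name ::= body"
RULE = re.compile(r"^(?P<name>\w+)\s*::=\s*(?P<body>.*)$")


def to_colon_style(listing: str) -> str:
    """Rewrite 'name ::= body' listings as 'name: body', keeping alignment."""
    lines = listing.splitlines()
    names = [m["name"] for line in lines if (m := RULE.match(line))]
    # Column where bodies start: longest name, a colon, and one space.
    width = max((len(name) for name in names), default=0) + 2
    out = []
    for line in lines:
        m = RULE.match(line)
        if m:
            out.append((m["name"] + ":").ljust(width) + m["body"])
        elif line.strip():
            out.append(" " * width + line.strip())  # continuation line
        else:
            out.append("")
    return "\n".join(out)


print(to_colon_style(
    "compound_stmt ::=  if_stmt\n"
    "                   | while_stmt\n"
    "suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT"
))
```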

Then, by changing the ReST and using a different directive, we can get to something like:

C)

compound_stmt:
    | if_stmt
    | while_stmt
    | for_stmt
    | try_stmt
    | with_stmt
    | match_stmt
    | funcdef
    | classdef
    | async_with_stmt
    | async_for_stmt
    | async_funcdef
suite:
    | stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement:
    | stmt_list NEWLINE | compound_stmt
stmt_list:
    | simple_stmt (";" simple_stmt)* [";"]
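
The step from B) to C) is mostly mechanical as far as layout goes, which a small sketch can show. The helper below is hypothetical (`to_gram_style` is not part of any existing tooling) and assumes that rule headers start at column 0 and that every continuation line already begins with `|`; the content changes discussed below (renaming rules, reorganizing prose) are of course not something a script like this can do.

```python
import re

# A rule header in the B) style: "name: body"
RULE = re.compile(r"^(?P<name>\w+):\s*(?P<body>.*)$")


def to_gram_style(listing: str, indent: str = "    ") -> str:
    """Put each rule name on its own line, with alternatives indented below."""
    out = []
    for line in listing.splitlines():
        text = line.strip()
        if not text:
            out.append("")
            continue
        m = RULE.match(line)
        if m:
            out.append(m["name"] + ":")
            text = m["body"]
            if not text:
                continue  # header with no body on the same line
        if not text.startswith("|"):
            text = "| " + text  # first alternative gets an explicit bar
        out.append(indent + text)
    return "\n".join(out)


print(to_gram_style(
    "compound_stmt: if_stmt\n"
    "               | while_stmt\n"
    'stmt_list:     simple_stmt (";" simple_stmt)* [";"]'
))
```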

I propose to go from A) to B) at once (by overriding productionlist), and from B) to C) gradually, while also updating the content (including changing rule names to match the grammar, and adjusting/reorganizing nearby prose).
I think that the B) and C) styles are similar enough that mixing them in a single version of the docs should not be jarring.

By the way, one additional benefit of a custom directive is that we can add syntax highlighting. (Ideally, with support from the theme.) I think that making strings stand out makes the listings more readable:

image
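
For a sense of what highlighting the strings involves, here is a minimal sketch that splits a grammar line into string literals and everything else, then wraps the literals in a `<span>`. The function names and the `css_class` default are assumptions for illustration only; a real directive would build docutils nodes rather than raw HTML, and the sketch assumes quoted literals contain no escaped quotes.

```python
import re
from html import escape

# Single- or double-quoted literal, assumed to contain no escaped quotes.
STRING = re.compile(r"'[^']*'|\"[^\"]*\"")


def split_strings(line: str) -> list[tuple[str, str]]:
    """Split a line into ("string", text) and ("other", text) segments."""
    segments = []
    pos = 0
    for m in STRING.finditer(line):
        if m.start() > pos:
            segments.append(("other", line[pos:m.start()]))
        segments.append(("string", m.group()))
        pos = m.end()
    if pos < len(line):
        segments.append(("other", line[pos:]))
    return segments


def to_html(line: str, css_class: str = "s") -> str:
    """Render a line as HTML, wrapping string literals in a span."""
    parts = []
    for kind, text in split_strings(line):
        text = escape(text, quote=False)
        if kind == "string":
            text = f'<span class="{css_class}">{text}</span>'
        parts.append(text)
    return "".join(parts)


print(to_html('simple_stmt (";" simple_stmt)* [";"]'))
```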

@encukou encukou added the docs Documentation in the Doc dir label Dec 11, 2024
encukou added a commit that referenced this issue Feb 5, 2025
…ionlist` (GH-127835)

As a first step toward aligning the grammar documentation with Python's actual
grammar, this overrides the ReST `productionlist` directive to:
- use `:` instead of the `::=` symbol
- add syntax highlighting for strings (using a Pygments highlighting class)

All links and link targets should be preserved. (Unfortunately, this reaches
into some Sphinx internals; I don't see a better way to do exactly what
Sphinx does.)

This also adds a new directive, `grammar-snippet`, which formats the snippet
almost exactly like what's in the source, modulo syntax highlighting and
keeping the backtick character to mark links to other rules.
This will allow formatting the snippets as in the grammar file
(file:///home/encukou/dev/cpython/Doc/build/html/reference/grammar.html).

The new directive is applied to two simple rules in toplevel_components.rst

---------

Co-authored-by: Blaise Pabon <[email protected]>
Co-authored-by: William Ferreira <[email protected]>
Co-authored-by: bswck <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 5, 2025
…roductionlist` (pythonGH-127835)

(cherry picked from commit 58a4357)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Blaise Pabon <[email protected]>
Co-authored-by: William Ferreira <[email protected]>
Co-authored-by: bswck <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 5, 2025
…roductionlist` (pythonGH-127835)

(cherry picked from commit 58a4357e29a15135e6fd99f320c60f8ea0472d27)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Blaise Pabon <[email protected]>
Co-authored-by: William Ferreira <[email protected]>
Co-authored-by: bswck <[email protected]>
Co-authored-by: Adam Turner <[email protected]>