Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comments into the AST #14352

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

doorgan
Copy link
Contributor

@doorgan doorgan commented Mar 21, 2025

This implements AST comments and updates the formatter to look for comments in the AST instead of a separate comments list.

This adds a new :include_comments to Code.string_to_quoted/2 and Code.string_to_quoted!/2.

Comments are attaches to AST nodes in three variants:

  • :leading_comments: Comments that are located before a node, or in the same line.

    Examples:

    # This is a leading comment
    foo # This one too
    
  • :trailing_comments: Comments that are located after a node, and before the end of the parent enclosing the node(or the root document).

    Examples:

    foo
    # This is a trailing comment
    # This one too
    
  • :inner_comments: Comments that are located inside an empty node.

    Examples:

    foo do
      # This is an inner comment
    end
    
    [
      # This is an inner comment
    ]
    
    %{
      # This is an inner comment
    }
    

A comment may be considered inner or trailing depending on wether the enclosing node is empty or not. For example, in the following code:

  foo do
    # This is an inner comment
  end

The comment is considered inner because the do block is empty. However, in the following code:

  foo do
    bar
    # This is a trailing comment
  end

The comment is considered trailing to bar because the do block is not empty.

In the case no nodes are present in the AST but there are comments, they are inserted into the root :__block__ node as :inner_comments.

Many of the ideas here come from Sourceror, which takes inspiration for the recast parser.

Different from Sourceror, it introduces :inner_comments to remove the ambiguity between comments before the end keyword, and comments after an expression. This also becomes important when working on the formatter code, as it is very awkward to inject inner comments after the call args were already converted to algebra docs.

The current implementation in this PR adds an extra traversal step after the code is initially parsed. Due to this, and due to the logic to merge comments and AST requiring precise line information in every AST nodes, it requires to also set up the following options:

[
  preserve_comments: &preserve_comments/5,
  literal_encoder: &{:ok, {:__block__, &2, [&1]}},
  token_metadata: true,
  unescape: false,
  columns: true,
]

Where &preserve_comments/5 is the same function used to extract comments for formatting.

I first explored an alternative where we would convert comments into tokens in the tokenizer and then merge them in the parsing step to avoid an extra traversal and setting up extra options, but I couldn't figure out a maintainable way to do it. It seemed to require changing the definition of most expressions in the parser .yrl file, or similarly complex logic in a related file to the one in the Code.Comments module I introduce here.

I'm open to suggestions and guidance if we want to move this to the parsing step instead.

In the current implementation, we perform an extra traversal to merge comments into the AST, which

TODO

  • Fix remaining formatting issues
  • Cleanup and simplify comments merging code
  • Run the formatter on the Elixir codebase itself to ensure nothing changes
  • Add public documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant