Skip to content

proposal: go/ast: add Start token.Pos fields to BlockStmt and FieldList #73584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
adonovan opened this issue May 2, 2025 · 13 comments
Open
Assignees
Labels
gopls/parsing Issues related to parsing / poor parser recovery. LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented May 2, 2025

Background: The go/ast package was designed with batch-oriented compiler-like applications in mind, where well-formed ASTs are overwhelmingly the common case. However, it is now the basis of many tools, including gopls, an interactive IDE backend, where ill-formed trees are the norm. We have observed that a major source of crashes in gopls is out-of-bounds accesses of various kinds causes by the parser creating syntax trees whose Pos/End range is not a subrange of its parent node. This usually happens because the End value is either (a) zero, because the final token was missing, or (b) greater than File.FileEnd, because the value was derived by adding an offset from a prior token under the assumption that the file is well formed. For example a file containing only package has an invented name token, _, whose end position is beyond EOF.

We would like to establish the invariant that every node returned by the parser has a Pos/End range that includes the ranges of its children. (For Files, pretend that Pos/End refers to FileStart/FileEnd for this discussion.) This change is mostly straightforward and compatible. However, due to unfortunate constraints imposed by doc comments, some syntax nodes must be created even when their source text is entirely missing. They are BlockStmt and FieldList. Consider an "if" token at EOF. This produces an IfStmt, and every IfStmt must (regrettably) have a BlockStmt, even though there are no braces nor statements. Consequently, the BlockStmt exists but is zero, and its Pos and End methods have nothing to go on. Its End method returns zero, and so does the End method of the parent IfStmt. Similarly, a "func" token at EOF yields a FuncDecl, and every FuncDecl must (regrettably) have a Params, even though there are no parens nor parameter fields. Again, a zero FieldList is created, and its Pos and End are zero.

This problem can be fixed by having the parser record the start position of these missing nodes. Adding a non-exported field to BlockStmt abnd FieldList would permit their Pos and End methods to do the right thing. However, a lot of client code assumes that all fields of ast.Node struct types are public, and misbehaves otherwise. Therefore, these new fields must be public.

Proposal: We propose to add two new fields, both Start token.Pos, to FieldList and BlockStmt. They would be unconditionally set to the start position of the syntax node.

package ast

type BlockStmt struct {
+       Start token.Pos // start position of the block (even when missing, as in IfStmt.Body for "if" at EOF)
        ...
}
type FieldList struct {
+       Start token.Pos // start position of the field list (even when missing, as in FuncDecl.Params for "func" at EOF)
        ...
}

Neither addition increases the size class of its struct.

Sketch implementation:

  • https://go.dev/cl/668238 - go/ast and go/parser changes. Patchset 5 shows limitations of using a nonexported field.
  • https://go.dev/cl/668677 - gopls changes, mostly to the abstraction-breaking completion logic, and workarounds for non-exported fields.
@adonovan adonovan added this to the Go1.25 milestone May 2, 2025
@adonovan adonovan moved this to Incoming in Proposals May 2, 2025
@adonovan
Copy link
Member Author

adonovan commented May 2, 2025

@adonovan adonovan changed the title proposal: go/types: add Start token.Pos fields to BlockStmt and FieldList proposal: go/ast: add Start token.Pos fields to BlockStmt and FieldList May 2, 2025
@adonovan adonovan self-assigned this May 2, 2025
@adonovan adonovan added the gopls/parsing Issues related to parsing / poor parser recovery. label May 2, 2025
@mateusz834
Copy link
Member

mateusz834 commented May 2, 2025

So for a valid AST BlockStmt.Start == BlockStmt.Lbrace, right? And same for FieldList.

However, due to unfortunate constraints imposed by doc comments, some syntax nodes must be created even when their source text is entirely missing. They are BlockStmt and FieldList. Consider an "if" token at EOF. This produces an IfStmt, and every IfStmt must (regrettably) have a BlockStmt, even though there are no braces nor statements.

Have you considered a GODEBUG for this? I mean instead of this proposed change, allow a nil BlockStmt, FieldList and guard this with a GODEBUG setting?

@mateusz834
Copy link
Member

To eleborate more about the GODEBUG approach, i always assumed that the doc comment around nil/non-nil guarantees were only corresponding to sucesfully parsed AST and that anything can be a nil when it is invalid.

@adonovan
Copy link
Member Author

adonovan commented May 2, 2025

So for a valid AST BlockStmt.Start == BlockStmt.Lbrace, right? And same for FieldList.

Correct. But we can't make the existing field do double duty without it lying about the presence of a missing token.

Have you considered a GODEBUG for this? I mean instead of this proposed change, allow a nil BlockStmt, FieldList and guard this with a GODEBUG setting?

That would be a breaking change; we don't do breaking changes. GODEBUG is for risky changes that are in principle non-breaking.

@adonovan
Copy link
Member Author

adonovan commented May 2, 2025

To eleborate more about the GODEBUG approach, i always assumed that the doc comment around nil/non-nil guarantees were only corresponding to successfully parsed AST and that anything can be a nil when it is invalid.

I had always assumed the opposite. I think all we can say with certainty is that there is inadequate documentation about what you can expect from go/ast (and go/types), both when the tree is well formed and when it is not. We plan to address that; see #70725 for example.

@mateusz834
Copy link
Member

GODEBUG is for risky changes that are in principle non-breaking.

Not really, we have made such change before: #67620 which is technically a breaking change. But this for sure had a smaller impact.

@gabyhelp gabyhelp added the LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool label May 2, 2025
@adonovan
Copy link
Member Author

adonovan commented May 5, 2025

A modest but cheeky counterproposal:

Proposal 2: Rather than adding a new Start token.Pos field to FieldList and BlockStmt, we reinterpret the contract of three comments in the go/ast package as applying only to well-formed syntax trees (those without parse errors). That would remove the obligation for the parser to return:

  • a non-nil *Ident for the missing package name in a file containing only package;
  • a non-nil *BlockStmt for the absent IfStmt.Body in a file ending with if, and similarly for, switch, select; or
  • a non-nil *FieldList for the absent FuncType.Params in a file ending with func.

Of course, changing a field from non-nil to maybe-nil is strictly an incompatible change. However, I suspect very few programs (other than gopls, which is integrated into the editor and therefore routinely sees broken ASTs, and the handful of analyzers that set the RunDespiteErrors flag) actually care, because they stop if there is a parse error.

Furthermore, by their nature, the go/ast and go/types packages have never been in a position to provide perpetual compatibility because the spec itself changes, so perhaps their users are more willing to consider a proposal such as this.

@gri @findleyr @dominikh

@findleyr
Copy link
Member

findleyr commented May 5, 2025

@adonovan can we also change the handling of dangling selectors to not introduce a phantom selector named '_'? That's another such inconsistency that breaks calculated positions.

We sort of broke the contract of ast.Ident.Object when we added the SkipObjectResolution parser mode.
And before that, we broke the parsing of invalid trees when we added AllErrors (https://codereview.appspot.com/7307085).

Point being, if we don't want to just change the contract, we could almost certainly do so behind (yet another) mode flag. With that said, it would be so much cleaner if we don't have to introduce yet another dimension into the tree.

@dominikh
Copy link
Member

dominikh commented May 5, 2025

We have observed that a major source of crashes in gopls is out-of-bounds accesses of various kinds causes by the parser creating syntax trees whose Pos/End range is not a subrange of its parent node.

Won't proposal 2 replace that with a new major source of crashes: nil dereferences?

@findleyr
Copy link
Member

findleyr commented May 5, 2025

@dominikh nil dereferences are trivial to fix based on a single telemetry report. Out of bound errors can be nearly impossible to reproduce, and can manifest in a variety of ways (overflowing positions, no token.Files, or the wrong token.File).

So, yes you're right, but we can probably find all of them statically, and will quickly fix any that we miss. This is preferable to having to develop against a tree that can overflow its source.

@adonovan
Copy link
Member Author

adonovan commented May 5, 2025

can we also change the handling of dangling selectors to not introduce a phantom selector named '_'? That's another such inconsistency that breaks calculated positions.

Yes, that we will do in any case, but it's just a bug fix and doesn't require a proposal.

Won't proposal 2 replace that with a new major source of crashes: nil dereferences?

I'm confident we could audit the tools we maintain and add nil checks in the relevant places pretty easily. It's just a global search for three fields. The real question is whether you and others who use go/ast after a parse error feel the same.

@dominikh
Copy link
Member

dominikh commented May 5, 2025

None of my tools work on code with errors and the change wouldn't affect me. I do think that

Furthermore, by their nature, the go/ast and go/types packages have never been in a position to provide perpetual compatibility because the spec itself changes, so perhaps their users are more willing to consider a proposal such as this.

is a convincing argument. I also wonder how stable go/parser's results for incomplete/invalid ASTs have been in the first place. Surely changes to error recovery, for example, could lead to changes that upset existing tools, too.

Could we add a GODEBUG to disable the new behavior, to give tools time to update? Similar to the recent changes concerning type aliases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gopls/parsing Issues related to parsing / poor parser recovery. LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Projects
Status: Incoming
Development

No branches or pull requests

6 participants