-
Notifications
You must be signed in to change notification settings - Fork 18k
proposal: go/ast: add Start token.Pos fields to BlockStmt and FieldList #73584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
So for a valid AST
Have you considered a |
To eleborate more about the GODEBUG approach, i always assumed that the doc comment around nil/non-nil guarantees were only corresponding to sucesfully parsed AST and that anything can be a nil when it is invalid. |
Correct. But we can't make the existing field do double duty without it lying about the presence of a missing token.
That would be a breaking change; we don't do breaking changes. GODEBUG is for risky changes that are in principle non-breaking. |
I had always assumed the opposite. I think all we can say with certainty is that there is inadequate documentation about what you can expect from go/ast (and go/types), both when the tree is well formed and when it is not. We plan to address that; see #70725 for example. |
Not really, we have made such change before: #67620 which is technically a breaking change. But this for sure had a smaller impact. |
A modest but cheeky counterproposal: Proposal 2: Rather than adding a new
Of course, changing a field from non-nil to maybe-nil is strictly an incompatible change. However, I suspect very few programs (other than gopls, which is integrated into the editor and therefore routinely sees broken ASTs, and the handful of analyzers that set the RunDespiteErrors flag) actually care, because they stop if there is a parse error. Furthermore, by their nature, the go/ast and go/types packages have never been in a position to provide perpetual compatibility because the spec itself changes, so perhaps their users are more willing to consider a proposal such as this. |
@adonovan can we also change the handling of dangling selectors to not introduce a phantom selector named '_'? That's another such inconsistency that breaks calculated positions. We sort of broke the contract of ast.Ident.Object when we added the Point being, if we don't want to just change the contract, we could almost certainly do so behind (yet another) mode flag. With that said, it would be so much cleaner if we don't have to introduce yet another dimension into the tree. |
Won't proposal 2 replace that with a new major source of crashes: nil dereferences? |
@dominikh nil dereferences are trivial to fix based on a single telemetry report. Out of bound errors can be nearly impossible to reproduce, and can manifest in a variety of ways (overflowing positions, no token.Files, or the wrong token.File). So, yes you're right, but we can probably find all of them statically, and will quickly fix any that we miss. This is preferable to having to develop against a tree that can overflow its source. |
Yes, that we will do in any case, but it's just a bug fix and doesn't require a proposal.
I'm confident we could audit the tools we maintain and add nil checks in the relevant places pretty easily. It's just a global search for three fields. The real question is whether you and others who use go/ast after a parse error feel the same. |
None of my tools work on code with errors and the change wouldn't affect me. I do think that
is a convincing argument. I also wonder how stable go/parser's results for incomplete/invalid ASTs have been in the first place. Surely changes to error recovery, for example, could lead to changes that upset existing tools, too. Could we add a GODEBUG to disable the new behavior, to give tools time to update? Similar to the recent changes concerning type aliases? |
Background: The go/ast package was designed with batch-oriented compiler-like applications in mind, where well-formed ASTs are overwhelmingly the common case. However, it is now the basis of many tools, including gopls, an interactive IDE backend, where ill-formed trees are the norm. We have observed that a major source of crashes in gopls is out-of-bounds accesses of various kinds causes by the parser creating syntax trees whose Pos/End range is not a subrange of its parent node. This usually happens because the End value is either (a) zero, because the final token was missing, or (b) greater than File.FileEnd, because the value was derived by adding an offset from a prior token under the assumption that the file is well formed. For example a file containing only
package
has an invented name token,_
, whose end position is beyond EOF.We would like to establish the invariant that every node returned by the parser has a Pos/End range that includes the ranges of its children. (For Files, pretend that Pos/End refers to FileStart/FileEnd for this discussion.) This change is mostly straightforward and compatible. However, due to unfortunate constraints imposed by doc comments, some syntax nodes must be created even when their source text is entirely missing. They are BlockStmt and FieldList. Consider an "if" token at EOF. This produces an IfStmt, and every IfStmt must (regrettably) have a BlockStmt, even though there are no braces nor statements. Consequently, the BlockStmt exists but is zero, and its Pos and End methods have nothing to go on. Its End method returns zero, and so does the End method of the parent IfStmt. Similarly, a "func" token at EOF yields a FuncDecl, and every FuncDecl must (regrettably) have a Params, even though there are no parens nor parameter fields. Again, a zero FieldList is created, and its Pos and End are zero.
This problem can be fixed by having the parser record the start position of these missing nodes. Adding a non-exported field to BlockStmt abnd FieldList would permit their Pos and End methods to do the right thing. However, a lot of client code assumes that all fields of ast.Node struct types are public, and misbehaves otherwise. Therefore, these new fields must be public.
Proposal: We propose to add two new fields, both
Start token.Pos
, to FieldList and BlockStmt. They would be unconditionally set to the start position of the syntax node.Neither addition increases the size class of its struct.
Sketch implementation:
The text was updated successfully, but these errors were encountered: