-
-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite the grammar #333
Rewrite the grammar #333
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this fix/improvement/contribution. This is going to be awesome.
I find this PR rather difficult to review. I think the first commit could also contain the test output changes. That would make it easier to see if anything has been broken.
I think there are a couple of bugs that are (re)introduced with this refactoring.
- some of the contextual keyword don’t seem to work any longer, for example parameter
p
invoid m(scoped p) { }
doesn’t have a type. Probablyscoped
became a modifier. var o = (Int32)(1);
became an invocation with a parenthesised identifier, instead of a cast. Cast incorrectly parsed as invocation #175var greetings = new (string head, string tail)[3];
is not an array creation expression in the equals clause. Array of named tuples incorrectly parsed as element_access_expression instead of array_creation_expression #166
There are a lot of changes that I don't understand. Can you explain what's the benefit of changing these:
- some field names have been removed or changed.
null_literal
is gone, sothrow null
is now athrow_expression
without any child.equals_value_clause
has been removed from the grammar,type_parameter_constraints_clause
has been reworked, previously we had a list of clauses with target and constraint, now we have a flat list of constraints, in the form ofidentifier, type_parameter_constraint+, identifier, type_parameter_constraint+
. Isn't this more difficult to handle in for consumers of the output tree?with_initializer_expression
is gone, which meanswith_expression
contains a flat list of children. This way the tree seems to be more difficult to process.
Wow, those two dynamic precedence issues caught me off guard. I noticed some places where prec.dynamic was used unnecessarily, and I thought those two were as well. That was really interesting and after a little digging, it makes sense. This is probably one of few grammars that is very sensitive to dynamic precedence changes. Thanks for the context, I've added them back Others:
|
a2418d5
to
ec9cd66
Compare
While I appreciate you've put a lot of effort in I'm also disappointed such a massive PR has landed with zero up-front discussion. The extra plumbing around packaging and bringing in-line with other grammars is much appreciated but some of these changes would have been a lot easier to review with a bit of thought up-front - e.g. not re-ordering grammar.js where unnecessary and keeping the corpus files in their old location so we can diff (rename/move breaks after a certain amount of change). Are there regressions with existing parsing? I can see a large list of exclusions on the CI but it's unclear if these are new. |
Well I wasn't aware I had to discuss making large improvements beforehand, nor have I ever done so in any other upstream grammar I maintain. The moving of tests to a different dir is necessary, top-level corpus dirs are unsupported upstream now and must be in The exclusion list is either a parse error where it genuinely was an error (beforehand as well) or it has funky usage of preproc ifs, e.g. in the middle of an if/else statement. I have (imo) improved them such that their contents are children of the Tamás politely and graciously pointed out a couple of mistakes regarding some funky dynamic precedence, which I was a little taken aback by as I explained earlier, but those are now fixed. |
2d1c659
to
74ae67c
Compare
@damieng @tamasvajk the diff for tests should be much more readable now |
f290108
to
eb18043
Compare
test/corpus/classes.txt
Outdated
@@ -83,119 +89,114 @@ file class A {} | |||
(class_declaration | |||
(modifier) | |||
name: (identifier) | |||
body: (declaration_list)) | |||
(declaration_list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the body
field was fine, but I don't know if there was some drawback to it. Curious if you were seeing significant code size increases due to that field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few of these removed in places that @tamasvajk added (not just body but all sorts of fields) - are they used by GH Semantic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fields are mostly used by GH CodeQL. @hvitved knows best how much we need them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having field names in general certainly makes the life easier for Treesitter based QL extractors. Field names are used to generate methods in CodeQL, e.g. this field in the Ruby grammar gives rise to this QL predicate, which is nicer to use than something like getChild(0)
.
Hope you're able to reduce the state count. In general, a great way to do that is avoid long sequences with many variations. In other words, turn this: seq(
optional('foo'),
optional('bar'),
optional('baz'),
optional('quux'),
// ...
) into this: seq(
repeat(choice('foo', 'bar', 'baz', 'quux')),
// ...
) It seems like there may be opportunities to do some of that in this grammar - there may be unnecessarily-specific sequences that could be modeled more generically. |
(block | ||
(return_statement | ||
(identifier)))))))) | ||
(anonymous_method_expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it we're losing "static" as a (modifier) here but we're keeping static as a modifier on the lambda equivalent on line 1262?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add that back for consistency, but what about static/unsafe in using_directive? I also (personally) think it's ok to hide modifiers, and just expose the string literals in the tree instead, but I don't have much of a preference.
This is shaping up great! If we can get attribute_argument back in then I'm good on the Roslyn alignment/back compat from the naming side. I think we just need to check/adjust the changes and removals of fields to make sure Semantic doesn't break and we should be good to go. |
Thanks for checking in, 👍 from our side! We're pinned to the current release so we won't silently upgrade, and we're using the syntax highlighting queries directly from this repo, which I see are updated as part of this PR. We have augmented the tagging queries with the ability to build up scoped names, which we'll have to update as part of bumping to the new version containing these changes. But the changes to |
thanks for the review/feedback @maxbrunsfeld @damieng @tamasvajk @hvitved! |
Is there any way, using |
$
is used by delegating this to the external scanner for statefulness by leveraging a stack of the current interpolation's info (yes, nested interpolations are handled nicely, no, it was not fun at times to implement)I will update the tokens for publishing the new version later today (needs pypi).