Permit braces in more contexts #10
case '\n':
  // Something's gone wrong.
  return false;
default:;
Can we handle the EOF or null case? We might get stuck in an infinite loop here. It might also be a good idea to pull in the fuzz action.
I can certainly handle eof, but the fuzz action is outside of my expertise.
Here's an example - just plop this in .github/workflows/fuzz.yml and change the name to jsdoc
https://github.com/tree-sitter/tree-sitter-cpp/blob/master/.github/workflows/fuzz.yml
Also just pushed a commit adding some newer tags, rebase on top of that please 🙂
Allows for free use of `{` and `}` in descriptions when they're not part of inline tags. Also allows for braces in type descriptions as long as they're balanced.
5ee169e to ea3310f
src/scanner.c
Outdated
@@ -0,0 +1,48 @@
#include <tree_sitter/parser.h>
Local include please, don't want another infinite loop bug using mismatched system-packaged header files ;)
- #include <tree_sitter/parser.h>
+ #include "tree_sitter/parser.h"
src/tree_sitter/parser.h
Outdated
@@ -13,8 +13,9 @@ extern "C" {
#define ts_builtin_sym_end 0
#define TREE_SITTER_SERIALIZATION_BUFFER_SIZE 1024

#ifndef TREE_SITTER_API_H_
Can you check out tree-sitter from master or 0.20.9 and install the CLI locally, or run `cargo install --git https://github.com/tree-sitter/tree-sitter`? We will have 0.20.9 on npm/crates early next week, but for now I'd like to keep generated output as it is on 0.20.9. Sorry for the trouble.
grammar.js
Outdated
'{',
/[^@}]+/,
optional('}'),
Just a note - existing queries might be matching all braces as a 'special' punctuation - since this is a false positive we probably wouldn't want to match against this
We can solve this by making it a regex, thus unqueryable, or with an alias. This also can be left as is, I'll leave that up to you, just throwing in my 2 cents
Can `optional` take a regex? Would changing `'}'` to `/}/` be the only change needed?
Pretty much, yeah. We just don't want to error out in these cases, yet would still keep it hidden from consumers. Does tokenizing this entire rule work as well? That could eliminate querying the braces too.
I just changed it to a regex. Works for me if it works for you.
Also - if you can add "scanner.c" to the relevant sections in binding.gyp (reason for the Windows failure), Package.swift, and bindings/rust/build.rs, that'd be great! https://github.com/tree-sitter/tree-sitter-cpp/blob/4ca37be8e70e5a40ae95688bec56b886ba945888/binding.gyp#L12
OK, I think I addressed everything, including a recommendation from the CLI to make
Awesome - the static recommendation is to avoid clashes when a project using several grammars compiles several scanners. If any two have the same function name, that creates a clash and only one of them is actually compiled in, creating a subtle bug. This was an issue when I was rewriting all the C++ scanners into C, and I received bug reports about it. The
One last thing: can you make the opening brace in the false positive rule a regex as well, if possible?
Yup, the
Hang on, don't merge yet. That last change broke something.
OK, switching the opening Should be good to merge now.
I think it's because having them all as regex rules internally collapses them into one regex/token-like rule, but I could be wrong. The error test case is worse, but that wouldn't be the case with incremental parsing, or it can be fixed by just aliasing the brackets instead.
Just noticed the `wctype.h` include isn't needed - can you remove that? This is good to go then.
How would the aliasing work? Is it just one level of indirection like
{
  _opening_brace_but_not_an_anonymous_node: _ => '{',
  _inline_tag_false_positive: _ => prec.left(1, seq(
    alias($._opening_brace_but_not_an_anonymous_node, '{'),
    /[^@}]+/,
    optional(/}/),
  )),
}
or is it the other way around somehow?
Also, do I need
And yet another question: what's the easiest way to actually figure out whether a parser is generating an anonymous node when you don't want it to? Will any of the CLI tooling show me anonymous nodes?
See, I tried something like your example, and it put Anyway, my brain hurts, and the token solution is good enough. Also,
Oh you're right - I forgot aliases with underscores are visible, sorry for the confusion. Let's land this as is
Thank you for the PR! I appreciate it
I'll cut a release on crates/npm tomorrow, going to bed now |
Fixes #1.
Allows for free use of `{` and `}` in descriptions when they're not part of inline tags. Also allows for braces in type descriptions as long as they're balanced.

In the wild I'm often seeing more liberal usage of JSDoc syntax than what this parser allows. This parser doesn't tolerate curly braces in descriptions except as inline tags, but here's one example that illustrates two different usages that run afoul of that:
Returns a two-item {Array}.
). I don't know whether this syntax has meaning for some tool out there, but I see it often enough that I figure it must.

Here's how that code example looks in Pulsar (alongside a playground-style view of the tree):
Neither of these usages seems to be expressly prohibited by the JSDoc spec (such as it is), so this PR aims to tolerate both of these usages without putting errors into the tree and without giving these constructs any special treatment that the spec doesn't recognize.
After I fixed this, I realized that there was one more brace-related failure that is explicitly allowed by the spec. The type formats allowed in JSDoc include some with braces (e.g., `{{a: number, b: string, c}}`), so I wanted to make sure the parser tolerated that. I decided to turn `type` into an external, as it was the easiest way I could think of to keep track of when the braces were balanced.

I also added the `@type` tag. There are other tags that could be added, but I can tackle that in a separate PR.
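
For readers who want to see the balanced-brace idea in isolation, here is a minimal sketch. It is not the PR's actual `scanner.c`: it walks a plain C string instead of a `TSLexer`, and the function name `scan_balanced_type` is hypothetical. It assumes the input starts just after the type's opening `{`, keeps a depth counter for nested braces, and gives up at a newline or at end of input so the loop cannot run forever (the EOF concern raised in review).

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * `input` begins just after the opening '{' of a JSDoc type.
 * Returns the offset of the '}' that closes the type, or -1 if the
 * braces are not balanced before a newline or the end of the input.
 */
static long scan_balanced_type(const char *input) {
    int depth = 0; /* number of nested '{' currently open inside the type */

    for (size_t i = 0;; i++) {
        switch (input[i]) {
            case '\0': /* end of input: stop rather than loop forever */
            case '\n': /* something's gone wrong; types stay on one line here */
                return -1;
            case '{':
                depth++;
                break;
            case '}':
                if (depth == 0) {
                    return (long)i; /* this brace closes the type itself */
                }
                depth--;
                break;
            default:
                break;
        }
    }
}
```

Called on the text after the first `{` of `{{a: number, b: string, c}}`, the function skips the inner pair and stops at the outer closing brace; called on an unterminated type, it returns -1 instead of reading past the end.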