Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permit braces in more contexts #10

Merged
merged 10 commits into from
Jan 27, 2024

Conversation

savetheclocktower
Copy link
Contributor

@savetheclocktower savetheclocktower commented Jan 9, 2024

Fixes #1.

Allows for free use of { and } in descriptions when they're not part of inline tags. Also allows for braces in type descriptions as long as they're balanced.

In the wild I'm often seeing more liberal usage of JSDoc syntax than what this parser allows. This parser doesn't tolerate curly braces in descriptions except as inline tags, but here's one example that illustrates two different usages that run afoul of that:

  1. Casual usage of type-style syntax in descriptions (e.g., Returns a two-item {Array}.). I don't know whether this syntax has meaning for some tool out there, but I see it often enough that I figure it must.
  2. Code examples that include curly braces.

Here's how that code example looks in Pulsar (alongside a playground-style view of the tree):

illustration of parser failure

Neither of these usages seems to be expressly prohibited by the JSDoc spec (such as it is), so this PR aims to tolerate both of these usages without putting errors into the tree and without giving these constructs any special treatment that the spec doesn't recognize.

After I fixed this, I realized that there was one more brace-related failure that is explicitly allowed by the spec. The type formats allowed in JSDoc include some with braces — e.g., {{a: number, b: string, c}} — so I wanted to make sure the parser tolerated that. I decided to turn type into an external, as it was the easiest way I could think of to keep track of when the braces were balanced.

I also added the @type tag. There are other tags that could be added, but I can tackle that in a separate PR.

case '\n':
// Something's gone wrong.
return false;
default:;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we handle the eof or null case? It's possible we might get stuck in an infinite loop here, might also be a good idea to pull in the fuzz action

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly handle eof, but the fuzz action is outside of my expertise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example - just plop this in .github/workflows/fuzz.yml and change the name to jsdoc
https://github.com/tree-sitter/tree-sitter-cpp/blob/master/.github/workflows/fuzz.yml

@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

Also just pushed a commit adding some newer tags, rebase on top of that please 🙂

Allows for free use of `{` and `}` in descriptions when they're not part of inline tags. Also allows for braces in type descriptions as long as they're balanced.
src/scanner.c Outdated
@@ -0,0 +1,48 @@
#include <tree_sitter/parser.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local include please, don't want another infinite loop bug using mismatched system-packaged header files ;)

Suggested change
#include <tree_sitter/parser.h>
#include "tree_sitter/parser.h"

@@ -13,8 +13,9 @@ extern "C" {
#define ts_builtin_sym_end 0
#define TREE_SITTER_SERIALIZATION_BUFFER_SIZE 1024

#ifndef TREE_SITTER_API_H_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check out tree-sitter from master or 0.20.9 and install the cli locally, or run cargo install --git https://github.com/tree-sitter/tree-sitter? We will have 0.20.9 on npm/crates early next week, but for now I'd like to keep generated output as it is on 0.20.9, sorry for the trouble

grammar.js Outdated
Comment on lines 74 to 76
'{',
/[^@}]+/,
optional('}'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note - existing queries might be matching all braces as a 'special' punctuation - since this is a false positive we probably wouldn't want to match against this

We can solve this by making it a regex, thus unqueryable, or with an alias. This also can be left as is, I'll leave that up to you, just throwing in my 2 cents

Copy link
Contributor Author

@savetheclocktower savetheclocktower Jan 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can optional take a regex? Would changing '}' to /}/ be the only change needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much, yeah, we just don't want to error out in these cases, yet would still keep it hidden from consumers. Does tokenizing this entire rule work as well? that could eliminate querying the braces too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed it to a regex. Works for me if it works for you.

@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

@savetheclocktower
Copy link
Contributor Author

OK, I think I addressed everything — including a recommendation from the CLI to make scan_for_type static.

@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

Awesome - the static recommendation is to avoid clashes when a project using several grammars compiles several scanners - if any two have the same function name then that creates a clash and only one of them is actually compiled in, creating a subtle bug. This was an issue when I was rewriting all the C++ scanners into C, and received bug reports about it. The static keyword keeps each function definition self-contained in its translation unit

One last thing, can you make the opening brace in the false positive rule a regex as well, if possible?

@savetheclocktower
Copy link
Contributor Author

Yup, the static thing makes sense, to the extent that I understand anything about the C universe.

@savetheclocktower
Copy link
Contributor Author

Hang on, don't merge yet. That last change broke something.

@savetheclocktower
Copy link
Contributor Author

savetheclocktower commented Jan 27, 2024

OK, switching the opening { to a regex broke some tests for reasons I don't understand. I experimented with a few alternatives and chose to convert it to a single regex wrapped in token; this changes the tree in the “EOF before balanced braces” test — for worse, in my opinion — but otherwise makes all the tests pass.

Should be good to merge now.

@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

I think it's because having them all as regex rules internally collapses them into one regex/token-like rule, but I could be wrong

The error test case is worse, but that wouldn't be the case with incremental parsing, or it can be fixed by just aliasing the brackets instead

@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

Just noticed the wctype.h include isn't needed - can you remove that? This is good to go then

@savetheclocktower
Copy link
Contributor Author

How would the aliasing work? Is it just one level of indirection like

{
    _opening_brace_but_not_an_anonymous_node: _ => '{',

    _inline_tag_false_positive: _ => prec.left(1, seq(
     alias($._opening_brace_but_not_an_anonymous_node, '{'),
      /[^@}]+/,
      optional(/}/),
    )),
}

or is it the other way around somehow?

@savetheclocktower
Copy link
Contributor Author

Also, do I need stdio.h either? I think that was just left over from doing some printf debugging.

@savetheclocktower
Copy link
Contributor Author

And yet another question: what's the easiest way to actually figure out whether a parser is generating an anonymous node when you don't want it to? Will any of the CLI tooling show me anonymous nodes?

@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

It'd be something like this

inline_tag_false_positive: $ => prec.left(1, seq(
      alias('{', $._brace_left),
      /[^@}]+/,
      alias('}', $._brace_right),
    )),

And yeah stdio.h isn't needed either, not sure why clang didn't warn me about that being unused/unneeded.

All literals by default (strings) will generate literal nodes that are queryable. All rules or aliases prefixed with an underscore are hidden somewhat - they're visible during parsing, but not in the output tree or when using queries.

It might be a good idea for debugging to add anonymous symbols generated in the output, but this might get hectic in large grammars. A simple way would be to grep for anon_sym in parser.c, like so
image

Regex nodes are also anonymous nodes, but not queryable

@savetheclocktower
Copy link
Contributor Author

savetheclocktower commented Jan 27, 2024

See, I tried something like your example, and it put _brace_left and _brace_right nodes in my tree.

Anyway, my brain hurts, and the token solution is good enough.

Also, tree-sitter parse foo.jsdoc -D gave me enough info to discern what was treated as an anonymous node, so I suppose that's good enough.

@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

Oh you're right - I forgot aliases with underscores are visible, sorry for the confusion. Let's land this as is

@amaanq amaanq merged commit a5e363a into tree-sitter:master Jan 27, 2024
5 checks passed
@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

Thank you for the PR! I appreciate it

@savetheclocktower savetheclocktower deleted the freer-use-of-braces branch January 27, 2024 06:37
@amaanq
Copy link
Member

amaanq commented Jan 27, 2024

I'll cut a release on crates/npm tomorrow, going to bed now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overeager matching
2 participants