Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional directives can break parsing #6

Closed
ckreibich opened this issue Jun 24, 2022 · 6 comments
Closed

Conditional directives can break parsing #6

ckreibich opened this issue Jun 24, 2022 · 6 comments
Labels
bug Something isn't working wontfix This will not be worked on

Comments

@ckreibich
Copy link
Member

ckreibich commented Jun 24, 2022

There is now a construct in the Zeek distribution (technically, in the Spicy plugin) that breaks the tree-sitter parsing tests:

@if ( Version::number >= 50000 )
# Marked with &is_used to suppress complaints when there aren't any
# Spicy file analyzers loaded, and hence this event can't be generated.
# The attribute is only supported for Zeek 5.0 and higher.
event spicy_analyzer_for_mime_type(a: Files::Tag, mt: string) &is_used
@else
event spicy_analyzer_for_mime_type(a: Files::Tag, mt: string)
@endif
    {
    Files::register_for_mime_type(a, mt);
    }

It breaks because the grammar currently allows "preprocessor-style" directives in a handful places, but still expects an otherwise normal AST, and in the above example that's not the case — you have two func_hdrs in direct succession.

We could fix this by breaking apart the current set of preproc_directive choices and make the conditional ones like comments, allowing them to appear anywhere.

(CI should have notified us about this, btw ... but Github disabled it after 60 days of inactivity in this repo and we missed the notification. 🤦 I've re-enabled CI.)

@ckreibich
Copy link
Member Author

That will only allow arbitrary placement of the conditionals, though. It won't fix the fact that the resulting AST still looks broken. Mhmm.

@bbannier
Copy link
Member

I had a look at how this is tackled by the pros in the C grammar provided by the tree-sitter org, but it looks like preprocessor macros as general as the ones supported by C or Zeek might just be nasty and hard to deal with; I filed tree-sitter/tree-sitter-c#108 to gather some feedback.

As far as zeek-format is concerned I do not see incorrect formatting for above snippet; in fact, nothing is formatted which I think is acceptable. Could we just fix the test you mentioned (CI is still ✅?) as things are "working"?

@ckreibich
Copy link
Member Author

Yeah I saw that that ticket too :-) ... and agree re. the testing — we can update it so it covers this error as expected.

I wonder if we could extend the grammar somewhat so it covers "reasonable" uses of conditionals (like in the above — a func_hdr in a conditional block) that the grammar would accept. It would not be perfect, but on the other hand the fact that this is the only such parsing problem in the distribution also indicates that it's a rare problem.

@ckreibich
Copy link
Member Author

ckreibich commented Jun 24, 2022

Two more thoughts:

We could offer two parsers — one for the directives (perhaps just the "code-structural" ones), and one for the scripting language. The former could then understand the syntax of the conditionals (@if -> @else / @else if -> ... -> @endif — the current grammar doesn't enforce that), and leave it up to the user how to piece together the scripting language content. One file would turn into multiple, conditionals-free scripts. Correct, but tricky.

It'd be cool if there's a tree-sitter level fix for this. If you could make the grammar aware of preprocessing, AST nodes could offer alternative node subtrees to cover the options.

@bbannier
Copy link
Member

bbannier commented Jun 25, 2022

It'd be cool if there's a tree-sitter level fix for this.

Yeah, this is exactly what I would learn in the issue I filed, but I fear this is not possible in tree-sitter right now and might also be hard to add, we'll see.

One file would turn into multiple, conditionals-free scripts. Correct, but tricky.

That sounds technically possible, but I wonder how one would use such a setup. E.g., thinking about zeek-format, it would then need to format all scripts (since zeek-script cannot resolve them and worst case any preprocessor directive could bifurcate the "global AST"), and then somehow reconcile the resulting formattings into one true formatting for the given input. This would be great as all branches would be formatted. Now for zeek-language-server, beging able to complete code for all possible branches would also be nice, but would probably be impossible in general since if there are proprocessor branches on Zeek versions we'd only have one version of e.g., the base scripts around. In that case I would much prefer if Zeek gave me a way to preprocess the given input with the used Zeek version so the language server would only ever have to deal with one input free of preprocessor directives. So my gut tells me that this might be a lot of work with unclear benefit.

The tree-sitter grammar for C seems to solve by modelling preprocessor macros only in some places, but then directly add them to the AST, e.g., see this explicit handling of #if. That way they do not need to resolve the directives. It might be possible to do something similar in the Zeek grammar, e.g., support conditional record fields. It still wouldn't help if e.g., the branches contain two different function names/signatures like you mentioned originally and which might be more common in Zeek (e.g., for deprecations).

My gut still tells me: just ignore this issue, but make sure we don't mutilate sources on format (I think this should already work).

@ckreibich
Copy link
Member Author

I think you're right. It goes to the heart of the matter: the conditionals essentially represent multiple scripts.

If we want to keep the distribution working, we could sidestep this particular issue by keeping the AST intact despite the conditionals — a redundant Files::register_for_mime_type(a, mt); doesn't seem terrible to me.

The Github action is running again, and it passes ... because the Zeek disto clone isn't recursive and so doesn't pull in the Spicy file. Hah. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants