-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable generating a parsed AST for Dafny, and a deserializer for that #6102
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just started a high-level review for now, will return with a more detailed pass.
Silly question, but where does the generated "Parsed AST" live? I see Deserializer/Generated.cs
checked in but it seems to produce the existing AST types?
Binary | ||
} | ||
|
||
public static readonly Option<InputTypeEnum> InputType = new("--input-format", () => InputTypeEnum.Source) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? I think it would be simpler to just determine this from the file extension, as we already do for .dfy
vs .doo
, and even .toml
for project files in some places.
(We may want to bikeshed whether .dbin
is the best extension name, but that's a separate discussion :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using it in combination with --stdin
. For the Java-Dafny compiler, so far I decided not to create any intermediate files but stream directly to dafny
.
When not using --stdin
, indeed the file extension is used to select how to interpret the file contents.
[AttributeUsage(AttributeTargets.Constructor)] | ||
public class ParseConstructorAttribute : Attribute { } | ||
|
||
[AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field)] | ||
public class BackEdge : Attribute { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely document these attributes really well. Everyone working with the AST will need to understand when they need to be applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everyone working with the AST will need to understand when they need to be applied.
Added documentation.
One thing I have not considered much is how to test the features that this PR is adding. Because currently they're not actively used, and I already have a lot of code that's built up, I'm inclined to review and merge this code even though not enough effective testing mechanism is present. I already have a branch with many more additional changes and I'd prefer not to add those to this PR as well.
Coming back to testing. If we had not only a deserializer (which consumes the actual AST) but also a serializer that relies on the generated parsed AST, then we could add an option --round-trip-serialize
and add that to a bunch of our existing tests, to test if the whole serialization mechanic is working well.
Without that feature, I don't see a good way to test either the usefulness of the generated parsed AST or the deserializer code. Right now I can test them by using the serializer on the Java side to generate serialized content, but that's not a scalable approach.
Is it OK with you if I add the serializer on the C# side in a follow-up PR? If it is fast enough, we could consider turning on --round-trip-serialize
as a default for a subgroup of our tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even for a fully internal format, I think this could be a bit more self-describing and readable, for our own sanity when we have to debug it in the future. Happy for this to be merged as is, but I think even just adding newlines and indentation would be worth it in a follow-up PR.
Can you include the equivalent .dfy
source for this as well? Perhaps the test should not only verify it but also --print
it and compare to the .dfy
equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The textual format is indeed intended to be the debugging format, while I will also add a binary format that's completely unreadable. Adding newlines and indentation is a great idea, although I think it will still be unreasonably time consuming to read the format without using a debugger while it's being deserialized.
Can you include the equivalent .dfy source for this as well? Perhaps the test should not only verify it but also --print it and compare to the .dfy equivalent.
Let's remove having tests with serialized input entirely, and replace that with tests that first serializes some .dfy file and then deserialize it.
Sorry to publish a PR that's not in a 'complete' state. Part of the reason is that I've been aggressively working towards a demo so testing properly was on the back burner.
I'm hesitant about backtracking and pruning/adding things to this PR to make it proper. Maybe after we merge this one I can go over the comments in this PR and make sure they're all addressed in a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'll try to add the serializer on the C# side based on my latest branch, and then try to create a PR that contains only the framework logic and not the AST changes, so we can have a proper PR to merge.
var options = new DafnyOptions(DafnyOptions.Default); | ||
options.Set(CommonOptionBag.InputType, CommonOptionBag.InputTypeEnum.Binary); | ||
var reporter = new BatchErrorReporter(options); | ||
var output = await ProgramParser.Parse(input, new Uri("file://test.java"), reporter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why .java
here? I'm assuming it's a dummy value anyway, but it doesn't feel right to even imply this format would ever live in a .java
file.
[Fact] | ||
public async Task Deserialize() { | ||
var input = | ||
"0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" 0; 0; true true 1; ClassDecl 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" true 0; 1; Method 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" true true false 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; true 1; AssertStmt 0; 0; 0; 0; 0; 0; true LiteralExpr 0; 0; 0; 0; 0; 0; Boolean false true true true 0; false 0; 0; 0; false 0; "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're claiming this format is schemaless, which is mostly true, but you're leaving the type of the top level element(s) unspecified, and relying on invoking the right parsing method to read it successfully. I think it would help catch bugs more easily and improve the format's readability a bit to include this type as well, with very little impact on encoding size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're claiming this format is schemaless, which is mostly true,
but you're leaving the type of the top level element(s) unspecified, and relying on invoking the right parsing method to read it successfully.
Not putting statically known types in the format, is keeping the format schemaless right?
I think it would help catch bugs more easily and improve the format's readability a bit to include this type as well, with very little impact on encoding size.
I don't follow what bugs you would catch or how it would help with readability. Reading a schema-less format will be unreasonably time consuming even if you add a top-level type. I only read the format while it is being deserialized, so the call-stack call tell me in the schema it is at. I think we could consider optionally adding types throughout the entire format as a debugging mode on top of already being textual, which I intended as a debugging mode.
using Microsoft.Dafny; | ||
using Type = System.Type; | ||
|
||
namespace IntegrationTests; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong namespace I think?
// if (hasErrors) { | ||
// throw new Exception("Exception"); | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be uncommented?
root.AddCommand(ParsedAstGenerator.GetCommand()); | ||
root.AddCommand(DeserializerGenerator.GetCommand()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding top-level Makefile targets for these? Especially since then you can provide the correct default arguments for updating the parsed AST.
Closing in favor of: #6112 |
What was changed?
generate-parsed-ast
which create a version of the Dafny AST that only contains fields which filled in through source codegenerate-deserializer
which generates aDeserializer
class that takes aIDecoder
and returns an instance of the Dafny AST, which is in a post-parsing state.#nullable enable
annotations in this files, and fixes for the nullability warnings in those files.ParseConstructorAttribute
to some constructors in the Dafny ASTFuture work
IDecoder
. Currently there's only a text based format, but it would be easy to add a byte based one.Caveats
Expression
, the generator only visits subtypes that have a constructor annotated with[ParseConstructor]
, which is intentionally missing for many types right now. Because it is missing for those types, not the entire Dafny AST is generated, which means we can work more incrementally.How has this been tested?
DeserializerTest.cs
inputFormat.dfy
CLI testBy submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.