Enable generating a parsed AST for Dafny, and a deserializer for that #6102

keyboardDrummer · 2025-02-11T12:27:41Z

What was changed?

Create a command generate-parsed-ast which create a version of the Dafny AST that only contains fields which filled in through source code
Create a command generate-deserializer which generates a Deserializer class that takes a IDecoder and returns an instance of the Dafny AST, which is in a post-parsing state.
Add nullable annotations to the Dafny AST in some places, together with #nullable enable annotations in this files, and fixes for the nullability warnings in those files.
Add ParseConstructorAttribute to some constructors in the Dafny AST
Remove some fields from the Dafny AST that always held the same value.
Rename certain fields/properties and constructor parameters, so the names match
[Optional] Reorder certain constructor parameters, so the parameters that are passed to the super constructor come first. This was needed for a previous version of the code but is no longer needed.

Future work

Currently the 'interface definition language' is C# types. We can still switch to any of the existing IDL languages in a separate PR. I think most of the code would remain the same.
The serialization format is schemaless, and the code is setup so that the format can be easily changed by providing a different implementation of IDecoder. Currently there's only a text based format, but it would be easy to add a byte based one.

Caveats

Given an abstract type like Expression, the generator only visits subtypes that have a constructor annotated with [ParseConstructor], which is intentionally missing for many types right now. Because it is missing for those types, not the entire Dafny AST is generated, which means we can work more incrementally.
The generators require that each parameter of a constructor can be mapped to a field or property, and fail otherwise. This PR does not update CI so that the generators are run, so the AST could become out of date after this PR is merged.

How has this been tested?

Add a few unit tests in DeserializerTest.cs
Add inputFormat.dfy CLI test

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

robin-aws

Just started a high-level review for now, will return with a more detailed pass.

Silly question, but where does the generated "Parsed AST" live? I see Deserializer/Generated.cs checked in but it seems to produce the existing AST types?

robin-aws · 2025-02-15T00:04:54Z

Source/DafnyCore/Options/CommonOptionBag.cs

+    Binary
+  }
+
+  public static readonly Option<InputTypeEnum> InputType = new("--input-format", () => InputTypeEnum.Source) {


Do we need this? I think it would be simpler to just determine this from the file extension, as we already do for .dfy vs .doo, and even .toml for project files in some places.

(We may want to bikeshed whether .dbin is the best extension name, but that's a separate discussion :)

I'm using it in combination with --stdin. For the Java-Dafny compiler, so far I decided not to create any intermediate files but stream directly to dafny.

When not using --stdin, indeed the file extension is used to select how to interpret the file contents.

robin-aws · 2025-02-15T00:19:37Z

Source/DafnyCore/AST/Program.cs

+[AttributeUsage(AttributeTargets.Constructor)]
+public class ParseConstructorAttribute : Attribute { }
+
+[AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field)]
+public class BackEdge : Attribute { }


Definitely document these attributes really well. Everyone working with the AST will need to understand when they need to be applied.

Everyone working with the AST will need to understand when they need to be applied.

Added documentation.

One thing I have not considered much is how to test the features that this PR is adding. Because currently they're not actively used, and I already have a lot of code that's built up, I'm inclined to review and merge this code even though not enough effective testing mechanism is present. I already have a branch with many more additional changes and I'd prefer not to add those to this PR as well.

Coming back to testing. If we had not only a deserializer (which consumes the actual AST) but also a serializer that relies on the generated parsed AST, then we could add an option --round-trip-serialize and add that to a bunch of our existing tests, to test if the whole serialization mechanic is working well.

Without that feature, I don't see a good way to test either the usefulness of the generated parsed AST or the deserializer code. Right now I can test them by using the serializer on the Java side to generate serialized content, but that's not a scalable approach.

Is it OK with you if I add the serializer on the C# side in a follow-up PR? If it is fast enough, we could consider turning on --round-trip-serialize as a default for a subgroup of our tests.

robin-aws · 2025-02-15T00:23:26Z

Source/IntegrationTests/TestFiles/LitTests/LitTest/cli/Input/assertTrue.dbin

Even for a fully internal format, I think this could be a bit more self-describing and readable, for our own sanity when we have to debug it in the future. Happy for this to be merged as is, but I think even just adding newlines and indentation would be worth it in a follow-up PR.

Can you include the equivalent .dfy source for this as well? Perhaps the test should not only verify it but also --print it and compare to the .dfy equivalent.

The textual format is indeed intended to be the debugging format, while I will also add a binary format that's completely unreadable. Adding newlines and indentation is a great idea, although I think it will still be unreasonably time consuming to read the format without using a debugger while it's being deserialized.

Can you include the equivalent .dfy source for this as well? Perhaps the test should not only verify it but also --print it and compare to the .dfy equivalent.

Let's remove having tests with serialized input entirely, and replace that with tests that first serializes some .dfy file and then deserialize it.

Sorry to publish a PR that's not in a 'complete' state. Part of the reason is that I've been aggressively working towards a demo so testing properly was on the back burner.

I'm hesitant about backtracking and pruning/adding things to this PR to make it proper. Maybe after we merge this one I can go over the comments in this PR and make sure they're all addressed in a follow-up?

Actually, I'll try to add the serializer on the C# side based on my latest branch, and then try to create a PR that contains only the framework logic and not the AST changes, so we can have a proper PR to merge.

robin-aws · 2025-02-15T00:30:20Z

Source/DafnyCore.Test/DeserializerTest.cs

+    var options = new DafnyOptions(DafnyOptions.Default);
+    options.Set(CommonOptionBag.InputType, CommonOptionBag.InputTypeEnum.Binary);
+    var reporter = new BatchErrorReporter(options);
+    var output = await ProgramParser.Parse(input, new Uri("file://test.java"), reporter);


Why .java here? I'm assuming it's a dummy value anyway, but it doesn't feel right to even imply this format would ever live in a .java file.

robin-aws · 2025-02-15T00:31:10Z

Source/DafnyCore.Test/DeserializerTest.cs

+  [Fact]
+  public async Task Deserialize() {
+    var input =
+      "0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" 0; 0; true true 1; ClassDecl 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" true 0; 1; Method 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" true true false 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; true 1; AssertStmt 0; 0; 0; 0; 0; 0; true LiteralExpr 0; 0; 0; 0; 0; 0; Boolean false true true true 0; false 0; 0; 0; false 0; ";


You're claiming this format is schemaless, which is mostly true, but you're leaving the type of the top level element(s) unspecified, and relying on invoking the right parsing method to read it successfully. I think it would help catch bugs more easily and improve the format's readability a bit to include this type as well, with very little impact on encoding size.

You're claiming this format is schemaless, which is mostly true,

but you're leaving the type of the top level element(s) unspecified, and relying on invoking the right parsing method to read it successfully.

Not putting statically known types in the format, is keeping the format schemaless right?

I think it would help catch bugs more easily and improve the format's readability a bit to include this type as well, with very little impact on encoding size.

I don't follow what bugs you would catch or how it would help with readability. Reading a schema-less format will be unreasonably time consuming even if you add a top-level type. I only read the format while it is being deserialized, so the call-stack call tell me in the schema it is at. I think we could consider optionally adding types throughout the entire format as a debugging mode on top of already being textual, which I intended as a debugging mode.

robin-aws · 2025-02-15T00:32:29Z

Source/Scripts/PostParseAstVisitor.cs

+using Microsoft.Dafny;
+using Type = System.Type;
+
+namespace IntegrationTests;


Wrong namespace I think?

robin-aws · 2025-02-15T00:33:53Z

Source/Scripts/ParsedAstGenerator.cs

+    // if (hasErrors) {
+    //   throw new Exception("Exception");
+    // }


Should be uncommented?

robin-aws · 2025-02-15T00:35:28Z

Source/Scripts/MainClass.cs

+    root.AddCommand(ParsedAstGenerator.GetCommand());
+    root.AddCommand(DeserializerGenerator.GetCommand());


Worth adding top-level Makefile targets for these? Especially since then you can provide the correct default arguments for updating the parsed AST.

keyboardDrummer · 2025-02-18T16:11:31Z

Closing in favor of: #6112

keyboardDrummer added 30 commits January 28, 2025 12:24

Generated a parsed ast but it's not pretty yet

6f82db9

Parsed ast is in a decent state

6db6f65

Trying to get the parsedAst.cs file to compile without errors

cc29735

One more error

b07edad

Zero errors except the build one

afefda1

Small tweaks

3b9c904

Various fixes

3267827

Fixes

0ada246

Update

1edc32b

IsRefining refactoring

ef85826

Refactor IsOpened

c33d8a0

Fixes

8572dad

Add deserializer

279534f

Move deserializer

e4bb9ce

Format

050abb6

Fix ast issue

c31186e

Fix deserializer issue

621e924

Update test file

0cf43a0

Reorder constructor members to let the deserializer work automatically

731b6fe

Working on the deserializer generation

220cf05

Ast fixes

cc8dd0c

Base type override

b39d05b

Debugging the generated deserializer

dfaa7d3

DeserializerTest passes

1276627

Test passes

51fce4c

Fix

91bfb19

Make the format more schemaless

2276c66

Really have to switch to explicit nullability annotations

fb354f8

new format parses

340b638

Update the generated parsed AST with nullability annotations

f9324a9

keyboardDrummer added 12 commits February 11, 2025 13:22

Cleanup

c8a9548

More cleanup

4c97d50

Remove JsonSchemaCommand

01dd480

Add second command

8080e50

Reduce duplication

05c977c

Updated generators

47e6760

Done splitting the two generators

9392c7c

Refactoring

8e19dea

More refactoring

d0c4953

Fix test

4ab4aea

Fix warnings

697eaf5

Merge remote-tracking branch 'origin/master' into jsonParser

191826f

keyboardDrummer requested a review from robin-aws February 11, 2025 15:16

keyboardDrummer added 4 commits February 11, 2025 17:13

Trigger CI

77f29fc

Run formatter

b3b4a2b

Fix refactoring mistake

64041f4

Fix to TypeParameterCharacteristics

eb80e14

keyboardDrummer marked this pull request as ready for review February 12, 2025 13:33

keyboardDrummer enabled auto-merge (squash) February 12, 2025 13:34

keyboardDrummer changed the title ~~Enable generating a parsed AST for Dafny, and a deserializer that~~ Enable generating a parsed AST for Dafny, and a deserializer for that Feb 12, 2025

Fixes to parsedAst.cs generation

816fc76

robin-aws reviewed Feb 15, 2025

View reviewed changes

keyboardDrummer closed this Feb 18, 2025

auto-merge was automatically disabled February 18, 2025 16:11
Pull request was closed

keyboardDrummer deleted the jsonParser branch February 18, 2025 16:11

robin-aws mentioned this pull request Feb 18, 2025

Add serialization framework #6112

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable generating a parsed AST for Dafny, and a deserializer for that #6102

Enable generating a parsed AST for Dafny, and a deserializer for that #6102

keyboardDrummer commented Feb 11, 2025 •

edited

Loading

robin-aws left a comment

robin-aws Feb 15, 2025

keyboardDrummer Feb 17, 2025 •

edited

Loading

robin-aws Feb 15, 2025

keyboardDrummer Feb 17, 2025 •

edited

Loading

robin-aws Feb 15, 2025

keyboardDrummer Feb 17, 2025 •

edited

Loading

keyboardDrummer Feb 17, 2025

robin-aws Feb 15, 2025

robin-aws Feb 15, 2025

keyboardDrummer Feb 17, 2025 •

edited

Loading

robin-aws Feb 15, 2025

robin-aws Feb 15, 2025

robin-aws Feb 15, 2025

keyboardDrummer commented Feb 18, 2025

		root.AddCommand(ParsedAstGenerator.GetCommand());
		root.AddCommand(DeserializerGenerator.GetCommand());

Enable generating a parsed AST for Dafny, and a deserializer for that #6102

Enable generating a parsed AST for Dafny, and a deserializer for that #6102

Conversation

keyboardDrummer commented Feb 11, 2025 • edited Loading

What was changed?

Future work

Caveats

How has this been tested?

robin-aws left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Feb 17, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Feb 17, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Feb 17, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Feb 17, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer commented Feb 18, 2025

keyboardDrummer commented Feb 11, 2025 •

edited

Loading

keyboardDrummer Feb 17, 2025 •

edited

Loading

keyboardDrummer Feb 17, 2025 •

edited

Loading

keyboardDrummer Feb 17, 2025 •

edited

Loading

keyboardDrummer Feb 17, 2025 •

edited

Loading