Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable generating a parsed AST for Dafny, and a deserializer for that #6102

Closed
wants to merge 47 commits into from

Conversation

keyboardDrummer
Copy link
Member

@keyboardDrummer keyboardDrummer commented Feb 11, 2025

What was changed?

  • Create a command generate-parsed-ast which create a version of the Dafny AST that only contains fields which filled in through source code
  • Create a command generate-deserializer which generates a Deserializer class that takes a IDecoder and returns an instance of the Dafny AST, which is in a post-parsing state.
  • Add nullable annotations to the Dafny AST in some places, together with #nullable enable annotations in this files, and fixes for the nullability warnings in those files.
  • Add ParseConstructorAttribute to some constructors in the Dafny AST
  • Remove some fields from the Dafny AST that always held the same value.
  • Rename certain fields/properties and constructor parameters, so the names match
  • [Optional] Reorder certain constructor parameters, so the parameters that are passed to the super constructor come first. This was needed for a previous version of the code but is no longer needed.

Future work

  • Currently the 'interface definition language' is C# types. We can still switch to any of the existing IDL languages in a separate PR. I think most of the code would remain the same.
  • The serialization format is schemaless, and the code is setup so that the format can be easily changed by providing a different implementation of IDecoder. Currently there's only a text based format, but it would be easy to add a byte based one.

Caveats

  • Given an abstract type like Expression, the generator only visits subtypes that have a constructor annotated with [ParseConstructor], which is intentionally missing for many types right now. Because it is missing for those types, not the entire Dafny AST is generated, which means we can work more incrementally.
  • The generators require that each parameter of a constructor can be mapped to a field or property, and fail otherwise. This PR does not update CI so that the generators are run, so the AST could become out of date after this PR is merged.

How has this been tested?

  • Add a few unit tests in DeserializerTest.cs
  • Add inputFormat.dfy CLI test

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@keyboardDrummer keyboardDrummer marked this pull request as ready for review February 12, 2025 13:33
@keyboardDrummer keyboardDrummer enabled auto-merge (squash) February 12, 2025 13:34
@keyboardDrummer keyboardDrummer changed the title Enable generating a parsed AST for Dafny, and a deserializer that Enable generating a parsed AST for Dafny, and a deserializer for that Feb 12, 2025
Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just started a high-level review for now, will return with a more detailed pass.

Silly question, but where does the generated "Parsed AST" live? I see Deserializer/Generated.cs checked in but it seems to produce the existing AST types?

Binary
}

public static readonly Option<InputTypeEnum> InputType = new("--input-format", () => InputTypeEnum.Source) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? I think it would be simpler to just determine this from the file extension, as we already do for .dfy vs .doo, and even .toml for project files in some places.

(We may want to bikeshed whether .dbin is the best extension name, but that's a separate discussion :)

Copy link
Member Author

@keyboardDrummer keyboardDrummer Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using it in combination with --stdin. For the Java-Dafny compiler, so far I decided not to create any intermediate files but stream directly to dafny.

When not using --stdin, indeed the file extension is used to select how to interpret the file contents.

Comment on lines +139 to +144
[AttributeUsage(AttributeTargets.Constructor)]
public class ParseConstructorAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field)]
public class BackEdge : Attribute { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely document these attributes really well. Everyone working with the AST will need to understand when they need to be applied.

Copy link
Member Author

@keyboardDrummer keyboardDrummer Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everyone working with the AST will need to understand when they need to be applied.

Added documentation.

One thing I have not considered much is how to test the features that this PR is adding. Because currently they're not actively used, and I already have a lot of code that's built up, I'm inclined to review and merge this code even though not enough effective testing mechanism is present. I already have a branch with many more additional changes and I'd prefer not to add those to this PR as well.

Coming back to testing. If we had not only a deserializer (which consumes the actual AST) but also a serializer that relies on the generated parsed AST, then we could add an option --round-trip-serialize and add that to a bunch of our existing tests, to test if the whole serialization mechanic is working well.

Without that feature, I don't see a good way to test either the usefulness of the generated parsed AST or the deserializer code. Right now I can test them by using the serializer on the Java side to generate serialized content, but that's not a scalable approach.

Is it OK with you if I add the serializer on the C# side in a follow-up PR? If it is fast enough, we could consider turning on --round-trip-serialize as a default for a subgroup of our tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even for a fully internal format, I think this could be a bit more self-describing and readable, for our own sanity when we have to debug it in the future. Happy for this to be merged as is, but I think even just adding newlines and indentation would be worth it in a follow-up PR.

Can you include the equivalent .dfy source for this as well? Perhaps the test should not only verify it but also --print it and compare to the .dfy equivalent.

Copy link
Member Author

@keyboardDrummer keyboardDrummer Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The textual format is indeed intended to be the debugging format, while I will also add a binary format that's completely unreadable. Adding newlines and indentation is a great idea, although I think it will still be unreasonably time consuming to read the format without using a debugger while it's being deserialized.

Can you include the equivalent .dfy source for this as well? Perhaps the test should not only verify it but also --print it and compare to the .dfy equivalent.

Let's remove having tests with serialized input entirely, and replace that with tests that first serializes some .dfy file and then deserialize it.

Sorry to publish a PR that's not in a 'complete' state. Part of the reason is that I've been aggressively working towards a demo so testing properly was on the back burner.

I'm hesitant about backtracking and pruning/adding things to this PR to make it proper. Maybe after we merge this one I can go over the comments in this PR and make sure they're all addressed in a follow-up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'll try to add the serializer on the C# side based on my latest branch, and then try to create a PR that contains only the framework logic and not the AST changes, so we can have a proper PR to merge.

var options = new DafnyOptions(DafnyOptions.Default);
options.Set(CommonOptionBag.InputType, CommonOptionBag.InputTypeEnum.Binary);
var reporter = new BatchErrorReporter(options);
var output = await ProgramParser.Parse(input, new Uri("file://test.java"), reporter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why .java here? I'm assuming it's a dummy value anyway, but it doesn't feel right to even imply this format would ever live in a .java file.

[Fact]
public async Task Deserialize() {
var input =
"0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" 0; 0; true true 1; ClassDecl 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" true 0; 1; Method 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; \"foo\" true true false 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; 0; 0; true 0; 0; 0; 0; 0; 0; true 1; AssertStmt 0; 0; 0; 0; 0; 0; true LiteralExpr 0; 0; 0; 0; 0; 0; Boolean false true true true 0; false 0; 0; 0; false 0; ";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're claiming this format is schemaless, which is mostly true, but you're leaving the type of the top level element(s) unspecified, and relying on invoking the right parsing method to read it successfully. I think it would help catch bugs more easily and improve the format's readability a bit to include this type as well, with very little impact on encoding size.

Copy link
Member Author

@keyboardDrummer keyboardDrummer Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're claiming this format is schemaless, which is mostly true,

but you're leaving the type of the top level element(s) unspecified, and relying on invoking the right parsing method to read it successfully.

Not putting statically known types in the format, is keeping the format schemaless right?

I think it would help catch bugs more easily and improve the format's readability a bit to include this type as well, with very little impact on encoding size.

I don't follow what bugs you would catch or how it would help with readability. Reading a schema-less format will be unreasonably time consuming even if you add a top-level type. I only read the format while it is being deserialized, so the call-stack call tell me in the schema it is at. I think we could consider optionally adding types throughout the entire format as a debugging mode on top of already being textual, which I intended as a debugging mode.

using Microsoft.Dafny;
using Type = System.Type;

namespace IntegrationTests;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong namespace I think?

Comment on lines +36 to +38
// if (hasErrors) {
// throw new Exception("Exception");
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be uncommented?

Comment on lines +11 to +12
root.AddCommand(ParsedAstGenerator.GetCommand());
root.AddCommand(DeserializerGenerator.GetCommand());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding top-level Makefile targets for these? Especially since then you can provide the correct default arguments for updating the parsed AST.

@keyboardDrummer
Copy link
Member Author

Closing in favor of: #6112

auto-merge was automatically disabled February 18, 2025 16:11

Pull request was closed

@keyboardDrummer keyboardDrummer deleted the jsonParser branch February 18, 2025 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants