Parser Tests Tool #58

Merged
Hydrocharged merged 2 commits into main from daylon/all-commands
Nov 29, 2023

Hydrocharged
Collaborator

@Hydrocharged Hydrocharged commented Nov 28, 2023

This adds a tool that downloads synopses from the Postgres documentation pages and saves them under the testing/generation/command_docs/synopses folder. It then iterates over each synopsis to create a StatementGenerator that can produce every permutation of a query described by that synopsis, checks that the queries are valid against a Postgres installation, and creates a test file verifying the level of implementation for each query in DoltgreSQL (whether the query parses or can be fully transformed). The idea here is that we run this once (or whenever we upgrade to a newer Postgres version, like version 16), and then, as we continue developing, we'll naturally add support for more statements. This tracks regressions too: if something used to parse successfully and no longer does, we can see that we've broken something. At least, that's the idea.
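To make the "all permutations" step concrete, here is a minimal sketch of how a synopsis can be expanded into every statement it describes. It is not the StatementGenerator from this PR: the `Node`, `Text`, `Optional`, `Choice`, and `Sequence` types and the `Statements` helper are hypothetical names chosen for illustration, and the synopsis is hand-built rather than parsed from a downloaded file.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical synopsis element. Expand returns every token
// sequence the element can produce.
type Node interface {
	Expand() [][]string
}

// Text is a literal keyword such as ABORT.
type Text string

func (t Text) Expand() [][]string { return [][]string{{string(t)}} }

// Optional models "[ ... ]": the inner node may be omitted entirely.
type Optional struct{ Inner Node }

func (o Optional) Expand() [][]string {
	// The empty sequence represents the "omitted" form.
	return append([][]string{{}}, o.Inner.Expand()...)
}

// Choice models "A | B": exactly one alternative is emitted.
type Choice []Node

func (c Choice) Expand() [][]string {
	var out [][]string
	for _, n := range c {
		out = append(out, n.Expand()...)
	}
	return out
}

// Sequence emits its children in order, combining every form of each child.
type Sequence []Node

func (s Sequence) Expand() [][]string {
	out := [][]string{{}}
	for _, n := range s {
		var next [][]string
		for _, prefix := range out {
			for _, suffix := range n.Expand() {
				// Copy the prefix so expansions never share a backing array.
				combined := append(append([]string{}, prefix...), suffix...)
				next = append(next, combined)
			}
		}
		out = next
	}
	return out
}

// Statements renders every expansion of the synopsis as a SQL string.
func Statements(n Node) []string {
	var out []string
	for _, tokens := range n.Expand() {
		out = append(out, strings.Join(tokens, " "))
	}
	return out
}

// abortSynopsis models: ABORT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
func abortSynopsis() Node {
	return Sequence{
		Text("ABORT"),
		Optional{Choice{Text("WORK"), Text("TRANSACTION")}},
		Optional{Sequence{Text("AND"), Optional{Text("NO")}, Text("CHAIN")}},
	}
}

func main() {
	for _, q := range Statements(abortSynopsis()) {
		fmt.Println(q)
	}
}
```

For ABORT this yields nine statements (three forms of the first optional clause times three forms of the second), which is why a small synopsis stays manageable while a large one does not.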

The major issue is that the permutation count is massive. The tool even supports variables as defined by each synopsis, and even without adding those, I got to over 50 billion permutations for just SELECT before I stopped it. We need a way to pick a reasonable subset for the massive statements, which we can discuss later.

Regarding this PR though, the tool is mostly complete. There are a few "issues":

  • Doesn't check against an actual Postgres installation yet. This is easy to add, but I didn't bother since we need to find a way to get the total number of tests down first.
  • Doesn't actually do anything with variables yet. Permutation counts exhibit exponential behavior, so I wouldn't be surprised if the final numbers are in the quintillions.
  • Downloaded synopses require manual fiddling. This is mostly due to the Postgres documentation being inconsistent with whitespace, and writing a tool to deal with these inconsistencies was too much. Not all files require fiddling (variable substitution is handled by the tool), and it's not a lot of fiddling either, but it's there.
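The growth described above can be estimated without generating a single query. The sketch below multiplies the number of forms each independent clause can take; `PermutationCount` is a hypothetical helper, not part of the tool, and the 40-clause statement is an invented example rather than a real synopsis.

```go
package main

import (
	"fmt"
	"math/big"
)

// PermutationCount returns how many statements a sequence of independent
// clauses produces. forms[i] is the number of forms clause i can take; an
// optional clause with k alternatives contributes k+1 (the extra one is the
// "omitted" form). big.Int is used because the product overflows int64 fast.
func PermutationCount(forms []int64) *big.Int {
	total := big.NewInt(1)
	for _, f := range forms {
		total.Mul(total, big.NewInt(f))
	}
	return total
}

func main() {
	// ABORT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]: 3 forms times 3 forms.
	fmt.Println(PermutationCount([]int64{3, 3})) // 9

	// Hypothetical statement with 40 independent optional clauses,
	// each either present or absent.
	clauses := make([]int64, 40)
	for i := range clauses {
		clauses[i] = 2
	}
	fmt.Println(PermutationCount(clauses)) // 1099511627776
}
```

Each additional optional clause at least doubles the total, so substituting several values per variable multiplies the count again for every variable in the statement.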

Besides those things, this is fully complete. The single example file in testing/generation/command_docs/output/abort_test.go works from start to finish (downloading synopsis, generating queries, creating test file), so you can see what it'll look like once we figure out how to get a subset.

Regarding how to review this, I'd mainly say to just look at that test file. You can also check out some of the synopses to see how some of them differ from what's on the website. If you really want to, you can look at the parser and generator, but it's not required.

@Hydrocharged Hydrocharged requested a review from zachmu November 28, 2023 14:48
Member

@zachmu zachmu left a comment

I didn't read this all that closely for obvious reasons, but the basic idea seems fine.

For the actual test generation, I'm less concerned about what parses or doesn't parse -- that doesn't really tell us that much, we expect the parser is probably mostly right. Conversion tests are more valuable, but without correctness testing it's just a smoke test. Generating a few million of these kinds of statements via randomly sampling from the grammar space is great and worth doing, but we can squeeze more useful info out of what you've built.
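Random sampling from the grammar space could look roughly like the sketch below: instead of enumerating every permutation, each optional clause and each alternative is decided by a coin flip or a random pick while walking the synopsis once per sample. All type and function names here are hypothetical and are not the API of the tool in this PR. Note that this makes each decision uniformly, which is not the same as sampling uniformly over all possible statements.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// Node is a hypothetical synopsis element that appends one randomly chosen
// form of itself to out.
type Node interface {
	Sample(r *rand.Rand, out *[]string)
}

// Text is a literal keyword.
type Text string

func (t Text) Sample(_ *rand.Rand, out *[]string) { *out = append(*out, string(t)) }

// Optional emits its inner node half of the time.
type Optional struct{ Inner Node }

func (o Optional) Sample(r *rand.Rand, out *[]string) {
	if r.Intn(2) == 1 {
		o.Inner.Sample(r, out)
	}
}

// Choice emits one alternative picked at random. It must be non-empty.
type Choice []Node

func (c Choice) Sample(r *rand.Rand, out *[]string) { c[r.Intn(len(c))].Sample(r, out) }

// Sequence emits each child in order.
type Sequence []Node

func (s Sequence) Sample(r *rand.Rand, out *[]string) {
	for _, n := range s {
		n.Sample(r, out)
	}
}

// SampleStatements draws up to n distinct statements from root. A fixed seed
// makes the sample reproducible; maxAttempts bounds the loop when the
// synopsis has fewer than n distinct statements.
func SampleStatements(root Node, n, maxAttempts int, seed int64) []string {
	r := rand.New(rand.NewSource(seed))
	seen := make(map[string]struct{})
	var out []string
	for attempts := 0; len(out) < n && attempts < maxAttempts; attempts++ {
		var tokens []string
		root.Sample(r, &tokens)
		q := strings.Join(tokens, " ")
		if _, dup := seen[q]; dup {
			continue
		}
		seen[q] = struct{}{}
		out = append(out, q)
	}
	return out
}

// abortSynopsis models: ABORT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
func abortSynopsis() Node {
	return Sequence{
		Text("ABORT"),
		Optional{Choice{Text("WORK"), Text("TRANSACTION")}},
		Optional{Sequence{Text("AND"), Optional{Text("NO")}, Text("CHAIN")}},
	}
}

func main() {
	for _, q := range SampleStatements(abortSynopsis(), 5, 1000, 1) {
		fmt.Println(q)
	}
}
```

The cost per sample is proportional to the size of the synopsis rather than to the number of permutations, so the same walk works for a statement like SELECT where full enumeration is out of reach.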

What I really want to know is whether sequences of generated statements produce the same results as they do on Postgres. Where you should take this next is looking into generating additional sqllogictest scripts that exercise more of the grammar. Then we can just run those things against Postgres to get a baseline and compare the results to Doltgres. This requires some semantic knowledge of which kinds of statements make sense in a sequence, as well as intelligently reusing identifiers from earlier in a sequence. Think of it as a layer on top of the statement generator you have here.

testing/generation/command_docs/create_tests.go (outdated, resolved)
@Hydrocharged Hydrocharged merged commit 85686ac into main Nov 29, 2023
6 checks passed
@Hydrocharged Hydrocharged deleted the daylon/all-commands branch November 29, 2023 07:10
2 participants