Parser Tests Tool #58

Hydrocharged · 2023-11-28T14:48:57Z

This adds a tool that can download synopses from the Postgres documentation pages, save those synopses under the testing/generation/command_docs/synopses folder, iterate over each synopsis to create a StatementGenerator that produces every permutation of a query based on the synopsis, check that the queries are valid against a Postgres installation, and then create a test file verifying the level of implementation for each query in DoltgreSQL (whether the query parses or can be fully transformed). The idea is that we run this once (and again whenever we upgrade to a newer Postgres version, such as 16), and as development continues, we'll naturally add support for more statements. This tracks regressions too: if something used to parse successfully and now doesn't, we can see that we've broken something. At least, that's the idea.
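To make the permutation step concrete, here is a minimal Go sketch of how a synopsis could be modeled and expanded. The node types (`Text`, `Optional`, `OneOf`, `Sequence`) are hypothetical and are not the tool's actual StatementGenerator. The example encodes the ABORT synopsis from the Postgres docs, `ABORT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]`, which expands to 9 statements.

```go
package main

import "fmt"

// Node is a hypothetical, simplified model of a parsed synopsis.
// The real StatementGenerator may be structured quite differently.
type Node interface {
	// Permutations returns every string this node can produce.
	Permutations() []string
}

// Text is a literal keyword or token, e.g. "ABORT".
type Text string

func (t Text) Permutations() []string { return []string{string(t)} }

// Optional models a bracketed part of a synopsis, e.g. [ NO ].
type Optional struct{ Child Node }

func (o Optional) Permutations() []string {
	// The empty string represents "omitted".
	return append([]string{""}, o.Child.Permutations()...)
}

// OneOf models alternatives separated by "|".
type OneOf []Node

func (o OneOf) Permutations() []string {
	var out []string
	for _, n := range o {
		out = append(out, n.Permutations()...)
	}
	return out
}

// Sequence models parts that appear one after another; the result is the
// cross product of each part's permutations.
type Sequence []Node

func (s Sequence) Permutations() []string {
	results := []string{""}
	for _, n := range s {
		var next []string
		for _, prefix := range results {
			for _, suffix := range n.Permutations() {
				next = append(next, join(prefix, suffix))
			}
		}
		results = next
	}
	return results
}

// join concatenates two fragments with a single space, skipping empty ones.
func join(a, b string) string {
	switch {
	case a == "":
		return b
	case b == "":
		return a
	default:
		return a + " " + b
	}
}

func main() {
	// ABORT [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
	abort := Sequence{
		Text("ABORT"),
		Optional{OneOf{Text("WORK"), Text("TRANSACTION")}},
		Optional{Sequence{Text("AND"), Optional{Text("NO")}, Text("CHAIN")}},
	}
	for _, q := range abort.Permutations() {
		fmt.Println(q + ";")
	}
}
```

Eager enumeration like this is fine for ABORT, but its output size is the product of every choice in the synopsis, which is exactly what becomes infeasible for statements like SELECT.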

The major issue is that the permutation count is massive. The tool even supports variables as defined by each synopsis, and without expanding those, SELECT alone got past 50 billion permutations before I stopped it. We need a way to pick a reasonable subset for the largest statements, which we can discuss later.
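The blow-up is easier to see by counting instead of enumerating: alternatives add, an optional part contributes one extra way (omitted), and parts in sequence multiply. The sketch below is hypothetical (the types are illustrative, not from the tool) and uses `math/big` so counts in the quintillions and beyond don't overflow. It counts derivations, which can exceed the number of distinct strings if a synopsis is ambiguous.

```go
package main

import (
	"fmt"
	"math/big"
)

// CountNode is a hypothetical count-only view of a synopsis grammar.
// Counts are combined arithmetically instead of materializing strings.
type CountNode interface {
	Count() *big.Int
}

// Lit is a literal token: exactly one way to appear.
type Lit struct{}

func (Lit) Count() *big.Int { return big.NewInt(1) }

// Opt is an optional part: omitted (1 way) plus every way the child appears.
type Opt struct{ Child CountNode }

func (o Opt) Count() *big.Int {
	return new(big.Int).Add(big.NewInt(1), o.Child.Count())
}

// Alt is a set of alternatives: their counts add.
type Alt []CountNode

func (a Alt) Count() *big.Int {
	total := big.NewInt(0)
	for _, n := range a {
		total.Add(total, n.Count())
	}
	return total
}

// Seq is a sequence of independent parts: their counts multiply.
type Seq []CountNode

func (s Seq) Count() *big.Int {
	total := big.NewInt(1)
	for _, n := range s {
		total.Mul(total, n.Count())
	}
	return total
}

func main() {
	// Twenty independent optional clauses, each with 3 alternatives:
	// (1 + 3)^20 = 4^20 = 1,099,511,627,776 permutations.
	clauses := Seq{}
	for i := 0; i < 20; i++ {
		clauses = append(clauses, Opt{Alt{Lit{}, Lit{}, Lit{}}})
	}
	fmt.Println(clauses.Count())
}
```

Each added independent clause multiplies the total, so the count grows exponentially with the number of clauses a synopsis allows.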

Regarding this PR though, the tool is mostly complete. There are a few "issues":

- It doesn't check the queries against an actual Postgres installation yet. This is easy to add, but I didn't bother since we first need a way to bring down the total number of tests.
- It doesn't do anything with variables yet. Permutation counts grow exponentially, so I wouldn't be surprised if the final numbers are in the quintillions.
- Downloaded synopses require manual fiddling. This is mostly because the Postgres documentation is inconsistent with whitespace, and writing a tool to deal with those inconsistencies was too much. Not all files require fiddling (variable substitution is handled by the tool), and it's not a lot of fiddling either, but it's there.

Besides those things, this is fully complete. The single example file, testing/generation/command_docs/output/abort_test.go, works from start to finish (downloading the synopsis, generating queries, creating the test file), so you can see what the output will look like once we figure out how to get a subset.

Regarding how to review this, I'd mainly say to just look at that test file. You can also check out some of the synopses to see how they differ from what's on the website. If you really want to, you can look at the parser and generator, but it's not required.

zachmu

I didn't read this all that closely for obvious reasons, but the basic idea seems fine.

For the actual test generation, I'm less concerned about what parses or doesn't parse; that doesn't really tell us much, since we expect the parser is mostly right. Conversion tests are more valuable, but without correctness testing they're just smoke tests. Generating a few million of these kinds of statements by randomly sampling from the grammar space is great and worth doing, but we can squeeze more useful info out of what you've built.

What I really want to know is whether sequences of generated statements produce the same results as they do on Postgres. The next step should be generating additional sqllogictest scripts that exercise more of the grammar. Then we can run those against Postgres to get a baseline and compare the results to Doltgres. This requires some semantic knowledge of which kinds of statements make sense in a sequence, as well as intelligently reusing identifiers from earlier in a sequence. Think of it as a layer on top of the statement generator you have here.

testing/generation/command_docs/create_tests.go


Hydrocharged requested a review from zachmu November 28, 2023 14:48

zachmu approved these changes Nov 28, 2023


testing/generation/command_docs/create_tests.go (outdated, resolved)

Hydrocharged added 2 commits November 28, 2023 22:59

Added a tool to download command docs, parse their synopses, and generate tests

52cfaf5

PR Feedback & Improvements

9e82020

Hydrocharged force-pushed the daylon/all-commands branch from 024f891 to 9e82020 Compare November 29, 2023 07:03

Hydrocharged merged commit 85686ac into main Nov 29, 2023
6 checks passed

Hydrocharged deleted the daylon/all-commands branch November 29, 2023 07:10
