Add parameters and volatility to Function #104
Open: enobayram wants to merge 16 commits into DanCardin:main from enobayram:function-parameters-and-volatility
Changes from all commits (16 commits):
48da897  Add parameters and volatility to functions
cac69ca  Implement parameters for MySQL as well
38b4447  Make functions with params work with alembic
84b2146  Add alembic tests with MySQL parametric function
87b93f4  Add test steps for more complex functions
1bd356b  Test down migrations too
c924e14  Fix idempotency of the migration generation
f83bf95  Fix typo
dd02e62  Add more complex function example for mysql
f42aa78  Update documentation for functions
7f6fe9a  Add outer strip to function return type normalization
9d9e5c7  Rename `result_string` schema query column to `return_type_string`
b939839  Delete the duplicate test file `tests/function/test_create.py`
12fe2d2  Add RETURNS TABLE(...) example to function/test_create_postgresql.sql
5cd7e40  Replace volatility_map with a classmethod on FunctionVolatility
0dfc88d  Merge branch 'main' into function-parameters-and-volatility

(All commits authored by enobayram.)
Reviewer:

I wonder if this should be some `list[Parameter] | None` rather than interpreting it from a string. That way the code downstream could operate on `p.type` and `p.name` rather than doing string manipulation. There could even be a `function.with_parameters(Param(a, b), ...)` and/or a `function.with_parameters_from_string('a b', ...)` that still make it moderately convenient to define, while simplifying the implementation code somewhat.

With that said, I think this could be retroactively adapted to with some `normalize` code and some explicit asserts/casts to appease the type checker. So if you're happy with the design now, I'm also happy.
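The structured type floated above might look roughly like this. This is a hypothetical sketch: `Parameter`, `with_parameters`, and `with_parameters_from_string` are the names from the comment, not the library's actual API, and the string parsing is deliberately naive.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Parameter:
    """A structured function parameter, instead of a raw 'name type' string."""

    name: str
    type: str


@dataclass
class Function:
    name: str
    parameters: Optional[list[Parameter]] = None

    def with_parameters(self, *params: Parameter) -> "Function":
        """Builder that attaches structured parameters."""
        return Function(self.name, list(params))

    def with_parameters_from_string(self, raw: str) -> "Function":
        """Convenience builder: split 'a int, b text' into Parameter objects.

        Deliberately naive -- assumes simple comma-separated 'name type'
        pairs only; quoting, modes, and defaults are out of scope here.
        """
        params = []
        for chunk in raw.split(","):
            name, type_ = chunk.strip().split(" ", 1)
            params.append(Parameter(name=name, type=type_))
        return Function(self.name, params)
```

With this shape, downstream code compares and inspects `p.name`/`p.type` directly instead of re-splitting strings at each call site.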
Author (enobayram):

I think this is a great suggestion. When I first designed it to be a `List[str]`, I had the most general definition of a function parameter in mind, and I thought it was far too complicated to represent with a structured type. But looking back at the implementation in this PR, we are already making the assumption that the individual parameters need to be `[argname] argtype`, so the least we can do is make that explicit by expecting a structured `Parameter` type with `type` and `name` fields.
Reviewer:

I hadn't looked into the full param definition, but all these optional bits definitely seem ideal for a structured type. Something like `with_parameters_from_string` would basically perform double duty, supporting the syntax coming out of the `pg_*` tables anyway, while also allowing the natural syntax for a user.
Author (enobayram):

I've been thinking hard trying to find a perspective from which to base this decision, and the more I think about it, the more I realize that this is a non-trivial decision. There's a subtle trade-off here between convenience and flexibility:

- Convenience: how convenient it is to write a `CREATE FUNCTION` definition.
- Flexibility: how many `CREATE FUNCTION` definitions using various Postgres features we can support correctly (i.e. creations, drops and replacements all work correctly and idempotently in both directions).

It would be nice to offer an interface such that a user can always grab what Postgres reports with `pg_get_function_arguments`, use it in a `Function` definition, and know that it will always work correctly, now and in future versions of this library, regardless of how funky the function definition is (containing an argument whose argname and argtype have quoted `,`s and `DEFAULT`s in their names, or maybe a complex `RECORD` type, or a weird default value, etc.). As long as the user fiddles with it and aligns it with what Postgres normalizes it to, it works.

But it would also be nice not to have to worry about whitespace, or `integer` vs `INTEGER`, etc. when you only need very simple `argname argtype` parameters. And again, we would like these simple cases to keep working over time as well.

Anyway, my point is that as I think from both of these directions, I realize there's friction between the two. But it's too late here and it's hard for me to articulate now, so I'll try to take another stab at it tomorrow.
Thanks again for the feedback!
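For illustration, the whitespace/case concern for the simple `argname argtype` case could be handled with a small normalization pass. A sketch only; the alias table is a tiny hypothetical subset of the type names Postgres itself canonicalizes.

```python
# Sketch: normalize simple "argname argtype" parameters so that purely
# cosmetic differences (whitespace, case, common type aliases) compare equal.

# Hypothetical subset of aliases that Postgres normalizes away.
TYPE_ALIASES = {
    "int": "integer",
    "int4": "integer",
    "bool": "boolean",
    "varchar": "character varying",
}


def normalize_parameter(raw: str) -> str:
    # Collapse whitespace, lowercase the type, and resolve known aliases.
    name, *type_parts = raw.split()
    type_ = " ".join(type_parts).lower()
    return f"{name} {TYPE_ALIASES.get(type_, type_)}"
```

With this, `"a   INTEGER"` and `"a integer"` normalize to the same string, while anything outside the simple form would need the fuller treatment discussed below.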
Reviewer:

So long as the strategy is using `pg_get_function_arguments`, rather than the constituent argument components on the `pg_proc` table, the ability to support user strings will only be as good as the library's ability to interpret the resultant string from Postgres itself. I guess that's what you mean about whitespace and type normalization handling, but those seem straightforward to improve support for progressively, and I'm not super concerned about supporting the full breadth of options in an initial implementation.

I can't tell from your response whether you're deliberating about whether to define some `Arg[ument]` class or not, but I think doing so is mainly useful because it keeps the comparison code super simple (a plain equality check) and doesn't require a very specific output normalization. (Plus `Arg('foo', type='int', default=4)` reads a lot better to me.) Parsing will need to be flexible with quoting and whatnot (in theory, it really doesn't need to start as a fully fleshed-out parser), but rendering to string can simply quote all names, for example, and doesn't need to match the canonical Postgres representation.

I think the main consequential interface question is whether the basic type accepts `list[Arg]` or `list[Arg | str]`. The latter is maybe simpler DX, but objectively worse typing for the rendering code, versus a separate method for it. I personally prefer builder methods to handle things like parsing alternate formats, but we also accept roles as simple strings, so it's not unprecedented here to have the simple interface come at some cost in typing.
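The equality-based comparison the reviewer describes could be sketched like so. `Arg` here is hypothetical, not the library's actual API; note that `render` always quotes the name, so rendering only needs to be parseable rather than identical to Postgres' canonical form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arg:
    """Structured argument: comparing declared vs. introspected arguments
    becomes a plain == on dataclasses, with no string munging."""

    name: str
    type: str
    default: Optional[str] = None

    def render(self) -> str:
        # Always quote the name, so the rendered form is guaranteed to
        # parse, without matching Postgres' canonical representation.
        rendered = f'"{self.name}" {self.type}'
        if self.default is not None:
            rendered += f" DEFAULT {self.default}"
        return rendered
```

Comparison is then just `Arg('foo', type='int', default='4') == Arg('foo', type='int', default='4')`, with no normalization step in between.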
Author (enobayram):

I think that's the crux of the issue. Without a fully fleshed-out parser (or at least a significantly fleshed-out one), I don't think you can even reliably turn the `pg_get_function_arguments` output into a `list[str]`, let alone a `list[Arg]`. So if we decide that the normalized form of the arguments list is `list[Arg]`, then we're declaring that we only support functions whose argument lists we can parse.

I think there's actually a third option: `list[Arg] | str`. That is, we make a best effort at parsing `pg_get_function_arguments` into a `list[Arg]`, to the extent that our `Arg` type covers the fields of a valid Postgres function argument, and to the extent that our parser can reliably parse the `pg_get_function_arguments` output into it. And we write the parser conservatively, so if it encounters anything it doesn't understand, it just bails out and uses the `pg_get_function_arguments` output verbatim.

The advantage of this approach is that for simple signatures our parser recognizes, we can support the convenience of normalization; but if the user needs a more complex argument list, they just pass in a `raw_arguments` string, and they are responsible for making sure that Postgres normalizes it to itself (i.e. returns it back from `pg_get_function_arguments` unchanged).

By the way, my argument so far only covers the problem of identifying whether the declared function argument list matches the arguments of the existing function in the database. But we need to consider the following as well:

- Unlike MySQL, Postgres allows function overloading. Without support for function arguments, `sqlalchemy-declarative-extensions` was able to get away with identifying existing functions through their names alone; but with function arguments, we now need to use `pg_get_function_identity_arguments` to match existing functions to declared ones. Unfortunately, we need to handle this as soon as we support arguments, because a `CREATE OR REPLACE myfunc` will not replace a `myfunc` with different `pg_get_function_identity_arguments`.
- If we allow arbitrary strings via a `raw_arguments` input, how do we construct the `DROP FUNCTION` statement? Unlike MySQL, Postgres requires the argument types in the `DROP FUNCTION` statement. When there are no `DEFAULT` values, the full arguments string is accepted by `DROP FUNCTION`, but if there are defaults, they need to be removed, which is the same as `pg_get_function_identity_arguments`. So if the user defines the `Function` with a `raw_arguments`, we could expect an `identity_arguments` as well. We could also use the `raw_arguments` as a default value for `identity_arguments`.

My suggested solution:

- Add `raw_arguments` and `identity_arguments` to `base.Function`.
- In `function.compare.compare_functions`, compare the functions based on `(name, identity_arguments)` instead of just `name`.
- For MySQL, `identity_arguments` is always `''`; it doesn't support function overloads anyway, and its `DROP FUNCTION` statement doesn't require an argument list.
- Implement a conservative `parse_arguments(str) -> List[Arg] | None` for `postgresql.Function` and `mysql.Function`.
- Add `.with_arguments(List[Arg])` helpers for `postgresql.Function` and `mysql.Function`; `.with_arguments` renders the `List[Arg]` to a string and then parses it back using `parse_arguments`.
- If the parser fails to round-trip the `List[Arg]`, we throw an error telling the user that we don't support their arguments through `.with_arguments` yet, and that they should instead pass in a `raw_arguments` that they make sure is fully normalized.
- If the `List[Arg]` comes back unchanged from our parser, we normalize the `List[Arg]` and then render the `raw_arguments` and the `identity_arguments` of the `Function` from the normalized `List[Arg]`.

The end result of this is that:

- Power users can supply `raw_arguments` and `identity_arguments` themselves.
- Everyone else can use `.with_arguments`, where normalization and `identity_arguments` are handled by us.
- Either way, the `Function` will be future-proof, because user-supplied `raw_arguments` are already normalized, and `.with_arguments` is conservative in what it accepts, so a user can't shoehorn in a weird `Arg` that will start failing when we extend the `Arg` type and the `parse_arguments` function in the future.

What do you think? I can pivot this PR to implement this suggestion.
Reviewer:

Certainly, although I might suggest some `@property` named `identity` that returns a unique string, rather than propagating the tuple directly everywhere it's keyed.

This is where you start to lose me, though. Imo, ideally the canonical representation should be the normalized `Arg` one, and converting it to a string for comparison very much feels like a smell. Perhaps if the parser cannot parse the incoming string, then you could fall back to a raw representation that's required to be defined identically to Postgres' normalized format in order to compare equal. But I think the rendering code should otherwise not be expected to return the normalized format, but rather one that's guaranteed to parse (e.g. always quote).

Rather than that, I'd put these fields on an `Arg` class directly. My goal being that (if I'm not missing some of your concerns) this ought to enable avoiding rendering for the purposes of comparison.
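The `identity` property idea might look like this. A sketch with hypothetical field names, assuming `identity_arguments` as proposed earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    identity_arguments: str = ""

    @property
    def identity(self) -> str:
        # One unique string to key on, instead of passing a
        # (name, identity_arguments) tuple around everywhere.
        if not self.identity_arguments:
            return self.name
        return f"{self.name}({self.identity_arguments})"


# Postgres overloads get distinct keys, so a by-identity mapping holds both:
existing = {
    f.identity: f
    for f in [Function("myfunc", "integer"), Function("myfunc", "text")]
}
```

Code that previously keyed dictionaries on `name` alone can then switch to `identity` without changing its shape.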
Author (enobayram):

I'm sorry for the delay here; I start every day intending to make the changes, but then get caught up in other stuff.

I want to mention that `list[Arg | RawArg]` may not work (and why I suggest `list[Arg] | RawArgs` instead): with all the quoting and various complexities in the definitions of the arguments (complex parametric or nested types, quoted names, default string values with escaped quotes, etc.), it's a non-trivial assumption that we can always split a given `pg_get_function_arguments` or `pg_get_function_identity_arguments` correctly into a list of arguments (raw or not). That's why I suggest `RawArgs` as the most conservative fallback whenever we encounter a tricky `pg_get_function_arguments` result, instead of haphazardly parsing it as a `List[Arg | RawArg]`.

Here's the problem with that: what if the user passes `Function(...).with_arguments(Arg('a', 'int, a2 str'), RawArg('b varchar'), 'c int')` or `Function(...).with_arguments(Arg('a', "str = ', \','"), RawArg('b varchar'), 'c int')`? The former is obviously an error on the user's side, but the latter is almost valid. In any case, it's possible for something weird like this to work out of pure luck, if the way we compare it to the `pg_get_function_arguments` and `pg_get_function_identity_arguments` happens to work. My problem with this is that when we improve our parsers in the future, what happened to work with the simpler parser may stop working with the improved parser.

That's the reason I suggested rendering the `List[Arg]` into a string and parsing it back with our (incomplete and conservative) parser. If the provided `List[Arg]` passes through render -> parse without changing, then we know there's nothing tricky in it, and that if it works today, it will always work.

If it doesn't pass through unchanged, then we act conservatively and reject it entirely, telling the user to pass in the whole arguments string (not even a `List[RawArg]`, but a `RawArgs`). When they pass in a `RawArgs`, the responsibility lies with them to make sure the string matches exactly the canonical form that Postgres reduces it to. This way, once the user comes up with a `RawArgs` that works, we know it will work in future versions too.

Again, the main intention is to implement a solution that accepts structured input for simple cases, while allowing arbitrary `pg_get_function_arguments` for complex cases, all the while guaranteeing that what works now will keep working as we improve the parser as needed.

All of that said, we could also just skip the whole `RawArgs` interface and simply not provide any way to express functions with arguments that our parser doesn't support. In that case, I would still suggest passing the given `List[Arg]` through render -> parse and asserting that it doesn't change, so that you can't manage to sneak in something weird that will stop working with future parsers.

My personal top priority is to make sure we can't break (seemingly) working code by improving the parser. The second priority is to allow arbitrary function definitions if the user is willing to do the hard work.
Reviewer:

I don't know whether you think this meaningfully changes the calculus or not, but it seems like you could directly use `proargnames`, `proargmodes`, and the argument types, and for defaults use `pg_get_expr(proargdefaults, 'pg_proc'::regclass)` to obtain the canonical default expression.

It seems to me that `pg_proc` has columns for all the things I'd expect to be relevant, plus a way to turn the main problem child in terms of parsing (the defaults) into an expression separately, such that we don't actually need a parser as such.