Add parameters and volatility to Function #104
Conversation
honestly this looks really good! I left a couple of comments that I'd consider optional, so lmk if you plan to address them or not and i'll either await the update or merge as-is
```diff
@@ -24,19 +55,32 @@ class Function(base.Function):
     security: FunctionSecurity = FunctionSecurity.invoker

     #: Defines the parameters for the function, e.g. ["param1 int", "param2 varchar"]
     parameters: Optional[List[str]] = None
```
i wonder if this should be some `list[Parameter] | None` rather than interpreting it from a string. that way the code downstream could be operating on `p.type` and `p.name` rather than doing string stuff. there could even be a `function.with_parameters(Param(a, b), ...)` and/or `function.with_parameters_from_string('a b', ...)` that still make it moderately convenient to define, while simplifying the impl code somewhat.

With that said, i think this could be retroactively adapted to with some `normalize` code and some explicit asserts/casts to appease the type checker. so if you're happy with the design now, i'm also happy
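A sketch of what such a structured type could look like (all names here — `Parameter`, `from_string` — are hypothetical illustrations, not the library's actual API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Parameter:
    # Hypothetical structured parameter, so downstream code can use
    # p.name / p.type instead of doing string manipulation.
    name: Optional[str]
    type: str

    @classmethod
    def from_string(cls, raw: str) -> "Parameter":
        # Interprets the "[argname] argtype" form this PR already assumes;
        # a bare multi-word type with no name would need a smarter parser.
        parts = raw.strip().split(maxsplit=1)
        if len(parts) == 1:
            return cls(name=None, type=parts[0])
        return cls(name=parts[0], type=parts[1])

print(Parameter.from_string("param1 int"))  # Parameter(name='param1', type='int')
```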
I think this is a great suggestion. When I first designed it to be a `List[str]`, I had the most general definition of a function parameter in mind:

```
[ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ]
```

I thought it's far too complicated to represent with a structured type. But looking back at the implementation in this PR, we are already making the assumption that the individual parameters need to be `[argname] argtype`, so the least we can do is to make that explicit by expecting a structured `Parameter` type with `type` and `name` fields.
i hadn't looked into the full param def, but definitely all these optional bits seem ideal for a structured type, and something like `with_parameters_from_string` would basically perform double duty for supporting the syntax coming out of the `pg_*` tables anyways, while also allowing the natural syntax for a user.
I've been thinking hard, trying to find a perspective to base this decision on, and the more I think about it, the more I realize that this is a non-trivial decision. There's a subtle trade-off here between convenience and flexibility:

- Convenience, as in the convenience of not having to worry about how Postgres normalizes a given `CREATE FUNCTION` definition.
- Flexibility, as in the range of `CREATE FUNCTION` definitions using various Postgres features that we can support correctly (i.e. creations, drops and replacements all work correctly and idempotently in both directions).

It would be nice to be able to offer an interface such that a user can always grab what Postgres reports with `pg_get_function_arguments`, use it in a `Function` definition, and know that it will always work correctly, now and in future versions of this library, regardless of how funky the function definition is (containing an argument whose argname and argtype have quoted `,`s and `DEFAULT`s in their names, or maybe a complex `RECORD` type, or a weird default value, etc.). As long as the user fiddles with it and aligns it with what Postgres normalizes it to, it works.

But it would also be nice not to have to worry about whitespace, or `integer` vs `INTEGER`, etc. when you only need very simple `argname argtype` parameters. And again, we would like these simple cases to keep working over time as well.

Anyway, my point is that as I think from both of these directions, I realize that there's a friction between the two, but it's too late here and it's hard for me to articulate now, so I'll try to take another stab at it tomorrow.
Thanks again for the feedback!
so long as the strategy is using `pg_get_function_arguments` rather than using the constituent argument components on the `pg_proc` table, then the ability to support user strings will be as good as the library's ability to interpret the resultant string from postgres itself. i guess that's what you mean about whitespace and type normalization handling, but those seem straightforward to improve support for progressively. but i'm not super concerned about even supporting the full breadth of options in an initial impl.

i can't tell from your response whether you're deliberating about whether to define some `Arg[ument]` class or not, but i think doing so is mainly super useful because it keeps the comparison code super simple, a simple equality check, and doesn't require a very specific output normalization. (Plus `Arg('foo', type='int', default=4)` reads a lot better to me.)

parsing will need to be flexible with quoting and whatnot (like, in theory, it really doesn't need to start as a fully fleshed out parser), but rendering to string can simply quote all names, for example, and not need to match the canonical postgres repr.

i think the main consequential interface question is whether the basic type accepts `list[Arg]` or `list[Arg | str]`. it's maybe simpler dx, but objectively worse typing for the rendering code, versus a separate method for it. i personally prefer builder methods to handle stuff like parsing alternate formats, but we also accept roles as simple strings, so it's not like it's unprecedented here to have the simple interface be at the cost of typing somewhat.
> parsing will need to be flexible with quoting and whatnot (like, in theory, it really doesn't need to start as a fully fleshed out parser)

I think that's the crux of the issue. Without a fully fleshed out parser (or at least a significantly fleshed out one), I don't think you can even reliably turn the `pg_get_function_arguments` output into a `list[str]`, let alone a `list[Arg]`. So if we decide that the normalized form of the arguments list is `list[Arg]`, then we're declaring that we only support functions whose argument lists we can parse.

> i think the main consequential interface question is whether the basic type accepts `list[Arg]` or `list[Arg | str]`

I think there's actually a third option: `list[Arg] | str`, i.e. we make a best effort at parsing `pg_get_function_arguments` into a `list[Arg]`, to the extent that our `Arg` type covers the fields of a valid Postgres function argument, and also to the extent that our parser can reliably parse the `pg_get_function_arguments` output into it. And we write the parser conservatively, so if it encounters anything it doesn't understand, it just bails out and uses the `pg_get_function_arguments` output verbatim.

The advantage of this approach would be that for simple signatures that our parser recognizes, we can support the convenience of normalization, but if the user needs a more complex argument list, they just pass in a `raw_arguments` string, and they are responsible for making sure that Postgres normalizes it to itself (i.e. returns it back from `pg_get_function_arguments` unchanged).

BTW, my argument so far only covers the problem of identifying whether the declared function argument list matches the arguments of the existing function in the database. But we need to consider the following as well:
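As an aside, the conservative "bail out" behavior being discussed could be sketched roughly like this (`Arg` and `parse_arguments` are hypothetical names from this thread, not existing library API):

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Arg:
    name: str
    type: str

# Only plain "name type" pairs; anything fancier makes the parser bail out.
SIMPLE_ARG = re.compile(r"^\s*(\w+)\s+(\w+)\s*$")

def parse_arguments(raw: str) -> Optional[List[Arg]]:
    """Best-effort parse of a pg_get_function_arguments-style string.

    Returns None (bails out) on anything beyond simple name/type pairs,
    e.g. quoting, DEFAULTs, OUT/VARIADIC modes, or parametric types,
    so the caller can keep the raw string verbatim instead.
    """
    if not raw.strip():
        return []
    args = []
    for piece in raw.split(","):
        match = SIMPLE_ARG.match(piece)
        if match is None:
            return None
        args.append(Arg(name=match.group(1), type=match.group(2)))
    return args

assert parse_arguments("a int, b varchar") == [Arg("a", "int"), Arg("b", "varchar")]
assert parse_arguments("a int DEFAULT 4") is None  # too tricky: fall back to the raw string
```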
1. Unlike MySQL, Postgres allows function overloading. Without support for function arguments, sqlalchemy-declarative-extensions was able to get away with identifying existing functions through their names alone, but with function arguments we now need to use `pg_get_function_identity_arguments` to match existing functions to declared ones. Unfortunately we need to handle this as soon as we support arguments, because a `CREATE OR REPLACE myfunc` will not replace a `myfunc` with a different `pg_get_function_identity_arguments`.
2. If we allow arbitrary strings via a `raw_arguments` input, how do we construct the `DROP FUNCTION` statement? Unlike MySQL, Postgres requires the argument types in the `DROP FUNCTION` statement. When there are no `DEFAULT` values, the full arguments string gets accepted by `DROP FUNCTION`, but if there are defaults, they need to be removed, which is the same as `pg_get_function_identity_arguments`. So if the user defines the `Function` with a `raw_arguments`, we could expect an `identity_arguments` as well. We could also use the `raw_arguments` as a default value for `identity_arguments`.
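To make the `DROP FUNCTION` point concrete, a minimal sketch (the helper name is hypothetical; real rendering would live in the dialect code):

```python
def render_drop_function(name: str, identity_arguments: str) -> str:
    # Postgres needs the identity arguments (the signature with any DEFAULT
    # clauses stripped) to pick the right overload among same-named functions;
    # MySQL would render just "DROP FUNCTION name".
    return f"DROP FUNCTION {name}({identity_arguments});"

# e.g. for a function where
#   pg_get_function_arguments          -> "a integer, b integer DEFAULT 4"
#   pg_get_function_identity_arguments -> "a integer, b integer"
print(render_drop_function("myfunc", "a integer, b integer"))
# DROP FUNCTION myfunc(a integer, b integer);
```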
My Suggested Solution
- We add `raw_arguments` and `identity_arguments` to `base.Function`.
- In `function.compare.compare_functions`, we compare the functions based on `(name, identity_arguments)` instead of just `name`.
- For MySQL, `identity_arguments` is always `''`; it doesn't support function overloads anyway, and its `DROP FUNCTION` statement doesn't require an argument list.
- We implement `parse_arguments(str) -> List[Arg] | None` for `postgresql.Function` and `mysql.Function`.
- We implement `.with_arguments(List[Arg])` helpers for `postgresql.Function` and `mysql.Function`. `.with_arguments` renders the `List[Arg]` to a string and then parses it back using `parse_arguments`.
  - If it can't parse it back, or parses it back to a different `List[Arg]`, then we throw an error that tells the user that we don't support their arguments through `.with_arguments` yet, and that they should instead pass in a `raw_arguments` that they make sure is fully-normalized.
  - If their `List[Arg]` comes back unchanged from our parser, we normalize the `List[Arg]` and then render the `raw_arguments` and the `identity_arguments` of the `Function` from the normalized `List[Arg]`.
The end result of this is that:

- We support the full range of Postgres/MySQL function arguments if the user is willing to pass in a fully normalized `raw_arguments` and `identity_arguments` themselves.
- We also support a convenience method through `.with_arguments`, where normalization and `identity_arguments` are handled by us.
- Both of these ways of creating a `Function` will be future-proof, because user-supplied `raw_arguments` are already normalized, and `.with_arguments` is conservative in what it accepts, so a user can't shoehorn in a weird `Arg` that will start failing when we extend the `Arg` type and the `parse_arguments` function in the future.
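The render->parse round-trip safeguard could be sketched roughly like this (all names are hypothetical; the toy parser stands in for the real conservative `parse_arguments` being proposed):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Arg:
    name: str
    type: str

def render_arguments(args: List[Arg]) -> str:
    return ", ".join(f"{a.name} {a.type}" for a in args)

def parse_arguments(raw: str) -> Optional[List[Arg]]:
    # Toy conservative parser: only bare "name type" pairs are understood.
    out = []
    for piece in raw.split(","):
        parts = piece.split()
        if len(parts) != 2:
            return None
        out.append(Arg(name=parts[0], type=parts[1]))
    return out

def with_arguments(args: List[Arg]) -> str:
    raw = render_arguments(args)
    if parse_arguments(raw) != args:
        # Round-trip changed or failed: reject, rather than accept something
        # that a future, smarter parser might interpret differently.
        raise ValueError(
            "unsupported arguments for .with_arguments; "
            "pass a fully-normalized raw_arguments instead"
        )
    return raw

assert with_arguments([Arg("a", "int"), Arg("b", "varchar")]) == "a int, b varchar"
```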
What do you think? I can pivot this PR to implement this suggestion.
- certainly, although i might suggest some `@property` named `identity` that returns a unique string rather than just propagating the tuple directly everywhere it's keyed.
- this is where you start to lose me. particularly with

  > `.with_arguments` renders the `List[Arg]` to a string and then parses it back using `parse_arguments`.

  Imo ideally the canonical representation should be the normalized `Arg` one, and converting it to a string for comparison very much feels like a smell. perhaps if the parser cannot parse the incoming string, then you could fall back to a raw repr that's required to be defined identically to postgres' normalized format in order to compare equal. but i think the rendering code should otherwise not be expected to return the normalized format, but rather one that's guaranteed to parse (e.g. always quote)
- I think rather than that, i'd put these fields on an `Arg` class directly. e.g.

  ```python
  class Function:
      arguments: list[Arg | RawArg]  # and maybe str, so long as it's normalized to Arg

  @dataclass
  class Arg:
      ...  # all the parsable fields

      def from_string/parse(cls, repr: str) -> Arg | RawArg: ...

  @dataclass
  class RawArg:
      repr: str
      identity: str
  ```
my goal being that:

- the natural comparison between two instances of a Function will compare equal or not
- there's a structure (Arg/RawArg) that can know how to parse, and separately how to render
- we're not rendering nice python arg representations into strings as the "normal" format for comparison.

I'm imagining

```python
Function(...).with_arguments(Arg('a', 'int'), RawArg('b varchar'), 'c int')
# if it gets a string, it attempts Arg.from_string/parse; if it can parse the
# arg, it returns an Arg, and if it can't, it returns a RawArg
```

and i (if i'm not missing some of your concerns) feel like this ought to enable avoiding rendering for the purposes of comparison.
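That string coercion could be sketched as follows (names taken from this thread; whether the real `Arg` parsing ends up with this shape is an open question, and the toy "simple pair" rule stands in for a real parser):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Arg:
    name: str
    type: str

@dataclass(frozen=True)
class RawArg:
    repr: str

def coerce_argument(value: Union[Arg, RawArg, str]) -> Union[Arg, RawArg]:
    # Arg/RawArg pass through; strings are parsed into an Arg when they are
    # a simple "name type" pair, and kept verbatim as a RawArg otherwise.
    if isinstance(value, (Arg, RawArg)):
        return value
    parts = value.split()
    if len(parts) == 2:
        return Arg(name=parts[0], type=parts[1])
    return RawArg(repr=value)

assert coerce_argument("c int") == Arg(name="c", type="int")
assert coerce_argument("b varchar DEFAULT 'x'") == RawArg(repr="b varchar DEFAULT 'x'")
```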
I'm sorry for the delay here, I start every day intending to make the changes, but then get caught up in other stuff.

I want to mention that `list[Arg | RawArg]` may not work (and why I suggest `list[Arg] | RawArgs`), because with all the quoting and various complexities in the definitions of the arguments (like complex parametric, nested types, quoted names, default string values with escaped quotes, etc.), it's a non-trivial assumption to make that we can always split a given `pg_get_function_arguments` or `pg_get_function_identity_arguments` correctly into a list of arguments (raw or not). That's why I suggest `RawArgs` as the most conservative fallback whenever we encounter a tricky `pg_get_function_arguments` result, instead of parsing it as a `List[Arg | RawArg]` haphazardly.

> I'm imagining
> `Function(...).with_arguments(Arg('a', 'int'), RawArg('b varchar'), 'c int')`

Here's the problem with this: What if the user passes `Function(...).with_arguments(Arg('a', 'int, a2 str'), RawArg('b varchar'), 'c int')` or `Function(...).with_arguments(Arg('a', "str = ', \','"), RawArg('b varchar'), 'c int')`? The former is obviously an error on the user's side, but the latter is almost valid. In any case, it's possible for something weird like this to work out of pure luck, if the way we compare it to the `pg_get_function_arguments` and `pg_get_function_identity_arguments` happens to work. My problem with this is that when we improve our parsers in the future, what happened to work with the simpler parser may stop working with the improved parser.

That's the reason why I suggested rendering the `List[Arg]` into a string and parsing it back with our (incomplete and conservative) parser. If the provided `List[Arg]` passes through render->parse without changing, then we know there's nothing tricky in it, and that if it works today, it will always work.

If it doesn't pass through unchanged, then we act conservatively and reject it entirely, telling the user to pass in the whole arguments string (not even a `List[RawArg]`, but a `RawArgs`). When they pass in a `RawArgs`, the responsibility lies with them to make sure that the string matches exactly the canonical form that Postgres reduces it to. This way, once the user comes up with a `RawArgs` that works, we know that it will work in future versions too.

Again, the main intention is to implement a solution that accepts structured input for simple cases, while allowing arbitrary `pg_get_function_arguments` for complex cases, all the while guaranteeing that what works now will keep working as we improve the parser as needed.

All of that said, we could also just skip the whole `RawArgs` interface, and simply not provide any way to express functions with arguments that our parser doesn't support. In that case I would still suggest passing the given `List[Arg]` through render->parse and asserting that it doesn't change, so that you can't manage to sneak in something weird that will stop working with future parsers.

My personal top priority is to make sure we can't break (seemingly) working code by improving the parser. The second priority is to allow arbitrary function definitions if the user is willing to do the hard work.
I don't know whether you think this meaningfully changes the calculus or not, but it seems like you could directly use the `proargnames`, `proargmodes`, and type columns, and for defaults use `pg_get_expr(proargdefaults, 'pg_proc'::regclass)` to obtain the canonical default expression.

it seems to me that `pg_proc` has columns for all the things i'd expect to be relevant, and a way to turn the main problem child in terms of parsing (the defaults) into an expression separately, such that we don't actually need a parser as such.
JK i just rebased the PR version, but just something to be aware of if/when you develop on top of this.
@DanCardin Thank you very much for your comments! And also for resolving the issue with the CI. I've made sure that the tests pass locally, let's hope they pass in CI too!
Codecov Report — coverage diff for this patch:

```
@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
- Coverage   89.87%   89.55%   -0.33%
==========================================
  Files          85       85
  Lines        3811     3895      +84
  Branches      778      802      +24
==========================================
+ Hits         3425     3488      +63
- Misses        314      331      +17
- Partials       72       76       +4
```
Resolves #103

This PR adds support for declaring `Function`s with `parameters` for both the MySQL and the PostgreSQL dialects. The PostgreSQL dialect of `Function`s now also supports a new `volatility` option, which can be one of `FunctionVolatility.{VOLATILE,STABLE,IMMUTABLE}`.

The PR also adds test cases for a variety of `Function` definitions that cover the new features, and updates the ReadTheDocs documentation.