Add parameters and volatility to Function #104


Open · wants to merge 16 commits into main

Conversation

enobayram (Author):

Resolves #103

This PR adds support for declaring Functions with parameters in both the MySQL and PostgreSQL dialects. The PostgreSQL dialect of Function now also supports a new volatility option, which can be one of FunctionVolatility.{VOLATILE, STABLE, IMMUTABLE}.

The PR also adds test cases covering a variety of Function definitions that exercise the new features, and updates the ReadTheDocs documentation.

DanCardin (Owner) left a comment:

Honestly, this looks really good! I left a couple of comments that I'd consider optional, so let me know if you plan to address them or not, and I'll either await the update or merge as-is.

@@ -24,19 +55,32 @@ class Function(base.Function):

security: FunctionSecurity = FunctionSecurity.invoker

#: Defines the parameters for the function, e.g. ["param1 int", "param2 varchar"]
parameters: Optional[List[str]] = None
DanCardin (Owner):

I wonder if this should be some list[Parameter] | None rather than interpreting it from a string.

That way the code downstream could operate on p.type and p.name rather than doing string manipulation.

There could even be a function.with_parameters(Param(a, b), ...) and/or function.with_parameters_from_string('a b', ...) that still make it moderately convenient to define, while simplifying the implementation code somewhat.


With that said, I think this could be retroactively adapted with some normalization code and some explicit asserts/casts to appease the type checker. So if you're happy with the design now, I'm also happy.
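A minimal sketch of the structured type suggested above (hypothetical names; not the library's actual API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Parameter:
    """Structured function parameter, so downstream code can use
    p.name / p.type instead of doing string manipulation."""

    type: str
    name: Optional[str] = None

    @classmethod
    def from_string(cls, raw: str) -> "Parameter":
        # Assumes the simple "[argname] argtype" shape; an unnamed
        # multi-word type (e.g. "double precision") would need smarter
        # handling than this sketch provides.
        parts = raw.strip().split(None, 1)
        if len(parts) == 2:
            return cls(name=parts[0], type=parts[1])
        return cls(type=parts[0])
```

A with_parameters(*params) builder could then accept either Parameter instances or strings routed through from_string.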

enobayram (Author):

I think this is a great suggestion. When I first designed it as a List[str], I had the most general definition of a function parameter in mind:

[ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ]

I thought it was far too complicated to represent with a structured type. But looking back at the implementation in this PR, we are already assuming that individual parameters have the shape [argname] argtype, so the least we can do is make that explicit by expecting a structured Parameter type with type and name fields.

DanCardin (Owner):

I hadn't looked into the full parameter definition, but all these optional bits definitely seem ideal for a structured type. Something like with_parameters_from_string would basically perform double duty: supporting the syntax coming out of the pg_* tables anyway, while also allowing the natural syntax for a user.

enobayram (Author):

I've been thinking hard, trying to find a perspective from which to base this decision, and the more I think about it, the more I realize that it's non-trivial. There's a subtle trade-off here between convenience and flexibility:

  • Convenience, as in the convenience of not having to worry about how Postgres normalizes a given CREATE FUNCTION definition.
  • Flexibility, as in the range of CREATE FUNCTION definitions using various Postgres features we can support correctly (i.e. creations, drops and replacements all work correctly and idempotently in both directions)

It would be nice to offer an interface such that a user can always grab what Postgres reports via pg_get_function_arguments, use it in a Function definition, and know that it will always work correctly, now and in future versions of this library, regardless of how funky the function definition is (say, an argument whose argname and argtype contain quoted commas and DEFAULTs, a complex RECORD type, a weird default value, etc.). As long as the user fiddles with it and aligns it with what Postgres normalizes it to, it works.

But it would also be nice not to have to worry about whitespace, or integer vs INTEGER, etc., when you only need very simple argname argtype parameters. And again, we would like these simple cases to keep working over time as well.

Anyway, my point is that thinking from both of these directions, I realize there's friction between the two. But it's late here and it's hard for me to articulate right now, so I'll take another stab at it tomorrow.

Thanks again for the feedback!

DanCardin (Owner):

So long as the strategy uses pg_get_function_arguments rather than the constituent argument components on the pg_proc table, the ability to support user strings will be as good as the library's ability to interpret the resultant string from Postgres itself.

I guess that's what you mean about whitespace and type-normalization handling, but those seem straightforward to improve support for progressively. And I'm not super concerned about supporting the full breadth of options in an initial implementation.

I can't tell from your response whether you're deliberating about whether to define some Arg[ument] class or not, but I think doing so is mainly super useful because it keeps the comparison code super simple (a plain equality check) and doesn't require a very specific output normalization. (Plus Arg('foo', type='int', default=4) reads a lot better to me.)

Parsing will need to be flexible with quoting and whatnot (though, in theory, it really doesn't need to start as a fully fleshed-out parser), but rendering to string can simply quote all names, for example, and not need to match the canonical Postgres representation.


I think the main consequential interface question is whether the basic type accepts list[Arg] or list[Arg | str]. The latter is maybe simpler DX, but objectively worse typing for the rendering code, versus a separate method for it. I personally prefer builder methods to handle stuff like parsing alternate formats, but we also accept roles as simple strings, so it's not like it's unprecedented here for the simple interface to come at some cost to typing.

enobayram (Author):

parsing will need to be flexible with quoting and whatnot (like, in theory, it really doesn't need to start as a fully fleshed out parser)

I think that's the crux of the issue. Without a fully fleshed-out parser (or at least a significantly fleshed-out one), I don't think you can even reliably turn the pg_get_function_arguments output into a list[str], let alone a list[Arg]. So if we decide that the normalized form of the argument list is list[Arg], then we're declaring that we only support functions whose argument lists we can parse.

i think the main consequential interface question is whether the basic type accepts list[Arg] or list[Arg | str]

I think there's actually a third option: list[Arg] | str. That is, we make a best effort at parsing pg_get_function_arguments into a list[Arg], to the extent that our Arg type covers the fields of a valid Postgres function argument and to the extent that our parser can reliably parse the pg_get_function_arguments output into it. And we write the parser conservatively, so if it encounters anything it doesn't understand, it just bails out and uses the pg_get_function_arguments output verbatim.

The advantage of this approach is that for simple signatures our parser recognizes, we can offer the convenience of normalization; but if the user needs a more complex argument list, they just pass in a raw_arguments string, and they are responsible for making sure that Postgres normalizes it to itself (i.e. returns it back from pg_get_function_arguments unchanged).
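The conservative bail-out behaviour described here could look roughly like this (a sketch with hypothetical names, not this PR's code):

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Arg:
    name: str
    type: str


# Deliberately narrow: a lowercase identifier followed by a simple
# (possibly multi-word) type name. Anything fancier is rejected.
_SIMPLE_ARG = re.compile(r"^([a-z_][a-z0-9_]*) ([a-z_][a-z0-9_ ]*)$")


def parse_arguments(raw: str) -> Optional[List[Arg]]:
    """Best-effort parse of an arguments string into List[Arg].

    Returns None (i.e. bail out and keep the raw string verbatim)
    whenever quoting, defaults, or parametric types show up.
    """
    if not raw.strip():
        return []
    if any(ch in raw for ch in "\"'()=") or re.search(r"\bdefault\b", raw, re.I):
        return None
    args = []
    for part in raw.split(","):
        match = _SIMPLE_ARG.match(part.strip())
        if match is None:
            return None
        args.append(Arg(name=match.group(1), type=match.group(2)))
    return args
```

A None result would mean "use the pg_get_function_arguments string verbatim", per the fallback described above.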

BTW, my argument so far only covers the problem of identifying whether the declared function argument list matches the arguments of the existing function in the database. But we need to consider the following as well:

  1. Unlike MySQL, Postgres allows function overloading. Without support for function arguments, sqlalchemy-declarative-extensions could get away with identifying existing functions by name alone; but with function arguments, we now need to use pg_get_function_identity_arguments to match existing functions to declared ones. Unfortunately we need to handle this as soon as we support arguments, because a CREATE OR REPLACE myfunc will not replace a myfunc with a different pg_get_function_identity_arguments.

  2. If we allow arbitrary strings via a raw_arguments input, how do we construct the DROP FUNCTION statement? Unlike MySQL, Postgres requires the argument types in the DROP FUNCTION statement. When there are no DEFAULT values, the full arguments string is accepted by DROP FUNCTION, but if there are defaults, they need to be removed, which yields the same thing as pg_get_function_identity_arguments. So if the user defines the Function with raw_arguments, we could expect an identity_arguments as well. We could also use raw_arguments as the default value for identity_arguments.
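The DEFAULT-stripping relationship in point 2 can be illustrated with a naive sketch (hypothetical helper; it assumes no commas or '=' hiding inside quoted defaults or parametric types, which is exactly the complexity that makes a real parser hard):

```python
import re


def naive_identity_arguments(raw_arguments: str) -> str:
    """Drop DEFAULT clauses from a simple arguments string, roughly
    mirroring what pg_get_function_identity_arguments returns for
    uncomplicated signatures."""
    identity_parts = []
    for arg in raw_arguments.split(","):
        # Everything from DEFAULT (or '=') onward belongs to the default
        # expression and is absent from the identity form.
        head = re.split(r"\s+DEFAULT\s+|\s*=\s*", arg, maxsplit=1, flags=re.I)[0]
        identity_parts.append(head.strip())
    return ", ".join(identity_parts)
```

The identity form is what a DROP FUNCTION myfunc(a integer, b text) statement would then be rendered from.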

My Suggested Solution

  1. We add raw_arguments and identity_arguments to base.Function
  2. In function.compare.compare_functions, we compare the functions based on (name, identity_arguments) instead of just name.
  3. For MySQL, identity_arguments is always '': MySQL doesn't support function overloading anyway, and its DROP FUNCTION statement doesn't require an argument list.
  4. We implement parse_arguments(str) -> List[Arg] | None for postgresql.Function and mysql.Function.
  5. We implement .with_arguments(List[Arg]) helpers for postgresql.Function and mysql.Function, .with_arguments renders the List[Arg] to a string and then parses it back using parse_arguments.
    1. If it can't parse it back, or parses it back to a different List[Arg], then we throw an error telling the user that we don't support their arguments through .with_arguments yet, and that they should instead pass in a raw_arguments that they ensure is fully normalized.
    2. If their List[Arg] comes back unchanged from our parser, we normalize the List[Arg] and then render the Function's raw_arguments and identity_arguments from the normalized List[Arg].

The end result of this is that

  1. We support the full range of Postgres/MySQL function arguments if the user is willing to pass in a fully normalized raw_arguments and identity_arguments themselves.
  2. We also support a convenience method through .with_arguments, where normalization and identity_arguments are handled by us.
  3. Both of these ways of creating a Function will be future-proof, because user-supplied raw_arguments are already normalized, and .with_arguments is conservative in what it accepts, so a user can't shoehorn a weird Arg that will start failing when we extend the Arg type and the parse_arguments function in the future.
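Steps 4-5 of the suggested solution, the render -> parse round-trip guard, might be sketched like so (hypothetical names, with a deliberately tiny stand-in parser):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Arg:
    name: str
    type: str


def render(args: List[Arg]) -> str:
    return ", ".join(f"{a.name} {a.type}" for a in args)


def parse_arguments(raw: str) -> Optional[List[Arg]]:
    # Stand-in for the conservative parser: plain "name type" pairs only.
    args = []
    for part in raw.split(","):
        pieces = part.split()
        if len(pieces) != 2:
            return None
        args.append(Arg(*pieces))
    return args


def with_arguments(args: List[Arg]) -> str:
    """Accept args only if they survive render -> parse unchanged, so
    nothing 'weird' can sneak past today's simple parser and then break
    under a stricter future one."""
    raw = render(args)
    if parse_arguments(raw) != args:
        raise ValueError(
            "Arguments not supported by .with_arguments; "
            "pass a fully-normalized raw_arguments instead."
        )
    return raw
```

For example, a smuggled Arg('a', 'int, b text') renders to "a int, b text", which parses back as two arguments rather than one, so the guard rejects it.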

What do you think? I can pivot this PR to implement this suggestion.

DanCardin (Owner):

  1. Certainly, although I might suggest some @property named identity that returns a unique string, rather than propagating the tuple directly everywhere it's keyed.

  2. This is where you start to lose me, particularly with:

    .with_arguments renders the List[Arg] to a string and then parses it back using parse_arguments.

    IMO the canonical representation should ideally be the normalized Arg one, and converting it to a string for comparison very much feels like a smell.

    Perhaps if the parser cannot parse the incoming string, you could fall back to a raw representation that's required to be defined identically to Postgres' normalized format in order to compare equal.

    But I think the rendering code should otherwise not be expected to return the normalized format, but rather one that's guaranteed to parse (e.g. always quote).

  3. Rather than that, I'd put these fields on an Arg class directly, e.g.

    class Function:
        arguments: list[Arg | RawArg] # and maybe str, so long as it's normalized to Arg
    
    @dataclass
    class Arg:
       ... all the parsable fields
    
       def from_string/parse(cls, repr: str) -> Arg | RawArg: ...
    
    @dataclass
    class RawArg:
       repr: str
       identity: str

    My goal being that:

    • the natural comparison between two instances of a Function will compare equal or not
    • there's a structure (Arg/RawArg) that knows how to parse, and separately how to render
    • we're not rendering nice Python arg representations into strings as the "normal" format for comparison.

I'm imagining

Function(...).with_arguments(Arg('a', 'int'), RawArg('b varchar'), 'c int')
# if it gets a string, it attempts Arg.from_string/parse and if it can parse the arg, it returns an Arg, if it can't it returns a RawArg

and (if I'm not missing some of your concerns) I feel like this ought to enable avoiding rendering for the purposes of comparison.

enobayram (Author) commented Apr 17, 2025:

I'm sorry for the delay here; I start every day intending to make the changes, but then get caught up in other stuff.

I want to mention that list[Arg | RawArg] may not work (which is why I suggest list[Arg] | RawArgs): with all the quoting and various complexities in argument definitions (complex parametric or nested types, quoted names, default string values with escaped quotes, etc.), it's a non-trivial assumption that we can always split a given pg_get_function_arguments or pg_get_function_identity_arguments result correctly into a list of arguments (raw or not). That's why I suggest RawArgs as the most conservative fallback whenever we encounter a tricky pg_get_function_arguments result, instead of haphazardly parsing it into a List[Arg | RawArg].

I'm imagining
Function(...).with_arguments(Arg('a', 'int'), RawArg('b varchar'), 'c int')

Here's the problem with that: what if the user passes Function(...).with_arguments(Arg('a', 'int, a2 str'), RawArg('b varchar'), 'c int') or Function(...).with_arguments(Arg('a', "str = ', \','"), RawArg('b varchar'), 'c int')? The former is obviously an error on the user's side, but the latter is almost valid. In any case, it's possible for something weird like this to work out of pure luck, if the way we compare it to pg_get_function_arguments and pg_get_function_identity_arguments happens to line up. My problem with this is that when we improve our parsers in the future, what happened to work with the simpler parser may stop working with the improved one.

That's the reason I suggested rendering the List[Arg] into a string and parsing it back with our (incomplete and conservative) parser. If the provided List[Arg] passes through render -> parse without changing, then we know there's nothing tricky in it, and that if it works today, it will always work.

If it doesn't pass through unchanged, then we act conservatively and reject it entirely, telling the user to pass in the whole arguments string (not even a List[RawArg], but a RawArgs). When they pass in a RawArgs, the responsibility lies with them to make sure the string matches exactly the canonical form that Postgres reduces it to. This way, once the user comes up with a RawArgs that works, we know it will keep working in future versions too.

Again, the main intention is to implement a solution that accepts structured input for simple cases, while allowing arbitrary pg_get_function_arguments for complex cases, all the while guaranteeing that what works now will keep working as we improve the parser as needed.

All of that said, we could also just skip the whole RawArgs interface and simply not provide any way to express functions with arguments that our parser doesn't support. In that case I would still suggest passing the given List[Arg] through render -> parse and asserting that it doesn't change, so that you can't sneak in something weird that will stop working with future parsers.

My personal top priority is to make sure we can't break (seemingly) working code by improving the parser. The second priority is to allow arbitrary function definitions if the user is willing to do the hard work.

DanCardin (Owner):

I don't know whether you think this meaningfully changes the calculus or not, but it seems like you could directly use proargnames, proargmodes, and the argument types, and for defaults use pg_get_expr(proargdefaults, 'pg_proc'::regclass) to obtain the canonical default expression.

It seems to me that pg_proc has columns for all the things I'd expect to be relevant, plus a way to turn the main problem child in terms of parsing (defaults) into an expression separately, such that we don't actually need a parser as such.
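For illustration, a catalog query along these lines could feed the comparison directly from the pg_proc columns mentioned above (an untested sketch; column names are real pg_proc columns, but interpreting the oidvector proargtypes and the comma-joined default expressions still needs care):

```python
# Hypothetical query: pull the argument components straight from pg_proc
# instead of parsing the pg_get_function_arguments string.
PG_PROC_ARGUMENTS_QUERY = """
SELECT p.proname,
       p.proargnames,                 -- argument names (NULL if all unnamed)
       p.proargmodes,                 -- i/o/b/v/t modes (NULL if all IN)
       p.proargtypes,                 -- input argument type OIDs (oidvector)
       pg_get_expr(p.proargdefaults, 'pg_proc'::regclass) AS arg_defaults
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = :schema AND p.proname = :name
"""
```

The :schema and :name bind parameters are placeholders for however the library resolves the target function.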

DanCardin (Owner) commented Apr 2, 2025:

Ah, you'll also need to rebase onto main, where I fixed the CI problem.

JK, I just rebased the PR version myself; just something to be aware of if/when you develop on top of this.

@DanCardin force-pushed the function-parameters-and-volatility branch from 952377d to 12fe2d2 on April 2, 2025 at 20:02
enobayram (Author):

@DanCardin Thank you very much for your comments! And also for resolving the issue with the CI. I've made sure that the tests pass locally, let's hope they pass in CI too!

codecov bot commented Apr 2, 2025:

Codecov Report

Attention: Patch coverage is 78.02198% with 20 lines in your changes missing coverage. Please review.

Project coverage is 89.55%. Comparing base (545cd8e) to head (0dfc88d).
Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
...arative_extensions/dialects/postgresql/function.py | 75.00% | 12 Missing, 4 partials ⚠️
..._declarative_extensions/dialects/mysql/function.py | 88.23% | 1 Missing, 1 partial ⚠️
...sqlalchemy_declarative_extensions/function/base.py | 60.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
- Coverage   89.87%   89.55%   -0.33%     
==========================================
  Files          85       85              
  Lines        3811     3895      +84     
  Branches      778      802      +24     
==========================================
+ Hits         3425     3488      +63     
- Misses        314      331      +17     
- Partials       72       76       +4     


Successfully merging this pull request may close: Support functions with parameters (#103)
2 participants