-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: multi-plugins with extra schemas #231
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
3790521
to
ff44b37
Compare
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for working on this @henryiii, sorry for the delay in reviewing.
Check to make sure names are reasonable, I didn't think too long and hard about names.
Names are hard 😅. Seems reasonable. What is the rationale behind StoredPlugin
? (I suppose the vision is that they come from SchemaStore).
Should tools be a required key (currently not)? (Someone could always add an empty dict if they specified the rest with classic entry points)
I am OK with that.
Should one of the two take priority over the other? For back-compat, validate-pyproject-schema-store will likely provide both for a while. Currently this is just de-duped based on either the tool name, if present, or the schema ID otherwise, and multi plugins are loaded afterwards, so I think they take priority, but I could switch it (and test it).
I don't have a strong opinion about this. I think the important part is to ensure consistency and replicability, which sorting is already doing.
Should filtering be any different?
Possibly also apply the filtering to the "multi" entry-points for the sake of symmetry? I don't foresee a lot of usage of this feature out of the context of tests (do you?), so it should not matter too much.
I'm pulling help text from the description
The help text is just a hint to show in the CLI (enable/disable
bits). I suppose it is intended to be a "oneliner" about the plugin. Getting the description
sounds reasonable. Maybe we want to truncate on the first line.
(I need to review that for setuptools
and distutils
plugins in a separate PR because now they are not displaying anything useful).
Very minor brainstorm, just to share (but I don't think it makes sense to spend much energy on it):
It seems that we are walking towards having a "generic" type of schema, and then a specialisation for schemas that also represent "tools".
Do you think we should work on the hierarchy of the plugin classes? Does it make sense to have PluginWrapper
supporting the "generic" plugin that is loaded by "multi" and then a subclass just for the "tools"? Basically:
class PluginWrapper:
# ...the implementation proposed for StoredPlugin in this PR
class ToolWrapper(PluginWrapper):
# the methods that are now in PluginWrapper right now
(Possibly when I first named PluginWrapper
I should have used a better name like SchemaWrapper
, but it is not a concern for now :P)
@@ -105,23 +133,42 @@ def load_from_entry_point(entry_point: EntryPoint) -> PluginWrapper: | |||
raise ErrorLoadingPlugin(entry_point=entry_point) from ex | |||
|
|||
|
|||
def load_from_multi_entry_point( | |||
entry_point: EntryPoint, | |||
) -> Generator[StoredPlugin, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it easier to read Iterator[T]
instead of Generator[T, None, None]
and understand what the function does (we don't have to remember which argument order for the return, yield and send).
Since it does not affect too much the typechecking, I tend to stick with the simpler approach. Is there a benefit in using Generator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a habit I have due to a problem in typing: things like contextlib.contextmanager
takes an Iterator
because people usually type these as iterators. This is incorrect, as contextmanager
calls .close
, which is part of the Generator protocol, not the Iterator protocol. So mypy can't find a mistake like passing an Iterator
instead of a Generator
to things like contextmanager
. The reason they gave for not fixing it was that everyone types these as Iterator
. So I never type it as Iterator
. :) Python 3.13 also added single argument Generator[T]
, largely to help with this I believe.
Co-authored-by: Anderson Bravalheri <[email protected]> Signed-off-by: Henry Schreiner <[email protected]>
077b1d0
to
f5f2329
Compare
Signed-off-by: Henry Schreiner <[email protected]>
That's not what these two things represent. One is a plugin that is already loaded, and the other is one that is lazily loaded. Either one can have an empty tool name, which makes it a generic schema and not a tool-related one, but the key difference is lazy vs. not lazy. If you are fine with renaming, I think LazySchema and StoredSchema, for example, might be better. |
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
Added a test for the ordering. Currently multi plugins override matching classic plugins. Since they can provide helper schemas too, I think that's a fine ordering. Also I'm sorting the multi entry points by the entry point name, so which overrides if multiple plugins provide matching tools is predictable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Henry.
That's not what these two things represent. One is a plugin that is already loaded, and the other is one that is lazily loaded. Either one can have an empty tool name, which makes it a generic schema and not a tool-related one, but the key difference is lazy vs. not lazy. If you are fine with renaming, I think LazySchema and StoredSchema, for example, might be better.
We could change the name to LazySchema
, that would be OK for me, but I don't think we need to tackle this now...
for e in sorted( | ||
iterate_entry_points("validate_pyproject.multi_schema"), | ||
key=lambda e: e.name, | ||
reverse=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I'm sorting the multi entry points by the entry point name, so which overrides if multiple plugins provide matching tools is predictable.
Thank you very much @henryiii.
I am trying to understand the need of sorting twice to ensure replicability.
Ideally I would like to avoid sorting multiple times.
I wonder the following:
If we produce the iterators as:
tool_eps = (
(0, e.name, load_from_entry_point(e))
...
)
multi_eps = (
((1, e.name, p) for p in load_from_multi_entry_point(e))
...
)
Could we remove the sorted
in multi_eps
, and then change the sorting key for something like:
def _predictable_sorting(element):
priority, name, plugin = element
return (priority, name, plugin.tool or plugin.id)
And then
sorting = sorted(eps, key=_predictable_sorting)
dedup = {(e.tool or e.id): e for _, _, e in sorting}
(All code is untested)
Observation 1: Not sure we need priority
... It may be the case we can simply get rid of it and obtain good enough results.
Observation 2: That eps: Iterable[Union[StoredPlugin, PluginWrapper]]
explicit annotation will become a nuisance... I wish the typechecker could automatically infer it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not sorting1 twice, and it is not related to which type of entry point takes priority; to switch that, we just chain in the other order. This sort is sorting the multi entry points by name; this happens before they are called and it produces the Schemas. If there are two multi entry points producing 5 tools, this sort is over two items, the later sort is over the five tools. It adds stability if two plugins both define a multi entry point that produces the same name. So, for example, let's say a user has two plugins:
validate-pyproject-schema-store
defines a multi entry point with the nameschema-store
that produces a bunch of tools, including "cibuildwheel"- A hypothetical
my-schemas
package defines a multi entry point with the namemy-schemas
that also includes "cibuildwheel"
By sorting the multi entry points before rendering them, that ensures that my-schemas
's "cibuildwheel" is always the one chosen, making it repeatable.
Also, if a user has cibuildwheel
itself installed, which has a single entry point that produces "cibuildwheel", that will not take priority over the multi entry points. If it provided a multi entry point, then the name of the multi entry point would control the override order. Maybe we want single entry points to take priority, but since they are less powerful, I left it this way, but could switch if that makes more sense.
Footnotes
-
The same items, that is. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see you are adding the name information. I can try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, pushed something.
Signed-off-by: Henry Schreiner <[email protected]>
This closes #144 and closes #134, using the "That is a good approach!" approach. :)
This adds a new entry point,
validate_pyproject.multi_schema
, that returns a dictionary withtools
as well as extraschemas
. We could later add more optional keys to this dict, so it's flexible, and thetools
table allows fragments to be specified. This would allow validate-pyproject-schema-store to return all of it's known schemas in one entry point, rather than having to add an entry point for every table (and would allow the extra schemas to finally be present).A few points for review:
tools
be a required key (currently not)? (Someone could always add an empty dict if they specified the rest with classic entry points)