Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: multi-plugins with extra schemas #231

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

henryiii
Copy link
Collaborator

@henryiii henryiii commented Feb 26, 2025

This closes #144 and closes #134, using the "That is a good approach!" approach. :)

This adds a new entry point, validate_pyproject.multi_schema, that returns a dictionary with tools as well as extra schemas. We could later add more optional keys to this dict, so it's flexible, and the tools table allows fragments to be specified. This would allow validate-pyproject-schema-store to return all of it's known schemas in one entry point, rather than having to add an entry point for every table (and would allow the extra schemas to finally be present).

A few points for review:

  • Check to make sure names are reasonable, I didn't think too long and hard about names.
  • Should tools be a required key (currently not)? (Someone could always add an empty dict if they specified the rest with classic entry points)
  • Should one of the two take priority over the other? For back-compat, validate-pyproject-schema-store will likely provide both for a while. Currently this is just de-duped based on either the tool name, if present, or the schema ID otherwise, and multi plugins are loaded afterwards, so I think they take priority, but I could switch it (and test it).
  • Should filtering be any different?
  • I'm pulling help text from the description

Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
@henryiii henryiii force-pushed the henryiii/feat/multi2 branch from 3790521 to ff44b37 Compare February 26, 2025 20:12
Copy link
Owner

@abravalheri abravalheri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for working on this @henryiii, sorry for the delay in reviewing.

Check to make sure names are reasonable, I didn't think too long and hard about names.

Names are hard 😅. Seems reasonable. What is the rationale behind StoredPlugin? (I suppose the vision is that they come from SchemaStore).

Should tools be a required key (currently not)? (Someone could always add an empty dict if they specified the rest with classic entry points)

I am OK with that.

Should one of the two take priority over the other? For back-compat, validate-pyproject-schema-store will likely provide both for a while. Currently this is just de-duped based on either the tool name, if present, or the schema ID otherwise, and multi plugins are loaded afterwards, so I think they take priority, but I could switch it (and test it).

I don't have a strong opinion about this. I think the important part is to ensure consistency and replicability, which sorting is already doing.

Should filtering be any different?

Possibly also apply the filtering to the "multi" entry-points for the sake of symmetry? I don't foresee a lot of usage of this feature out of the context of tests (do you?), so it should not matter too much.

I'm pulling help text from the description

The help text is just a hint to show in the CLI (enable/disable bits). I suppose it is intended to be a "oneliner" about the plugin. Getting the description sounds reasonable. Maybe we want to truncate on the first line.
(I need to review that for setuptools and distutils plugins in a separate PR because now they are not displaying anything useful).


Very minor brainstorm, just to share (but I don't think it makes sense to spend much energy on it):

It seems that we are walking towards having a "generic" type of schema, and then a specialisation for schemas that also represent "tools".
Do you think we should work on the hierarchy of the plugin classes? Does it make sense to have PluginWrapper supporting the "generic" plugin that is loaded by "multi" and then a subclass just for the "tools"? Basically:

class PluginWrapper:
    # ...the implementation proposed for StoredPlugin in this PR

class ToolWrapper(PluginWrapper):
    # the methods that are now in PluginWrapper right now

(Possibly when I first named PluginWrapper I should have used a better name like SchemaWrapper, but it is not a concern for now :P)

@@ -105,23 +133,42 @@ def load_from_entry_point(entry_point: EntryPoint) -> PluginWrapper:
raise ErrorLoadingPlugin(entry_point=entry_point) from ex


def load_from_multi_entry_point(
entry_point: EntryPoint,
) -> Generator[StoredPlugin, None, None]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it easier to read Iterator[T] instead of Generator[T, None, None] and understand what the function does (we don't have to remember which argument order for the return, yield and send).

Since it does not affect too much the typechecking, I tend to stick with the simpler approach. Is there a benefit in using Generator?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a habit I have due to a problem in typing: things like contextlib.contextmanager takes an Iterator because people usually type these as iterators. This is incorrect, as contextmanager calls .close, which is part of the Generator protocol, not the Iterator protocol. So mypy can't find a mistake like passing an Iterator instead of a Generator to things like contextmanager. The reason they gave for not fixing it was that everyone types these as Iterator. So I never type it as Iterator. :) Python 3.13 also added single argument Generator[T], largely to help with this I believe.

Co-authored-by: Anderson Bravalheri <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
@henryiii henryiii force-pushed the henryiii/feat/multi2 branch from 077b1d0 to f5f2329 Compare March 11, 2025 15:12
Signed-off-by: Henry Schreiner <[email protected]>
@henryiii
Copy link
Collaborator Author

It seems that we are walking towards having a "generic" type of schema, and then a specialization for schemas that also represent "tools".

That's not what these two things represent. One is a plugin that is already loaded, and the other is one that is lazily loaded. Either one can have an empty tool name, which makes it a generic schema and not a tool-related one, but the key difference is lazy vs. not lazy. If you are fine with renaming, I think LazySchema and StoredSchema, for example, might be better.

@henryiii
Copy link
Collaborator Author

henryiii commented Mar 12, 2025

Added a test for the ordering. Currently multi plugins override matching classic plugins. Since they can provide helper schemas too, I think that's a fine ordering. Also I'm sorting the multi entry points by the entry point name, so which overrides if multiple plugins provide matching tools is predictable.

Copy link
Owner

@abravalheri abravalheri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Henry.

That's not what these two things represent. One is a plugin that is already loaded, and the other is one that is lazily loaded. Either one can have an empty tool name, which makes it a generic schema and not a tool-related one, but the key difference is lazy vs. not lazy. If you are fine with renaming, I think LazySchema and StoredSchema, for example, might be better.

We could change the name to LazySchema, that would be OK for me, but I don't think we need to tackle this now...

Comment on lines +174 to +178
for e in sorted(
iterate_entry_points("validate_pyproject.multi_schema"),
key=lambda e: e.name,
reverse=True,
)
Copy link
Owner

@abravalheri abravalheri Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'm sorting the multi entry points by the entry point name, so which overrides if multiple plugins provide matching tools is predictable.

Thank you very much @henryiii.

I am trying to understand the need of sorting twice to ensure replicability.
Ideally I would like to avoid sorting multiple times.

I wonder the following:

If we produce the iterators as:

    tool_eps = (
        (0, e.name, load_from_entry_point(e))
        ...
    )
    multi_eps = (
        ((1, e.name, p) for p in load_from_multi_entry_point(e))
        ...
    )

Could we remove the sorted in multi_eps, and then change the sorting key for something like:

def _predictable_sorting(element):
    priority, name, plugin = element
    return (priority, name, plugin.tool or plugin.id)

And then

sorting = sorted(eps, key=_predictable_sorting)
dedup = {(e.tool or e.id): e for _, _, e in sorting}

(All code is untested)

Observation 1: Not sure we need priority... It may be the case we can simply get rid of it and obtain good enough results.
Observation 2: That eps: Iterable[Union[StoredPlugin, PluginWrapper]] explicit annotation will become a nuisance... I wish the typechecker could automatically infer it.

Copy link
Collaborator Author

@henryiii henryiii Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not sorting1 twice, and it is not related to which type of entry point takes priority; to switch that, we just chain in the other order. This sort is sorting the multi entry points by name; this happens before they are called and it produces the Schemas. If there are two multi entry points producing 5 tools, this sort is over two items, the later sort is over the five tools. It adds stability if two plugins both define a multi entry point that produces the same name. So, for example, let's say a user has two plugins:

  • validate-pyproject-schema-store defines a multi entry point with the name schema-store that produces a bunch of tools, including "cibuildwheel"
  • A hypothetical my-schemas package defines a multi entry point with the name my-schemas that also includes "cibuildwheel"

By sorting the multi entry points before rendering them, that ensures that my-schemas's "cibuildwheel" is always the one chosen, making it repeatable.

Also, if a user has cibuildwheel itself installed, which has a single entry point that produces "cibuildwheel", that will not take priority over the multi entry points. If it provided a multi entry point, then the name of the multi entry point would control the override order. Maybe we want single entry points to take priority, but since they are less powerful, I left it this way, but could switch if that makes more sense.

Footnotes

  1. The same items, that is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you are adding the name information. I can try that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, pushed something.

Signed-off-by: Henry Schreiner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants