Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate node values #3715

Merged
merged 7 commits into from
Mar 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion RELEASE.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
# Upcoming Release 0.19.4

## Major features and improvements
* Improved error message when passing wrong value to node.
* Cookiecutter errors are shown in short format without the `--verbose` flag.
* Kedro commands now work from any subdirectory within a Kedro project.
* Kedro CLI now provides a better error message when project commands are run outside of a project i.e. `kedro run`
* Kedro CLI now provides a better error message when project commands are run outside of a project i.e. `kedro run`.

Check warning on line 7 in RELEASE.md

View workflow job for this annotation

GitHub Actions / vale

[vale] RELEASE.md#L7

[Kedro.abbreviations] Use 'that is' instead of abbreviations like 'i.e.'.
Raw output
{"message": "[Kedro.abbreviations] Use 'that is' instead of abbreviations like 'i.e.'.", "location": {"path": "RELEASE.md", "range": {"start": {"line": 7, "column": 100}}}, "severity": "WARNING"}

## Bug fixes and other changes
* Updated `kedro pipeline create` and `kedro pipeline delete` to read the base environment from the project settings.
Expand Down
23 changes: 21 additions & 2 deletions kedro/pipeline/node.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ def __init__( # noqa: PLR0913
function. When dict[str, str] is provided, variable names
will be mapped to function argument names.
outputs: The name or the list of the names of variables used
as outputs to the function. The number of names should match
as outputs of the function. The number of names should match
the number of outputs returned by the provided function.
When dict[str, str] is provided, variable names will be mapped
to the named outputs the function returns.
Expand All @@ -67,7 +67,6 @@ def __init__( # noqa: PLR0913
and/or fullstops.

"""

if not callable(func):
raise ValueError(
_node_error_message(
Expand All @@ -83,6 +82,16 @@ def __init__( # noqa: PLR0913
)
)

for _input in _to_list(inputs):
if not isinstance(_input, str):
raise ValueError(
_node_error_message(
f"names of variables used as inputs to the function "
f"must be of 'String' type, but {_input} from {inputs} "
f"is '{type(_input)}'."
)
)

if outputs and not isinstance(outputs, (list, dict, str)):
raise ValueError(
_node_error_message(
Expand All @@ -91,6 +100,16 @@ def __init__( # noqa: PLR0913
)
)

for _output in _to_list(outputs):
if not isinstance(_output, str):
raise ValueError(
_node_error_message(
f"names of variables used as outputs of the function "
f"must be of 'String' type, but {_output} from {outputs} "
f"is '{type(_output)}'."
)
)

if not inputs and not outputs:
raise ValueError(
_node_error_message("it must have some 'inputs' or 'outputs'.")
Expand Down
2 changes: 1 addition & 1 deletion tests/ipython/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,7 @@ def dummy_node_empty_input():
return node(
func=dummy_function,
inputs=["", ""],
outputs=[None],
outputs=None,
Copy link
Contributor Author

@ElenaKhaustova ElenaKhaustova Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noklam is this a typo or expected? It breaks the validation added and contradicts the Node definition where we do not expect List[None] as an input:

class Node:
    """``Node`` is an auxiliary class facilitating the operations required to
    run user-provided functions as part of Kedro pipelines.
    """
    def __init__(
        self,
        func: Callable,
        inputs: str | list[str] | dict[str, str] | None,
        outputs: str | list[str] | dict[str, str] | None,
        *,
        name: str | None = None,
        tags: str | Iterable[str] | None = None,
        confirms: str | list[str] | None = None,
        namespace: str | None = None,
    ):

So I fixed it like this now and it works but should be discussed if this is an expected behaviour.

Copy link
Contributor

@noklam noklam Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ElenaKhaustova #2408, If I remember, output doesn't support str but input does (which is the issue I link). They should be consistent, since you are looking at this already, maybe you have a quick answer for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ElenaKhaustova In this case I think changing to None is fine. It's "breaking" straightly but it's also a very small thing and easy to fix. I have never seen someone do [None] previously.

So after this validation [None] will not be accepted.

cc @merelcht

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @noklam!

As an alternative, in case we need to support [None], we at least need to update accepted types in the Node constructor.

name="dummy_node_empty_input",
)

Expand Down
13 changes: 13 additions & 0 deletions tests/pipeline/test_node.py
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,14 @@ def duplicate_output_list_node():
return identity, "A", ["A", "A"]


def bad_input_variable_name():
return lambda x: None, {"a": 1, "b": "B"}, {"a": "A", "b": "B"}


def bad_output_variable_name():
return lambda x: None, {"a": "A", "b": "B"}, {"a": "A", "b": 2}


@pytest.mark.parametrize(
"func, expected",
[
Expand All @@ -275,6 +283,11 @@ def duplicate_output_list_node():
r"\(\[A\]\) -> \[A;A\] due to "
r"duplicate output\(s\) {\'A\'}.",
),
(bad_input_variable_name, "names of variables used as inputs to the function "),
(
bad_output_variable_name,
"names of variables used as outputs of the function ",
),
],
)
def test_bad_node(func, expected):
Expand Down