ci[cartesian]: Thread safe parallel stencil tests #1849
Conversation
To avoid repeating boilerplate code in testing, `StencilTestSuite` provides a convenient interface to test gtscript stencils. Within that `StencilTestSuite` base class, generating the stencil is separated from running & validating the stencil code. Each deriving test class ends up with two tests: one for stencil generation and a second one that tests the implementation by running the generated code with defined inputs and expected outputs. The base class was written such that the implementation test re-uses the generated stencil code from the first test. This introduces an implicit test order dependency. To save time and avoid unnecessary test failure output, failing to generate the stencil code automatically skips the implementation/validation test.

Running tests in parallel (with `xdist`) breaks the expected test execution order (in the default configuration). This leads to automatically skipped validation tests whenever the stencil code hasn't been generated yet. On the CI we only run with 2 threads, so usually only a couple of tests were skipped. Locally, I was running with 16 threads and got ~30 skipped validation tests.

This PR proposes to address the issue by setting an `xdist_group` mark on the generation/implementation tests that belong together. In combination with `--dist loadgroup`, this keeps the expected order where necessary. Only tests with `xdist_group` markers are affected by `--dist loadgroup`; tests without that marker are distributed normally, as in `--dist load` mode (the default so far). By grouping on `cls_name` and backend, we keep maximal parallelization, grouping only the two tests that depend on each other.
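As an illustration of the mechanism (not the exact gt4py code), here is a minimal sketch assuming a hypothetical `TestCopyStencil` suite on the `numpy` backend; only the `xdist_group` marker and the group-name convention follow the description above:

```python
import pytest

# Both tests of a suite share one xdist_group name, built from the class name and
# the backend, so `--dist loadgroup` schedules them on the same worker in order.
copy_numpy_group = pytest.mark.xdist_group(name="TestCopyStencil-numpy")


class TestCopyStencil:  # hypothetical suite standing in for a StencilTestSuite subclass
    @copy_numpy_group
    def test_generation(self):
        ...  # generate the stencil and cache it in the shared test context

    @copy_numpy_group
    def test_implementation(self):
        ...  # reuse the cached stencil and validate it against expected outputs
```

Running, e.g., `pytest -n 16 --dist loadgroup` then keeps both tests of each group on the same worker, while unmarked tests are still distributed as with `--dist load`.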
6900575 to 8e8e497
some details inline
@@ -167,7 +167,7 @@ def get_globals_combinations(dtypes):
     generation_strategy=composite_strategy_factory(
         d, generation_strategy_factories
     ),
-    implementations=[],
+    implementation=None,
This might have been different in the past. The way we cache implementations now, there's only ever at most one implementation per test context.
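A rough sketch of what that per-test-context cache could look like, assuming a plain dict as the test context (field names mirror the diffs below; the surrounding gt4py code is simplified):

```python
# Simplified, illustrative test context: at most one implementation per
# (suite, backend) combination, filled in by the generation test.
test_context = {
    "cls_name": "TestCopyStencil",   # hypothetical suite name
    "backend": "numpy",
    "implementation": None,          # replaces the old `implementations: []` list
}


def cache_implementation(test, implementation):
    # Caching is a plain assignment guarded by an assert, since generation
    # runs exactly once per context.
    assert test["implementation"] is None
    test["implementation"] = implementation
```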
-    The generated implementations are cached in a :class:`utils.ImplementationsDB`
-    instance, to avoid duplication of (potentially expensive) compilations.
+    The generated implementation is cached in the test context, to avoid duplication
+    of (potentially expensive) compilation.
+    Note: This caching introduces a dependency between tests, which is captured by an
+    `xdist_group` marker in combination with `--dist loadgroup` to ensure safe parallel
+    test execution.
This comment was out of date. There's no `utils.ImplementationsDB` (anymore).
test["implementations"].append(implementation) | ||
assert test["implementation"] is None | ||
test["implementation"] = implementation |
Assert our assumption that we only ever cache one implementation per test context.
-        implementation_list = test["implementations"]
-        if not implementation_list:
-            pytest.skip(
-                "Cannot perform validation tests, since there are no valid implementations."
-            )
-        for implementation in implementation_list:
-            if not isinstance(implementation, StencilObject):
-                raise RuntimeError("Wrong function got from implementations_db cache!")
+        implementation = test["implementation"]
+        assert (
+            implementation is not None
+        ), "Stencil not yet generated. Did you attempt to run stencil tests in parallel?"
+        assert isinstance(implementation, StencilObject)

-            cls._run_test_implementation(parameters_dict, implementation)
+        cls._run_test_implementation(parameters_dict, implementation)
Simplified since we don't have an array of implementations (anymore). Assert that the stencil code has been generated; if not, fail instead of skip. This leads to more errors in case code generation fails. IMO the best way to handle this is to get rid of the idea of having two tests (one for codegen and one for validation) per class. We could achieve the same level of parallelization with less glue code if we had just one test per class (codegen and validation inside the same test).
/cc @egparedes @havogt FYI
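For illustration, a rough sketch of that single-test alternative, with stand-in helpers (`generate_stencil`, `run_and_validate`) that are not part of the actual gt4py test suite:

```python
import pytest


def generate_stencil(backend):
    """Stand-in for the (potentially expensive) code-generation step."""
    return lambda field: field  # a trivial "stencil" for illustration


def run_and_validate(stencil, inputs, expected):
    """Stand-in for running the generated code and checking its output."""
    assert stencil(inputs) == expected


# Code generation and validation live in one test, so there is no cross-test
# cache, no ordering constraint, and plain `--dist load` stays safe.
@pytest.mark.parametrize("backend", ["numpy", "gt:cpu_ifirst"])
def test_copy_stencil(backend):
    stencil = generate_stencil(backend)
    run_and_validate(stencil, inputs=[1, 2, 3], expected=[1, 2, 3])
```

With both steps in one function, a codegen failure simply fails that test, and no `xdist_group`/`--dist loadgroup` coordination is needed.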
The previous one was from when I thought we had to run these tests on one thread only.
LGTM.
lgtm
Description
Further reading: see the `--dist` section in the `pytest-xdist` documentation.

Requirements
Existing tests are still green. No more skipped tests \o/. Works as expected locally.
N/A