Commit 856f0de

Enhance ids generated by parametrization. (#21)
1 parent 514f5f8 commit 856f0de

13 files changed: +519 -44 lines changed

docs/changes.rst

+2
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@ all releases are available on `Anaconda.org <https://anaconda.org/pytask/pytask>
1717
- :gh:`18` changes the documentation theme to alabaster.
1818
- :gh:`19` adds some changes related to ignored folders.
1919
- :gh:`20` fixes copying code examples in the documentation.
20+
- :gh:`21` enhances the ids generated by parametrization, allows changing them via the
21+
``ids`` argument, and adds tutorials.
2022
- :gh:`23` allows specifying paths via the configuration file, documents the cli and
2123
configuration options.
2224

@@ -0,0 +1,69 @@
1+
How to extend parametrizations
2+
==============================
3+
4+
Parametrization helps you to reuse code and quickly scale from one to a multitude of
5+
tasks. Sometimes, these tasks are expensive because they take a long time or require a lot of
6+
resources. Thus, you only want to run them if really necessary.
7+
8+
9+
The problem
10+
-----------
11+
12+
There are two problems when extending parametrizations which might trigger accidental
13+
reruns of tasks.
14+
15+
16+
IDs
17+
~~~
18+
19+
If you do not know how ids for parametrized tasks are produced, read the following
20+
:ref:`section in the tutorial about parametrization <how_to_parametrize_a_task_the_id>`.
21+
22+
The problem is that argument values which are not booleans, numbers or strings produce
23+
positionally dependent ids. If you extend the parametrization, the position and, thus,
24+
the id might change, which re-executes the task.
25+
26+
To resolve the problem, you can choose one of the two solutions in the tutorial. Either
27+
pass a function to convert non-standard objects to suitable representations or pass your
28+
own ids.
29+
30+
31+
Modification of the task module
32+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
33+
34+
To extend your parametrization, you would normally change the module in which the task
35+
is defined. By default, this triggers a re-run of the task.
36+
37+
38+
Solution: side-effect
39+
---------------------
40+
41+
The problem can be resolved by introducing a side-effect. Add another module with the
42+
following content.
43+
44+
.. code-block:: python
45+
46+
# Content of side_effect.py
47+
48+
ARG_VALUES = [(0,), (1,)]
49+
IDS = ["first_tuple", "second_tuple"]
50+
51+
And change the task module to
52+
53+
.. code-block:: python
54+
55+
import pytask
56+
from side_effect import ARG_VALUES, IDS
57+
58+
59+
@pytask.mark.parametrize("i", ARG_VALUES, ids=IDS)
60+
def task_example(i):
61+
    pass
62+
63+
The key idea is to not reference the ``side_effect.py`` module as a dependency of the
64+
task. Now, you can extend the parametrization without re-executing former tasks.
65+
66+
**Caveat**: Be careful, because pytask does not care about which object is passed to the
67+
parametrized function. Thus, it would be better to replace ``IDS`` with a function which
68+
hashes the tuples to recognize changes as shown in the :ref:`tutorial
69+
<how_to_parametrize_a_task_convert_other_objects>`.
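
The positional dependence described in the IDs section can be illustrated without pytask. The sketch below is a simplified stand-in for the id generation and not pytask's implementation; the helper names ``positional_ids`` and ``value_based_ids`` are hypothetical. It shows that inserting a value at the front shifts counter-based ids, while ids derived from the values themselves stay attached to the same values.

```python
def positional_ids(arg_name, values):
    """Mimic counter-based ids such as ``i0`` and ``i1`` (simplified assumption)."""
    return {f"{arg_name}{position}": value for position, value in enumerate(values)}


def value_based_ids(values):
    """Derive each id from the value itself, here from its ``repr``."""
    return {repr(value): value for value in values}


old_values = [(0,), (1,)]
new_values = [(2,), (0,), (1,)]  # The parametrization is extended at the front.

# Counter-based: "i0" referred to (0,) before and refers to (2,) afterwards.
assert positional_ids("i", old_values)["i0"] == (0,)
assert positional_ids("i", new_values)["i0"] == (2,)

# Value-based: the id "(0,)" still refers to the same value after the extension.
assert value_based_ids(old_values)["(0,)"] == (0,)
assert value_based_ids(new_values)["(0,)"] == (0,)
```

Appending new values at the end keeps counter-based ids stable as well, but ids derived from the values do not depend on that discipline.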

docs/how_to_guides/index.rst

+1
@@ -9,3 +9,4 @@ pytask.
99
:maxdepth: 1
1010

1111
how_to_write_a_plugin
12+
how_to_extend_parametrizations

docs/tutorials/how_to_parametrize_a_task.rst

+111
@@ -140,3 +140,114 @@ The signature can be passed in three different formats.
140140
.. code-block:: python
141141
142142
["first_argument", "second_argument"]
143+
144+
145+
.. _how_to_parametrize_a_task_the_id:
146+
147+
The id
148+
------
149+
150+
Every task has a unique id which can be used to :doc:`select it <how_to_select_tasks>`.
151+
The normal id combines the path to the module where the task is defined, a double colon,
152+
and the name of the task function. Here is an example.
153+
154+
.. code-block::
155+
156+
../task_example.py::task_example
157+
158+
This behavior would produce duplicate ids for parametrized tasks. Therefore, there exist
159+
multiple mechanisms to produce unique ids.
160+
161+
162+
Auto-generated ids
163+
~~~~~~~~~~~~~~~~~~
164+
165+
To avoid duplicate task ids, the ids of parametrized tasks are extended with
166+
descriptions of the values they are parametrized with. Booleans, floats, integers and
167+
strings enter the task id directly. For example, a task function which receives four
168+
arguments, ``True``, ``1.0``, ``2``, and ``"hello"``, one of each dtype, has the
169+
following id.
170+
171+
.. code-block::
172+
173+
task_example.py::task_example[True-1.0-2-hello]
174+
175+
Arguments with other dtypes cannot be easily converted to strings and, thus, are
176+
replaced with a combination of the argument name and the iteration counter.
177+
178+
For example, the following function is parametrized with tuples.
179+
180+
.. code-block:: python
181+
182+
@pytask.mark.parametrize("i", [(0,), (1,)])
183+
def task_example(i):
184+
    pass
185+
186+
Since the tuples are not converted to strings, the ids of the two tasks are
187+
188+
.. code-block::
189+
190+
task_example.py::task_example[i0]
191+
task_example.py::task_example[i1]
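
The rule for auto-generated ids can be written down as a small function. This is a sketch of the rule as described in this section, not the code pytask uses; the name ``sketch_auto_ids`` is hypothetical, and joining multiple arguments with dashes follows the ``True-1.0-2-hello`` example.

```python
def sketch_auto_ids(arg_names, arg_values):
    """Build the bracketed part of the id for every parametrization (sketch).

    ``arg_values`` holds one tuple per parametrization with one entry per
    argument name.
    """
    ids = []
    for position, values in enumerate(arg_values):
        parts = []
        for name, value in zip(arg_names, values):
            if isinstance(value, (bool, float, int, str)):
                # Booleans, floats, integers and strings enter the id directly.
                parts.append(str(value))
            else:
                # Other objects become argument name plus iteration counter.
                parts.append(f"{name}{position}")
        ids.append("-".join(parts))
    return ids


print(sketch_auto_ids(["a", "b", "c", "d"], [(True, 1.0, 2, "hello")]))
# ['True-1.0-2-hello']

# A single argument "i" whose values are tuples, so each value is wrapped once.
print(sketch_auto_ids(["i"], [((0,),), ((1,),)]))
# ['i0', 'i1']
```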
192+
193+
194+
.. _how_to_parametrize_a_task_convert_other_objects:
195+
196+
Convert other objects
197+
~~~~~~~~~~~~~~~~~~~~~
198+
199+
To change the representation of tuples and other objects, you can pass a function to the
200+
``ids`` argument of the :func:`~_pytask.parametrize.parametrize` decorator. The function
201+
is called for every argument and may return a boolean, number, or string which will be
202+
integrated into the id. For every other return value, the auto-generated value is used.
203+
204+
To get a unique representation of a tuple, we can use the hash value.
205+
206+
.. code-block:: python
207+
208+
def tuple_to_hash(value):
209+
    if isinstance(value, tuple):
210+
        return hash(value)
211+
212+
213+
@pytask.mark.parametrize("i", [(0,), (1,)], ids=tuple_to_hash)
214+
def task_example(i):
215+
    pass
216+
217+
This produces the following ids:
218+
219+
.. code-block::
220+
221+
task_example.py::task_example[3430018387555] # (0,)
222+
task_example.py::task_example[3430019387558] # (1,)
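
One thing to keep in mind: Python salts the built-in ``hash`` of strings and bytes per process unless ``PYTHONHASHSEED`` is set, so a tuple containing strings can hash differently from run to run. A minimal alternative sketch, assuming the ``repr`` of the tuple's contents is deterministic, derives the id from a digest instead. The name ``tuple_to_digest`` and the length of eight characters are arbitrary choices for illustration.

```python
import hashlib


def tuple_to_digest(value):
    """Return a short, run-independent id for tuples and ``None`` otherwise."""
    if isinstance(value, tuple):
        # The digest depends only on the textual representation of the tuple.
        digest = hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
        return digest[:8]
    # Returning None leaves other objects to the auto-generated id.
    return None
```

The function returns a string for tuples, so it could be passed via ``ids=tuple_to_digest`` in the same way as ``tuple_to_hash`` above.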
223+
224+
225+
User-defined ids
226+
~~~~~~~~~~~~~~~~
227+
228+
Instead of a function, you can also pass a list or another iterable of id values via
229+
``ids``.
230+
231+
This code
232+
233+
.. code-block:: python
234+
235+
@pytask.mark.parametrize("i", [(0,), (1,)], ids=["first", "second"])
236+
def task_example(i):
237+
    pass
238+
239+
produces these ids
240+
241+
.. code-block::
242+
243+
task_example.py::task_example[first] # (0,)
244+
task_example.py::task_example[second] # (1,)
245+
246+
This is arguably the easiest way to change the representation of many objects at once
247+
while also producing ids which are easy to remember and type.
248+
249+
250+
Further reading
251+
---------------
252+
253+
- :doc:`../how_to_guides/how_to_extend_parametrizations`.

docs/tutorials/how_to_select_tasks.rst

+27
@@ -50,6 +50,9 @@ for the analysis.
5050
Expressions
5151
-----------
5252

53+
General
54+
~~~~~~~
55+
5356
Expressions are similar to markers and offer the same syntax but target the task ids.
5457
Assume you have the following tasks.
5558

@@ -85,3 +88,27 @@ To execute a single task, say ``task_run_this_one`` in ``task_example.py``, use
8588
.. code-block:: console
8689
8790
$ pytask -k task_example.py::task_run_this_one
91+
92+
93+
.. _how_to_select_tasks_parametrization:
94+
95+
Parametrization
96+
~~~~~~~~~~~~~~~
97+
98+
If you have a task which is parametrized, you can select individual parametrizations.
99+
100+
.. code-block:: python
101+
102+
@pytask.mark.parametrize("i", range(2))
103+
def task_parametrized(i):
104+
    pass
105+
106+
To run the task where ``i = 1``, type
107+
108+
.. code-block:: bash
109+
110+
$ pytask -k task_parametrized[1]
111+
112+
Booleans, floats, integers, and strings are used in the task id as they are, but all
113+
other Python objects like tuples are replaced with a combination of the argument name
114+
and an iteration counter. Multiple arguments are separated via dashes.

src/_pytask/debugging.py

+11-1
@@ -48,8 +48,14 @@ def pytask_parse_config(config, config_from_cli, config_from_file):
4848
)
4949

5050

51-
@hookimpl
51+
@hookimpl(trylast=True)
5252
def pytask_post_parse(config):
53+
"""Post parse the configuration.
54+
55+
Register the plugins in this step to let other plugins influence the pdb or trace
56+
option and maybe disable it. Especially thinking about pytask-parallel.
57+
58+
"""
5359
if config["pdb"]:
5460
config["pm"].register(PdbDebugger)
5561

@@ -69,6 +75,8 @@ def pytask_execute_task(task):
6975

7076

7177
def wrap_function_for_post_mortem_debugging(function):
78+
"""Wrap the function for post-mortem debugging."""
79+
7280
@functools.wraps(function)
7381
def wrapper(*args, **kwargs):
7482
try:
@@ -93,6 +101,8 @@ def pytask_execute_task(task):
93101

94102

95103
def wrap_function_for_tracing(function):
104+
"""Wrap the function for tracing."""
105+
96106
@functools.wraps(function)
97107
def wrapper(*args, **kwargs):
98108
pdb.runcall(function, *args, **kwargs)
