Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing string to inclusive kwarg of Series.between in src/tlo/methods/stunting.py gives wrong behaviour in Pandas v1.2 #942

Closed
matt-graham opened this issue Apr 17, 2023 · 0 comments · Fixed by #943
Assignees
Labels
bug Something isn't working

Comments

@matt-graham
Copy link
Collaborator

In the stunting module, the Series.between method is passed inclusive='left' at various points

mask = df.is_alive & df.age_exact_years.between(low_bound_age_in_years, high_bound_age_in_years,
inclusive='left')

Predictor('age_exact_years',
conditions_are_exhaustive=True,
conditions_are_mutually_exclusive=True).when('< 0.5',
p['base_inc_rate_stunting_by_agegp'][0])
.when('.between(0.5, 1.0, inclusive="left")',
p['base_inc_rate_stunting_by_agegp'][1])
.when('.between(1.0, 2.0, inclusive="left")',
p['base_inc_rate_stunting_by_agegp'][2])
.when('.between(2.0, 3.0, inclusive="left")',
p['base_inc_rate_stunting_by_agegp'][3])
.when('.between(3.0, 4.0, inclusive="left")',
p['base_inc_rate_stunting_by_agegp'][4])
.when('.between(4.0, 5.0, inclusive="left")',
p['base_inc_rate_stunting_by_agegp'][5])
.when('> 5.0', 0.0),

Predictor().when(
'(nb_breastfeeding_status == "none") & (age_exact_years.between(0.5, 2.0, inclusive="left"))',
p['rr_stunting_no_continued_breastfeeding']),

Predictor('age_exact_years',
conditions_are_exhaustive=True,
conditions_are_mutually_exclusive=True).when('< 0.5',
p['r_progression_severe_stunting_by_agegp'][0])
.when('.between(0.5, 1.0, inclusive="left")',
p['r_progression_severe_stunting_by_agegp'][1])
.when('.between(1.0, 2.0, inclusive="left")',
p['r_progression_severe_stunting_by_agegp'][2])
.when('.between(2.0, 3.0, inclusive="left")',
p['r_progression_severe_stunting_by_agegp'][3])
.when('.between(3.0, 4.0, inclusive="left")',
p['r_progression_severe_stunting_by_agegp'][4])
.when('.between(4.0, 5.0, inclusive="left")',
p['r_progression_severe_stunting_by_agegp'][5])
.when('> 5.0', 1.0),

This follows the usage recommended in the guidance notes in the wiki I wrote. Unfortunately it turns out I didn't check the Pandas documentation carefully enough when writing these, as for Pandas v1.2 inclusive should be a boolean argument corresponding to whether to include both bounds in the test or not. In Pandas v1.3 inclusive was changed to instead be a string argument which can take values in {"both", "neither", "left", "right"}, which corresponds to the expected behaviour for the usage recommended in the wiki and currently used in stunting. Passing a string for the inclusive argument in Pandas v1.2.2 (the version we are currently using) doesn't raise an error as the argument value is used directly as a boolean condition. As non-empty strings correspond to True in Python this means with Pandas v1.2.2 using inclusive="left" acts equivalently to inclusive=True (and to inclusive="both" in Pandas v1.3 and above).

This means currently the various usages in stunting are not behaving as expected. This I think explains at least some of the differences we see between the population dataframes outputted by simulations using Pandas v1.2.2 and v2.0.0 as mentioned in #763 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant