Passing string to `inclusive` kwarg of `Series.between` in `src/tlo/methods/stunting.py` gives wrong behaviour in Pandas v1.2 #942

matt-graham · 2023-04-17T15:06:45Z

In the stunting module, the Series.between method is passed inclusive='left' at various points

Lines 216 to 217 in d6dc5d8

    
           mask = df.is_alive & df.age_exact_years.between(low_bound_age_in_years, high_bound_age_in_years, 
        
                                                           inclusive='left')

TLOmodel/src/tlo/methods/stunting.py

Lines 315 to 329 in d6dc5d8

    
           Predictor('age_exact_years', 
        
                     conditions_are_exhaustive=True, 
        
                     conditions_are_mutually_exclusive=True).when('< 0.5', 
        
                                                                  p['base_inc_rate_stunting_by_agegp'][0]) 
        
                                                            .when('.between(0.5, 1.0, inclusive="left")', 
        
                                                                  p['base_inc_rate_stunting_by_agegp'][1]) 
        
                                                            .when('.between(1.0, 2.0, inclusive="left")', 
        
                                                                  p['base_inc_rate_stunting_by_agegp'][2]) 
        
                                                            .when('.between(2.0, 3.0, inclusive="left")', 
        
                                                                  p['base_inc_rate_stunting_by_agegp'][3]) 
        
                                                            .when('.between(3.0, 4.0, inclusive="left")', 
        
                                                                  p['base_inc_rate_stunting_by_agegp'][4]) 
        
                                                            .when('.between(4.0, 5.0, inclusive="left")', 
        
                                                                  p['base_inc_rate_stunting_by_agegp'][5]) 
        
                                                            .when('> 5.0', 0.0),

TLOmodel/src/tlo/methods/stunting.py

Lines 351 to 353 in d6dc5d8

    
           Predictor().when( 
        
               '(nb_breastfeeding_status == "none") & (age_exact_years.between(0.5, 2.0, inclusive="left"))', 
        
               p['rr_stunting_no_continued_breastfeeding']),

TLOmodel/src/tlo/methods/stunting.py

Lines 362 to 376 in d6dc5d8

    
           Predictor('age_exact_years', 
        
                     conditions_are_exhaustive=True, 
        
                     conditions_are_mutually_exclusive=True).when('< 0.5', 
        
                                                                  p['r_progression_severe_stunting_by_agegp'][0]) 
        
                                                            .when('.between(0.5, 1.0, inclusive="left")', 
        
                                                                  p['r_progression_severe_stunting_by_agegp'][1]) 
        
                                                            .when('.between(1.0, 2.0, inclusive="left")', 
        
                                                                  p['r_progression_severe_stunting_by_agegp'][2]) 
        
                                                            .when('.between(2.0, 3.0, inclusive="left")', 
        
                                                                  p['r_progression_severe_stunting_by_agegp'][3]) 
        
                                                            .when('.between(3.0, 4.0, inclusive="left")', 
        
                                                                  p['r_progression_severe_stunting_by_agegp'][4]) 
        
                                                            .when('.between(4.0, 5.0, inclusive="left")', 
        
                                                                  p['r_progression_severe_stunting_by_agegp'][5]) 
        
                                                            .when('> 5.0', 1.0),

This follows the usage recommended in the guidance notes in the wiki I wrote. Unfortunately it turns out I didn't check the Pandas documentation carefully enough when writing these, as for Pandas v1.2 inclusive should be a boolean argument corresponding to whether to include both bounds in the test or not. In Pandas v1.3 inclusive was changed to instead be a string argument which can take values in {"both", "neither", "left", "right"}, which corresponds to the expected behaviour for the usage recommended in the wiki and currently used in stunting. Passing a string for the inclusive argument in Pandas v1.2.2 (the version we are currently using) doesn't raise an error as the argument value is used directly as a boolean condition. As non-empty strings correspond to True in Python this means with Pandas v1.2.2 using inclusive="left" acts equivalently to inclusive=True (and to inclusive="both" in Pandas v1.3 and above).

This means currently the various usages in stunting are not behaving as expected. This I think explains at least some of the differences we see between the population dataframes outputted by simulations using Pandas v1.2.2 and v2.0.0 as mentioned in #763 (comment).

The text was updated successfully, but these errors were encountered:

matt-graham added the bug Something isn't working label Apr 17, 2023

matt-graham added this to the Update to Python 3.11 / Pandas 2.0 milestone Apr 17, 2023

matt-graham self-assigned this Apr 17, 2023

matt-graham mentioned this issue Apr 17, 2023

Fix incorrect usages of inclusive kwarg to Series.between #943

Merged

tamuri closed this as completed in #943 Apr 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Passing string to `inclusive` kwarg of `Series.between` in `src/tlo/methods/stunting.py` gives wrong behaviour in Pandas v1.2 #942

Passing string to `inclusive` kwarg of `Series.between` in `src/tlo/methods/stunting.py` gives wrong behaviour in Pandas v1.2 #942

matt-graham commented Apr 17, 2023

Passing string to inclusive kwarg of Series.between in src/tlo/methods/stunting.py gives wrong behaviour in Pandas v1.2 #942

Passing string to inclusive kwarg of Series.between in src/tlo/methods/stunting.py gives wrong behaviour in Pandas v1.2 #942

Comments

matt-graham commented Apr 17, 2023

Passing string to `inclusive` kwarg of `Series.between` in `src/tlo/methods/stunting.py` gives wrong behaviour in Pandas v1.2 #942

Passing string to `inclusive` kwarg of `Series.between` in `src/tlo/methods/stunting.py` gives wrong behaviour in Pandas v1.2 #942