-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Python and packages #763
Comments
Steps :
|
Lots of errors because the old way of slicing dataframe to set categorical columns doesn't work. Column dtypes gets coerced to This demonstrates the problem (and possible solution): df = pd.DataFrame(data={'is_alive':[True, True, True], 'catcol': pd.Categorical([np.nan, np.nan, np.nan], categories=['A', 'B', 'C'])})
# In [54]: df.dtypes
# Out[54]:
# is_alive bool
# catcol category
# dtype: object
s = pd.Series(['A', 'A', 'B'])
# this works on current version, but breaks when dependencies are updated:
df.loc[df.is_alive, 'catcol'] = s.values[:]
# gives this warning:
# <ipython-input-51-f84c3c1188d6>:1: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
# and breaks dtypes of columns
# In [52]: df.dtypes
# Out[52]:
# is_alive bool
# catcol object
# dtype: object
# this way works:
df = pd.DataFrame(data={'is_alive':[True, True, True], 'catcol': pd.Categorical([np.nan, np.nan, np.nan], categories=['A', 'B', 'C'])})
df.loc[df.is_alive, 'catcol'] = s.to_list()
df.dtypes
is_alive bool
catcol category
dtype: object
In [56]: df.catcol.cat.categories
Out[56]: Index(['A', 'B', 'C'], dtype='object')
|
I would vote for updating to Python 3.11 if we taking the pain of getting everything working with updated dependencies anyway as Python 3.11 has some performance improvements that can give significant speedups over 3.10 (10-60% depending on task). |
I think we're leaning toward using Anaconda packages. [Once we have this sorted...] I thought we could set up Dependabot to keep us up to date. There is some discussion about how to support conda envs, so might need to think of some alternative (to dependabot). |
@tamuri Can you remember which version of Pandas this breaks on? Running this example with Python 3.11 with Pandas 1.5.3 (current latest stable release) I get a deprecation warning (was previously a |
Running the non-slow tests on Python 3.11 with Pandas 2.0.0rc1 overall test summary is
with the main error types causing test failures are
Package versions in conda environment
|
Sorry, I didn't know pandas version at that time but, yes, must have been an older version. I'm getting deprecation warning too now. |
@tamuri and I discussed this issue in a meeting today (2023/03/27). The initial plan is to target getting both all tests passing with updated dependencies plus exact reproducibility (in terms of elementwise equality of final population data frame) of a run of the In terms of managing the transition, as a rough plan we may create a 'meta' branch / PR to track the changes needed to run successfully on the updated dependencies, with smaller PRs being made against this branch to deal with fixes for individual issues / failing tests, to facilitate splitting the work up. Similarly we will probably use this issue as meta-issue to track overall progress but create a task list / sub issues for specific changes. |
Another factor: how do we plan to roll this out? Will the updates be such that one will need the new environment as soon as they merge master? Or is there a grace period? I guess this is somewhat related to which versions we target. Do we make the necessary changes to allow updating the stack, whilst letting the code still run on existing version (e.g. v1.2.2 & v2 on pandas). That might not be desirable, in which case we need to have a "move day" |
/run python3.11-pandas2.0-all-tests |
1 similar comment
/run python3.11-pandas2.0-all-tests |
Run of all tests on Python v3.11 with Pandas v2.0 failed ❌🆔 12512251308 |
Workflow run failed as it looks like our self-hosted runners don't run with passwordless Might be worth updating test runners to install Python 3.11 therefore. Alternatively I could try using the |
Thanks, Matt. I've installed 3.11 on the test runners. The tox configuration didn't pick up the right version until I added Lines 15 to 17 in 9d93c77
I've not pushed that fix in case you wanted it to work some other way. Later on, we can use conda to handle installation of right version of python as well as dependencies, and update tox to use that too. |
Thanks @tamuri, I've submitted a new PR to remove the installation commands from the workflow jobs and update the |
/run python3.11-pandas2.0-all-tests |
Run of all tests on Python v3.11 with Pandas v2.0 failed ❌🆔 12538547731 |
Running the tests locally from latest
when running all tests with Have now started looking in to consistency in the population dataframes outputted using different Pandas versions. There does appear to currently be some differences - mainly concentrated in symptom columns and the |
/run python3.11-pandas2.0-all-tests |
With the fix in #943 population dataframes when running |
Run of all tests on Python v3.11 with Pandas v2.0 failed ❌🆔 12805460924 |
Same single test Runs of
and in all cases seem to be that the values are |
Changes in #944 fix the problem causing divergence in population dataframes when running with Pandas v1.2.2 vs v2.0.0 in lifestyle related properties described above (see #944 (comment)). Now running a full 20 year run of |
Failure in
to
As TLOmodel/src/tlo/methods/demography.py Lines 542 to 546 in 3c011e7
the alteration to AGE_RANGE_LOOKUP changes the index of the df_py dataframe in the test, that is df_py is equal to
rather than
with the first entry in the dataframe no longer at index 0. Indexing in the test with I assume the alteration of |
Looking into this a bit further, I think this is happening in quite a subtle way due to Lines 39 to 40 in 1ea5fa5
The behaviour of This means for example the usage in TLOmodel/tests/test_contraception.py Lines 753 to 760 in 1ea5fa5
will update |
Huh, good catch. Wrapping with |
The intention was these age range categories were fixed and all modules should use the ones setup by the demography module (or roll their own if necessary). I think that's right, @tbhallett ? |
Yeah think this should fix the immediate problem - I'm currently running tests after making basically this change (but using a |
Absolutely. |
We are getting exact equality of the final population dataframes after a full 20 year simulation of the |
/run python3.11-pandas2.0-all-tests |
Run of all tests on Python v3.11 with Pandas v2.0 succeeded ✅🆔 12902139034 |
/run python3.11-pandas2.0-all-tests |
Run of all tests on Python v3.11 with Pandas v2.0 failed ❌🆔 16880157883 |
Notes for upgrade to Python 3.11. We'll need to:
|
This particular upgrade is complete. |
WARNING: Don't do this until it's confirmed everything works!
We need to:
We've done this before in #144.
The text was updated successfully, but these errors were encountered: