Upgrade indicators to pandas v2.0.0 #1820

Closed
krivard opened this issue Apr 4, 2023 · 4 comments · Fixed by #1825
Labels: chore, refactor (Long-term projects to revise existing machinery)

Comments

@krivard
Contributor

krivard commented Apr 4, 2023

Pandas v2.0.0 was released 2023-04-03, and includes a few breaking changes. Among them:

  • .sum() now aggregates all column types by default (e.g. strings, which get concatenated). Passing numeric_only=True restricts it to numeric types only.
  • .append() has been removed for data frames (pd.concat() is the replacement)
  • .iteritems() has been removed for series (.items() is the replacement)
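
As a rough illustration of the three changes listed above, here is a minimal sketch of replacements that run under both pandas 1.x and 2.x. The column names `geo_id` and `val` and the data are made up for the example and are not taken from any indicator in this repo.

```python
import pandas as pd

# Hypothetical toy frame; "geo_id" and "val" are illustrative names only.
df = pd.DataFrame({"geo_id": ["pa", "ny"], "val": [1.0, 2.0]})

# numeric_only=True restricts the reduction to numeric columns,
# so the string column "geo_id" is left out rather than concatenated.
numeric_totals = df.sum(numeric_only=True)

# DataFrame.append() is gone in v2; pd.concat() builds the same result.
new_row = pd.DataFrame({"geo_id": ["ca"], "val": [3.0]})
combined = pd.concat([df, new_row], ignore_index=True)

# Series.iteritems() is gone in v2; Series.items() yields the same
# (index, value) pairs.
pairs = list(combined["val"].items())
```

Because all three replacements also exist in pandas 1.x, code can be migrated this way while the `<2.0` pin is still in place.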

We've pinned to <2.0 for now, but we should start working on fixing our code to work with v2.

@melange396
Contributor

Are we sure #1825 was fully tested for pandas v2 compatibility? I discovered recently that we are still stuck with "pandas<2" for the indicators (because the covidcast client has that requirement, delphi_utils requires covidcast, and delphi_utils is imported just about everywhere in this repo). See #1972 (comment)
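
One way to see which installed packages declare a pandas constraint is to read their metadata. The following is only a sketch using the standard library; the helper name `pandas_constraints` is hypothetical, and the distribution names passed to it are assumed to match how the packages are installed in a given environment.

```python
from importlib import metadata


def pandas_constraints(packages):
    """Map each installed package to the pandas requirement strings it declares."""
    found = {}
    for name in packages:
        try:
            # requires() returns None when a package declares no dependencies.
            requirements = metadata.requires(name) or []
        except metadata.PackageNotFoundError:
            continue  # package is not installed in this environment
        # Crude prefix match: it would also pick up names such as "pandas-stubs".
        found[name] = [r for r in requirements if r.lower().startswith("pandas")]
    return found


# Distribution names here are assumptions; adjust to the installed names.
print(pandas_constraints(["covidcast", "delphi_utils"]))
```

This only reports direct requirements of the named packages, so a pin introduced further down the dependency chain would need the same check applied to those packages too.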

melange396 reopened this Jul 11, 2024
melange396 added the refactor (Long-term projects to revise existing machinery) label Jul 11, 2024
@melange396
Contributor

@dshemetov
Contributor

FWIW, cmu-delphi/covidcast#618 (covidcast Pandas<2 pin PR) was merged after #1825, so at the very least, the latter was definitely tested on Pandas>2. Nat made all the tests pass and I did a number of passes through the changelog and searching the repo for changed functions, trying to be thorough.

Side-note: it'd be nice if we had a way to easily A/B test changes. Like, if we could deploy a Pandas>2 branch and compare its outputs with prod for a few weeks, we could increase our confidence in the change.

@melange396
Contributor

Ah, good call. To fill the timeline out a little more: it looks like delphi_utils pinned to pandas<2 on 4 April 2023, corresponding to its release 0.3.12 on the same day, then the pin was removed on 11 April 2023, corresponding to release 0.3.13 on the next day, then covidcast pinned pandas<2 on 15 May 2023, presumably corresponding to its own release 0.2.0 on 23 May 2023, which wouldn't have gotten picked up by delphi_utils until release 0.3.16 on 1 June 2023... So AFAICT, we ran in production w/o the pin last year from April 11 until June 1, or about a month and a half.

The lack of that pin does not necessarily guarantee we actually used pandas v2 in that time (the aforementioned #1994 would've been able to tell us that for sure) because some other package might've affected the requirement constraints (for instance, delphi-epidata might've been in the mix somewhere, and it has pinned lots of old packages), though it does seem very probable to me that we did in fact use pandas v2 in production during that time.
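
For what it's worth, checking which pandas version an environment actually resolved to can be done at runtime with the standard library. This is a hypothetical sketch, not a description of what #1994 does; the helper name `installed_major_version` is made up for the example.

```python
from importlib import metadata


def installed_major_version(package):
    """Return the installed major version of `package`, or None if it is not installed."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    # Assumes a conventional "MAJOR.MINOR..." version string.
    return int(version.split(".")[0])


pandas_major = installed_major_version("pandas")
if pandas_major is None:
    print("pandas is not installed")
else:
    print(f"pandas major version: {pandas_major}")
```

Logging this value at the start of each indicator run would record the resolved version regardless of which package's pin ended up winning.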

Reclosing!
