
Evaluate alternatives to pandas-datareader for downloading World Bank data #987


Closed
spjuhel opened this issue Dec 13, 2024 · 5 comments
Labels: accepting pull request, dependencies

Comments

@spjuhel
Collaborator

spjuhel commented Dec 13, 2024

Is your feature request related to a problem? Please describe.
The pandas-datareader package has not been updated since 2021-07-13, and it now breaks on Python 3.12 and newer because distutils was removed from the standard library. This package is currently used in util.finance.py for downloading World Bank data and in the related tests.

Describe the solution you'd like
Identify and migrate to an alternative for downloading World Bank data. The steps could include:

  1. Researching viable replacements for pandas-datareader, such as wbdata (see here) or direct API usage via libraries like requests or pandas.
  2. Refactoring util.finance.py to use the new solution.
  3. Updating related tests to ensure compatibility with the new implementation.
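Direct API usage (option 1 above) could start roughly like the sketch below. It is a minimal sketch, not CLIMADA code. It assumes the World Bank Indicators API v2 URL pattern `https://api.worldbank.org/v2/country/{iso3}/indicator/{indicator}?format=json` and uses only the standard library (`urllib`; requests would work the same way). The helper names `build_wb_url` and `fetch_wb_json` are hypothetical. `NY.GDP.MKTP.CD` is the World Bank indicator code for GDP in current US$.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical helpers: the URL layout assumes the public World Bank
# Indicators API v2. They are not part of CLIMADA or pandas-datareader.
WB_API_BASE = "https://api.worldbank.org/v2"


def build_wb_url(iso3: str, indicator: str, start: int, end: int,
                 per_page: int = 1000, page: int = 1) -> str:
    """Return the request URL for one country, one indicator and a year range."""
    query = urllib.parse.urlencode({
        "format": "json",          # ask for JSON instead of the default XML
        "date": f"{start}:{end}",  # inclusive year range
        "per_page": per_page,      # results are paginated
        "page": page,
    })
    return f"{WB_API_BASE}/country/{iso3}/indicator/{indicator}?{query}"


def fetch_wb_json(url: str, timeout: float = 30.0):
    """Download and decode one page of results (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_wb_json(build_wb_url("CHE", "NY.GDP.MKTP.CD", 2015, 2020))` would request Swiss GDP for 2015 to 2020. Keeping URL construction separate from the network call lets the tests cover the former without internet access.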

Describe alternatives you've considered

  • Continuing to use pandas-datareader by forking and maintaining a private version. However, this would increase the maintenance burden on the project.
  • Temporarily patching the current implementation to support Python 3.12 and beyond, though this is not a sustainable long-term solution.

For reference, a related issue in pandas-datareader highlights these problems: pydata/pandas-datareader#977

@spjuhel spjuhel added dependencies accepting pull request Contribute by raising a pull request to resolve this issue! labels Dec 13, 2024
@spjuhel spjuhel changed the title Evaluate Alternatives to pandas-datareader for Downloading World Bank Data Evaluate alternatives to pandas-datareader for downloading World Bank data Dec 13, 2024
@spjuhel spjuhel assigned spjuhel and peanutfun and unassigned spjuhel Mar 6, 2025
@peanutfun
Member

Check whether the data is served as JSON. pandas.read_json accepts URLs, and so do other pandas readers such as read_csv.
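The data can be requested as JSON (`format=json`), but as far as we understand the v2 API, the top level of the response is a two-element array: paging metadata first, then the list of observations. The records therefore need unpacking before they fit into a DataFrame. A minimal sketch, assuming that layout and the record fields `countryiso3code`, `indicator`, `date` and `value`; `wb_records_to_frame` is a hypothetical helper.

```python
import pandas as pd


def wb_records_to_frame(payload) -> pd.DataFrame:
    """Turn a decoded World Bank API v2 response into a tidy DataFrame.

    Assumes ``payload`` is ``[metadata, records]``; an error response is
    assumed to contain only a message object, and an empty result to have
    ``None`` in place of the records.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        raise ValueError(f"Unexpected World Bank API response: {payload!r}")
    # json_normalize flattens nested dicts, e.g. {"indicator": {"id": ...}}
    # becomes a column named "indicator.id"
    frame = pd.json_normalize(payload[1])
    frame = frame[["countryiso3code", "indicator.id", "date", "value"]].rename(
        columns={"countryiso3code": "iso3", "indicator.id": "indicator",
                 "date": "year"}
    )
    frame["year"] = frame["year"].astype(int)       # years arrive as strings
    frame["value"] = pd.to_numeric(frame["value"])  # JSON null becomes NaN
    return frame.sort_values("year").reset_index(drop=True)
```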

@spjuhel
Collaborator Author

spjuhel commented Mar 17, 2025

Note: the LitPop tutorial also needs to be updated, since it currently states:

Regarding the GDP (nominal GDP at current USD) and income group values, they are obtained from the World Bank using the pandas-datareader API. If a value is missing, the value of the closest year is considered. When no values are provided from the World Bank, we use the Natural Earth repository values.
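Any replacement must keep the "closest year" fallback described above. A minimal sketch, assuming the values for one country are held in a pandas Series indexed by year with NaN for missing years. `closest_year_value` is hypothetical, and the tie-break (prefer the earlier year when two years are equally far away) is our assumption, because the tutorial text does not specify one.

```python
import pandas as pd


def closest_year_value(series: pd.Series, year: int):
    """Return (year_used, value) from the available year closest to ``year``.

    ``series`` is indexed by integer year, with NaN for missing values.
    Ties are broken towards the earlier year (an assumption).
    Returns (None, None) when no year has data, signalling that a fallback
    source such as Natural Earth should be used instead.
    """
    available = series.dropna()
    if available.empty:
        return None, None
    # Sort key: distance to the requested year first, then the year itself
    best = min(available.index, key=lambda y: (abs(y - year), y))
    return int(best), float(available.loc[best])
```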

@peanutfun
Member

Researching viable replacements for pandas-datareader, such as wbdata (see here) or direct API usage via libraries like requests or pandas.

wbdata actually looks quite nice. It has a small set of dependencies and even provides a persistent cache for the downloaded data.

The World Bank data is queried via an API, so building our own solution via requests can be a bit cumbersome. I tried directly accessing the data through pandas.read_json using an API request URL, but that is not straightforward. So I think we should go with wbdata.
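One source of that extra work in a hand-rolled client is paging. Assuming the v2 response metadata carries a `pages` field with the total page count, a client has to keep requesting until the last page. A minimal sketch; `collect_all_pages` and its `fetch_page` callable are hypothetical, and `fetch_page` could wrap requests or urllib.

```python
from typing import Any, Callable, Dict, List


def collect_all_pages(fetch_page: Callable[[int], List[Any]],
                      max_pages: int = 100) -> List[Dict[str, Any]]:
    """Gather the records of every page of a World Bank API v2 query.

    ``fetch_page(n)`` must return the decoded JSON for page ``n``, assumed to
    be ``[metadata, records]`` with ``metadata["pages"]`` giving the page count.
    ``max_pages`` guards against looping forever on malformed metadata.
    """
    records: List[Dict[str, Any]] = []
    page = 1
    while True:
        payload = fetch_page(page)
        if not isinstance(payload, list) or len(payload) < 2:
            raise ValueError(f"Unexpected response for page {page}: {payload!r}")
        metadata, page_records = payload[0], payload[1]
        records.extend(page_records or [])  # an empty result may come back as null
        total_pages = int(metadata.get("pages", 1))
        if page >= total_pages or page >= max_pages:
            return records
        page += 1
```

Injecting `fetch_page` keeps the paging logic testable without network access.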

@spjuhel
Collaborator Author

spjuhel commented Mar 24, 2025

I see you are assigned; are you implementing this, then? Happy to review it once it's done if so :)

@peanutfun peanutfun mentioned this issue Mar 26, 2025
@peanutfun
Member

Found a solution without using wbdata, see #1033

@spjuhel spjuhel closed this as completed Mar 26, 2025

2 participants