
A tool to crawl data for your projects from open data portals
Report Bug · Request Feature
Open Data Crawler is a tool to extract data from open data portals and statistics portals. The community can contribute by adding support for other data portals or by adding new features.
Features:
- Download datasets from open data portals or statistics portals
- Download metadata from resources
- Filter by data type
- Filter by topic
To get a local copy up and running, follow these simple steps.
- You need Python 3.9 installed.
- Clone the repo:
  ```sh
  git clone https://github.com/aberenguerpas/opendatacrawler.git
  ```
- Move to the root directory:
  ```sh
  cd opendatacrawler
  ```
- Install the requirements from `requirements.txt`:
  ```sh
  pip3 install -r requirements.txt
  ```
- Socrata portals require an app token to avoid throttling limits. You can obtain an API key here and set it in `config.ini` (see the sketch below).
- Run from the project root:
  ```sh
  python3 setup.py install
  ```
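A minimal sketch of what the Socrata entry in `config.ini` might look like. The section and key names here are assumptions for illustration; check the `config.ini` shipped with the repo for the exact format:

```ini
; Illustrative only -- the actual section/key names may differ.
[SOCRATA]
app_token = YOUR_APP_TOKEN_HERE
```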
Using this tool is very simple: you only need to specify the data source, and the tool automatically detects the portal type and starts downloading the data.
```sh
# Download all datasets from a portal
python opendatacrawler -d https://data.smartdublin.ie/

# Also download resource metadata
python opendatacrawler -d https://data.smartdublin.ie/ -m

# See -h below for the remaining flags, such as -pd
python opendatacrawler -d https://data.smartdublin.ie/ -pd

# Filter by data type
python opendatacrawler -d https://data.smartdublin.ie/ -t xls csv

# Filter by topic
python opendatacrawler -d https://data.smartdublin.ie/ -c tourism transport

# Show help
python opendatacrawler -h
```
For more examples, please refer to the Documentation
- CKAN
- Socrata
- https://ec.europa.eu/eurostat (LIMITED ⚠️)*
- https://datacatalogapi.worldbank.org/ (LIMITED ⚠️)*
- https://datos.gob.es
- OpenDataSoft

* Works with restrictions or download limitations
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, or want to add support for a new site/portal, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Create a file named after the portal plus `crawler`, e.g. `examplecrawler.py`, inside the `opendatacrawler` folder.
- Create a class `ExampleCrawler` that inherits from `OpenDataCrawlerInterface`.
- The class must contain at least the functions `get_package_list()` and `get_package()`. Check the descriptions of these functions in `opendatacrawlerInterface.py`.
- You can also use or add helper functions in `utils.py`.
- Add a way to detect the site you want to support in the `detect_dms()` function in `odcrawler.py`. A skeleton crawler is sketched after this list.
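As a rough sketch of these steps: only the class and function names above come from the interface; the method signatures (e.g. `get_package()` taking an id), bodies, and portal endpoints below are illustrative assumptions, so check `opendatacrawlerInterface.py` for the exact contract.

```python
# examplecrawler.py -- minimal skeleton for a new portal crawler.
# The request URLs are hypothetical; adapt them to the portal's real API.
import requests

from opendatacrawlerInterface import OpenDataCrawlerInterface


class ExampleCrawler(OpenDataCrawlerInterface):
    def __init__(self, domain):
        self.domain = domain

    def get_package_list(self):
        # Return the identifiers of all datasets exposed by the portal.
        response = requests.get(self.domain + "/api/packages")  # hypothetical endpoint
        return response.json()

    def get_package(self, package_id):
        # Return the metadata for a single dataset, including its resources.
        response = requests.get(self.domain + "/api/packages/" + package_id)  # hypothetical endpoint
        return response.json()
```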
Distributed under the MIT License. See `LICENSE` for more information.
🙋‍♂️ Alberto Berenguer Pastor
📱 @aberenguerpas
✉️ [email protected]