
A tool to crawl data for your projects from open data portals
Report Bug · Request Feature
Open Data Crawler is a tool to extract data from open data portals and statistics portals. The community can contribute by adding support for other data portals or by adding new features.
Features:
- Download datasets from open data portals or statistics portals
- Download metadata from resources
- Filter by data type
- Filter by topic
To get a local copy up and running, follow these simple steps.
- You need Python 3.9 installed.
- Clone the repo:
  ```sh
  git clone https://github.com/aberenguerpas/opendatacrawler.git
  ```
- Move to the root directory:
  ```sh
  cd opendatacrawler
  ```
- Install the requirements from `requirements.txt`:
  ```sh
  pip3 install -r requirements.txt
  ```
- Socrata portals require an app token to avoid throttling limits. You can obtain an API key here and set it in `config.ini` (see the sketch below).
- Run from the project root:
  ```sh
  python3 setup.py install
  ```
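A minimal sketch of what the Socrata entry in `config.ini` might look like. The section and key names here are assumptions for illustration; check the `config.ini` shipped with the repo for the exact format:

```ini
; Illustrative only -- the actual section/key names may differ.
[SOCRATA]
app_token = YOUR_APP_TOKEN_HERE
```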
Using this tool is very simple: you only need to specify the data source, and the tool automatically detects the portal type and starts downloading the data.
```sh
# Download all datasets from a portal
python opendatacrawler -d https://data.smartdublin.ie/

# Also download resource metadata
python opendatacrawler -d https://data.smartdublin.ie/ -m

# See -h below for the remaining flags, such as -pd
python opendatacrawler -d https://data.smartdublin.ie/ -pd

# Filter by data type
python opendatacrawler -d https://data.smartdublin.ie/ -t xls csv

# Filter by topic
python opendatacrawler -d https://data.smartdublin.ie/ -c tourism transport

# Show help
python opendatacrawler -h
```
For more examples, please refer to the Documentation
- CKAN
- Socrata
- https://ec.europa.eu/eurostat (LIMITED ⚠️)*
- https://datacatalogapi.worldbank.org/ (LIMITED ⚠️)*
- https://datos.gob.es
- OpenDataSoft

* Works with restrictions or download limitations
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, or want to add support for a new site/portal, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Create a file named after the portal plus `crawler`, e.g. `examplecrawler.py`, inside the `opendatacrawler` folder.
- Create a class `ExampleCrawler` that inherits from `OpenDataCrawlerInterface`.
- The class must contain at least the functions `get_package_list()` and `get_package()`. Check the descriptions of these functions in `opendatacrawlerInterface.py`.
- You can also use or add helper functions in `utils.py`.
- Add a way to detect the site you want to support in the `detect_dms()` function in `odcrawler.py`. A skeleton crawler is sketched after this list.
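As a rough sketch of these steps: only the class and function names above come from the interface; the method signatures (e.g. `get_package()` taking an id), bodies, and portal endpoints below are illustrative assumptions, so check `opendatacrawlerInterface.py` for the exact contract.

```python
# examplecrawler.py -- minimal skeleton for a new portal crawler.
# The request URLs are hypothetical; adapt them to the portal's real API.
import requests

from opendatacrawlerInterface import OpenDataCrawlerInterface


class ExampleCrawler(OpenDataCrawlerInterface):
    def __init__(self, domain):
        self.domain = domain

    def get_package_list(self):
        # Return the identifiers of all datasets exposed by the portal.
        response = requests.get(self.domain + "/api/packages")  # hypothetical endpoint
        return response.json()

    def get_package(self, package_id):
        # Return the metadata for a single dataset, including its resources.
        response = requests.get(self.domain + "/api/packages/" + package_id)  # hypothetical endpoint
        return response.json()
```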
Distributed under the MIT License. See `LICENSE` for more information.
🙋‍♂️ Alberto Berenguer Pastor
📱 @aberenguerpas
✉️ [email protected]