This project was completed as part of the Data Science and AI course at BeCode in 2024. For the project, we built a dataset with information on at least 10,000 properties across Belgium.
- Web scraping the Immoweb website to gather property data.
- Saving the data in CSV format for further processing.
Make sure you have the following:
- Python 3.x installed.
- pip for managing Python packages.
- The required libraries, which are listed in requirements.txt. Install them with the command pip install -r utils/requirements.txt
This script will:
- Retrieve a list of properties from the HTML page source of the website.
- Extract each property's information from Immoweb.
- Save the output in a CSV file that can be used for further analysis.
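The first and last of these steps can be illustrated with a minimal sketch. It is not the project's code: it uses only the Python standard library (html.parser and csv) as a stand-in for Requests and BeautifulSoup so that it runs without extra packages, and it parses a made-up HTML snippet instead of a live page. The names LinkCollector, extract_links and write_rows, the example.com URLs, and the "/classified/" marker used to recognise property links are all assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect unique href values of <a> tags that contain a marker substring."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links that look like property pages, skipping duplicates.
        if href and self.marker in href and href not in self.links:
            self.links.append(href)


def extract_links(page_source, marker="/classified/"):
    """Return property links found in an HTML page source (marker is assumed)."""
    parser = LinkCollector(marker)
    parser.feed(page_source)
    return parser.links


def write_rows(rows, fieldnames, handle):
    """Write a list of dicts as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Made-up page source; a real run would download it from the website.
sample_html = """
<ul>
  <li><a href="https://example.com/classified/house/1001">House</a></li>
  <li><a href="https://example.com/classified/apartment/1002">Apartment</a></li>
  <li><a href="https://example.com/about">About</a></li>
  <li><a href="https://example.com/classified/house/1001">Duplicate</a></li>
</ul>
"""

links = extract_links(sample_html)

# Write to an in-memory buffer here; a real run would open a .csv file instead.
buffer = io.StringIO()
write_rows([{"url": url} for url in links], ["url"], buffer)
csv_text = buffer.getvalue()
```

Writing to a file instead of the buffer only requires passing a handle opened with open(path, "w", newline="") to write_rows.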
The project has the following core components:
- utils: a directory containing the data files property_links.csv and all_properties_output.csv.
- main.py: the entry point; run the project with python main.py. The main functions are:
  - fetch_links(): uses Requests and BeautifulSoup to get a list of property URLs.
  - get_property_data(): uses BeautifulSoup to scrape each property's data from the list of URLs and saves it to a CSV file.
  - clean_save_dataset(): uses Pandas to clean the dataset and saves it to another CSV file.
- requirements.txt: contains the list of dependencies for the project.
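The README does not say which cleaning rules clean_save_dataset() applies, so the following is only a sketch of what such a step might look like. It uses plain Python dictionaries rather than Pandas to stay dependency-free, and the rules (trim whitespace, drop rows with no price, drop exact duplicates), the function name clean_rows, and the column names id, price and locality are all assumptions.

```python
def clean_rows(rows, required=("price",)):
    """Trim whitespace, drop rows missing a required field, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip surrounding whitespace from every string value.
        row = {k: (v.strip() if isinstance(v, str) else v) for k, v in row.items()}
        # Skip rows where a required column is empty or absent.
        if any(not row.get(col) for col in required):
            continue
        # Use the sorted items as a hashable fingerprint to detect duplicates.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up scraped rows with typical problems.
raw = [
    {"id": "1001", "price": " 250000 ", "locality": "Gent"},
    {"id": "1001", "price": "250000", "locality": "Gent"},   # duplicate after trimming
    {"id": "1002", "price": "", "locality": "Leuven"},       # missing price
    {"id": "1003", "price": "410000", "locality": " Namur"},
]

cleaned = clean_rows(raw)
```

With Pandas, the same three rules would roughly correspond to stripping string columns, then calling dropna on the required columns and drop_duplicates on the frame.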
This project was done by: