This Python project is a web scraper that extracts real estate listings (such as apartments for sale) from the website Divar. It uses Selenium and BeautifulSoup to automate the extraction of data such as title, price, area, and additional details of apartments for sale in various neighborhoods of Shiraz (or any other city of your choice).
- Uses Selenium with a Firefox WebDriver to dynamically load pages.
- Prevents image loading to improve scraping speed (see the sketch after this list).
- Extracts and saves property details, including title, subtitle, area (in square meters), year of construction, number of rooms, total price, price per meter, floor, and a description.
- Saves the extracted data to a CSV file.
- Easily adjustable to target different cities or areas by changing the URLs.
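A minimal sketch of how image loading can be disabled for a Firefox WebDriver in Selenium. The preference name below is a standard Firefox setting; the exact options used inside this project's `UrlScraper` class may differ, and the URL is only illustrative:

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
# Block all image loading (value 2 = block) so pages render faster.
options.set_preference("permissions.default.image", 2)

driver = webdriver.Firefox(options=options)
driver.get("https://divar.ir/s/shiraz/buy-apartment")  # illustrative listing URL
# ... scraping happens here ...
driver.quit()
```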
To run this project, you need the following dependencies:
- Python 3.x
- `selenium`
- `beautifulsoup4`
- `geckodriver` (for the Firefox WebDriver)
- `csv` (part of Python's standard library)

You can install the necessary Python packages using pip:

`pip install selenium beautifulsoup4`

Make sure Firefox is installed along with geckodriver. You can download geckodriver from its GitHub releases page and add it to your PATH.
The script works in the following steps (a condensed sketch of the flow follows this list):
- Initialize Selenium WebDriver: The `UrlScraper` class initializes a Firefox WebDriver instance with options that prevent image loading for better performance.
- Load Pages: It loads pages dynamically from the real estate section of Divar and scrolls down to fetch more listings.
- Scrape Data: Extracts details of properties such as title, subtitle, price, area, number of rooms, and other relevant details using BeautifulSoup.
- Save to CSV: The data is saved to a CSV file (`DivarScraper.csv`) in a structured format with appropriate headers.
- Tear Down: Once all the data has been fetched and saved, the WebDriver and file resources are closed.
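The following is a condensed, illustrative sketch of that flow, assuming a hypothetical `post-card` CSS class for the listing cards and a simplified set of columns; the real selectors, column list, and scrolling logic in `divar_scraper.py` are likely more elaborate:

```python
import csv
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# 1. Initialize the WebDriver with image loading disabled.
options = Options()
options.set_preference("permissions.default.image", 2)
driver = webdriver.Firefox(options=options)

# 2. Load the page and scroll down so more listings are fetched.
driver.get("https://divar.ir/s/shiraz/buy-apartment")  # illustrative URL
for _ in range(5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)  # give the page a few seconds to load new listings

neighborhood = "example-neighborhood"  # in the real script this comes from the urls list

# 3. Scrape the rendered HTML and 4. save the rows to CSV.
soup = BeautifulSoup(driver.page_source, "html.parser")
with open("DivarScraper.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Mahale", "Title", "Subtitle", "TotalPrice"])
    for card in soup.find_all("div", class_="post-card"):  # hypothetical class
        title = card.find("h2")
        subtitle = card.find("span")
        price = card.find("div", class_="price")  # hypothetical class
        writer.writerow([
            neighborhood,
            title.get_text(strip=True) if title else "",
            subtitle.get_text(strip=True) if subtitle else "",
            price.get_text(strip=True) if price else "",
        ])

# 5. Tear down: release the browser once everything is saved.
driver.quit()
```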
- Clone the repository or download the script.
- Ensure that the necessary dependencies are installed.
- Modify the list of neighborhoods or city areas (found in the `urls` list) if needed, to match your preferred search regions (see the example after these steps).
- Run the script:

`python divar_scraper.py`

The script will fetch apartment data from Divar and save it to `DivarScraper.csv` in the working directory.
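As an example, such a neighborhood list might look like this; the variable name `urls` comes from the description above, but the specific neighborhood URLs below are hypothetical and should be replaced with real Divar search URLs for your areas of interest:

```python
# Hypothetical Divar search URLs, one per neighborhood of interest.
urls = [
    "https://divar.ir/s/shiraz/buy-apartment/moaliabad",
    "https://divar.ir/s/shiraz/buy-apartment/sadra",
    "https://divar.ir/s/shiraz/buy-apartment/eram",
]
```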
- Change the City: To scrape data from other cities, modify the base URL in the `scrap_page` method. For example, replace `shiraz` with another city.
- Change Scraping Parameters: The query string parameters in the URLs, such as `building-age` and `sort`, can be adjusted to modify search results (see the sketch below).
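For illustration, swapping the city slug and adjusting the query parameters might look like the following; the parameter values shown are assumptions and are not verified against Divar's current URL scheme:

```python
city = "tehran"  # replace "shiraz" with another city slug
base_url = (
    f"https://divar.ir/s/{city}/buy-apartment"
    "?building-age=5&sort=sort_date"  # hypothetical query-string values
)
```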
The scraped data is saved in a CSV file with the following columns:
- `Mahale`: Neighborhood
- `Title`: Property title
- `Subtitle`: Subtitle (if available)
- `Meter`: Area of the apartment (in square meters)
- `Year`: Year of construction
- `Room`: Number of rooms
- `TotalPrice`: Total price of the apartment
- `MeterPrice`: Price per square meter
- `Floor`: Floor number
- `Description`: Additional description about the property
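A quick way to sanity-check the output after a run, using only the standard library (the file name and column names are the ones listed above):

```python
import csv

with open("DivarScraper.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["Mahale"], row["Title"], row["TotalPrice"])
```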
- The scraper includes a delay of a few seconds to ensure that the data is fully loaded on the page before scraping.
- The current script is designed to scrape apartments in the city of Shiraz, but can easily be adapted for other cities or property types by modifying the base URLs.
- Be cautious when scraping websites and ensure that you comply with their terms of service.
- Add multi-threading to speed up the scraping process.
- Implement error handling for missing data fields (a possible approach is sketched after this list).
- Support for other property types (e.g., rentals, commercial properties).
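For the missing-field handling mentioned above, one possible shape is a small helper that falls back to a default when a tag is absent; the helper name and selectors here are made up for illustration:

```python
def safe_text(parent, tag, css_class=None, default=""):
    """Return the stripped text of a child tag, or a default if it is absent."""
    found = parent.find(tag, class_=css_class) if css_class else parent.find(tag)
    return found.get_text(strip=True) if found else default

# Example use inside the scraping loop (field names are illustrative):
# year = safe_text(card, "span", css_class="year", default="N/A")
```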
This project is licensed under the MIT License.
Please feel free to get in touch with me -> [email protected]