
Task 1 - Otodom scraper #4

Merged: 12 commits merged into master on Dec 30, 2023
Conversation

@detker (Contributor) commented Dec 5, 2023

No description provided.

@detker detker changed the title Task 1 - Otodom scrapper Task 1 - Otodom scraper Dec 6, 2023
@TheRealSeber (Contributor) left a comment


Generally fine; try to get acquainted with pre-commit.

Comment on lines 70 to 72
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
}

We should define constants at the beginning of the file, with capital letters.
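A minimal sketch of that suggestion. The header value is taken from the diff above; the constant names are assumptions, not the PR's actual code:

```python
# Module-level constants in CAPITALS, defined once near the top of the file.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
)
HEADERS = {"User-Agent": USER_AGENT}
```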

Comment on lines 77 to 79
regex = re.compile("Idź do strony [0-9]+$")
result = doc.find_all("a", {"aria-label": regex})
pages_n = max(list(map(lambda x: int(x.string), result)))

Well, there is definitely a better approach.

[image]

You could just find the nav and take the last element from its children. The best way is always to go for the most general solution when looking for elements.
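A sketch of the suggested approach, using hypothetical markup that loosely resembles Otodom's pagination (the real page structure may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical pagination markup; the idea is to locate the <nav> and take
# the last page link from its children, rather than regex-matching the
# Polish "Idź do strony ..." aria-label text.
html = """
<nav data-cy="pagination">
  <a aria-label="Idź do strony 1">1</a>
  <a aria-label="Idź do strony 2">2</a>
  <a aria-label="Idź do strony 12">12</a>
</nav>
"""
doc = BeautifulSoup(html, "html.parser")
nav = doc.find("nav")
pages_n = int(nav.find_all("a")[-1].string)  # last child holds the page count
```

This survives a language switch on the site, since it relies on structure rather than on translated label text.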

Comment on lines 86 to 91
if not response.ok:
    with open("db.json", "w", encoding="utf-8") as file:
        json.dump(scrapped_data, file, ensure_ascii=False, indent=4)
    print("Already scrapped data saved in db.json")
    print("Error occured on page " + str(i) + ". Aborting.")
    sys.exit(1)

It may happen that, for whatever reason, 1 time out of 100 the server response simply fails on their side (a 5XX code). Should we really exit under such circumstances?
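One way to act on this comment is to retry transient 5XX responses a few times before giving up. This is a sketch, not the PR's code; the helper name and the injected `get` callable (which could be `requests.get`) are assumptions:

```python
import time

def fetch_with_retries(get, url, retries=3, backoff=1.0):
    """Retry on 5XX server errors; return immediately on success or client errors."""
    response = None
    for attempt in range(retries):
        response = get(url)
        if response.ok or response.status_code < 500:
            return response  # success, or a client error not worth retrying
        time.sleep(backoff * attempt)  # simple linear backoff between attempts
    return response  # still failing; the caller decides whether to save and abort
```

The caller can then save the partial results and exit only after all retries are exhausted, instead of on the first failed request.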

Comment on lines 18 to 20
tags = record.find_all("p")
for tag in tags:
    if tag.has_attr("title"):

Just find the one tag with a title attribute instead of finding all of them.
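A sketch of that suggestion, with hypothetical markup. BeautifulSoup's `attrs={"title": True}` matches any tag that merely has the attribute, so the loop becomes a single lookup:

```python
from bs4 import BeautifulSoup

# Hypothetical listing fragment; only one <p> carries a title attribute.
html = '<div><p>no title</p><p title="Warszawa, Mokotów">Warszawa</p></div>'
record = BeautifulSoup(html, "html.parser")

# Ask directly for the <p> that has a title attribute.
tag = record.find("p", attrs={"title": True})
```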

Comment on lines 35 to 37
record_dict["promoted"] = (
    True if len(record.find_all("p", string="Podbite")) > 0 else False
)

I dived into the HTML code. What if the site is loaded in English, or we would like to scrape in English? Then it might not work.

[image]

Finding the single p element whose parent is a span would, I think, be a more robust approach.
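A sketch of that structural lookup, again with hypothetical markup. The badge is found by its position in the tree rather than by the Polish label text:

```python
from bs4 import BeautifulSoup

# Hypothetical listing card: the "promoted" badge is a <p> inside a <span>,
# while ordinary text paragraphs sit directly under the card.
html = '<article><span class="badge"><p>Podbite</p></span><p>Other text</p></article>'
record = BeautifulSoup(html, "html.parser")

# Match by structure (a <p> whose parent is a <span>), not by label text,
# so a language switch on the site would not break the check.
badge = record.find(lambda t: t.name == "p" and t.parent.name == "span")
promoted = badge is not None
```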

Comment on lines 41 to 44
if len(zl) > 1:
    record_dict["price"] = (zl[0].string + ", " + zl[1].string).replace("\xa0", " ")
else:
    record_dict["price"] = ""

No need to scrape the price per m², since we can simply divide price by area to get it.
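For example, with hypothetical parsed values:

```python
# Hypothetical values already parsed from a listing.
price = 850_000.0  # PLN
area = 62.5        # m^2

# Derive price per square metre instead of scraping it as a separate field.
price_per_m2 = price / area
```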

Comment on lines 61 to 63
record_dict["estate_agency"] = (
    False if len(record.find_all("p", string="Oferta prywatna")) > 0 else True
)

If there is an estate agency, we would like to get its name as well.

Comment on lines 58 to 68
distance = settings["distance_radius"]
if distance != "None":
    url += "?distanceRadius=" + str(distance)

# price
min_p = settings["price_min"]
max_p = settings["price_max"]
if min_p != "None":
    url += "&priceMin=" + str(min_p)
if max_p != "None":
    url += "&priceMax=" + str(max_p)
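The manual concatenation above could also be written with `urllib.parse.urlencode`. This is a sketch, not part of the PR; the base URL and settings values are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical settings; None-valued filters are simply skipped.
settings = {"distance_radius": 15, "price_min": None, "price_max": 800_000}
params = {
    "distanceRadius": settings["distance_radius"],
    "priceMin": settings["price_min"],
    "priceMax": settings["price_max"],
}

# urlencode handles the ?/& separators and escaping in one place.
query = urlencode({k: v for k, v in params.items() if v is not None})
url = "https://www.otodom.pl/pl/wyniki?" + query
```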
@detker detker merged commit e96c6ad into master Dec 30, 2023
1 check passed