
Commit 66a03ae (1 parent: fdaeede)

docs: first two tasks for Otodom and Pracuj.pl

File tree: 3 files changed, +83 −0 lines changed

README.md

Lines changed: 4 additions & 0 deletions
### Otodom / Pracuj tasks repository

This repository will contain the early stages of the Pracuj / Otodom scrapers. Here we will build the fundamentals for the future single big project, and the finished parts will be merged into it.

You can find the tasks in the corresponding directories.

otodom/task_1/task_1.md

Lines changed: 43 additions & 0 deletions
# Otodom

### Task 1

- #### First stage

Let the user provide a link to a listings page, for example [this one](https://www.otodom.pl/pl/wyniki/sprzedaz/mieszkanie/mazowieckie/warszawa/warszawa). You should be able to collect the information from the page and build a JSON (dict) from it as follows:
```json
{
    "url": "str",
    "otodom_id": "str",
    "title": "str",
    "localization": {
        "province": "str",
        "city": "str",
        "district": "str",
        "street": "str"
    },
    "promoted": "bool",
    "price": "int",
    "rooms": "int",
    "area": "int",
    "estate_agency": "str"
}
```

If a field is missing, you can leave its value as an empty string.
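The "missing fields default to an empty string" rule can be sketched like this. The field names mirror the JSON example above; `normalize_listing` is a hypothetical helper name, not part of any existing codebase:

```python
# Schema fields taken from the JSON example in this task.
LISTING_FIELDS = [
    "url", "otodom_id", "title", "promoted",
    "price", "rooms", "area", "estate_agency",
]
LOCALIZATION_FIELDS = ["province", "city", "district", "street"]


def normalize_listing(raw: dict) -> dict:
    """Fill any fields the scraper could not find with empty strings."""
    listing = {field: raw.get(field, "") for field in LISTING_FIELDS}
    raw_loc = raw.get("localization", {})
    listing["localization"] = {f: raw_loc.get(f, "") for f in LOCALIZATION_FIELDS}
    return listing


# Example: only a few fields were scraped successfully.
partial = {"url": "https://www.otodom.pl/pl/oferta/example", "price": 650000}
print(normalize_listing(partial)["estate_agency"])  # -> "" (empty string)
```

This keeps every collected dict shaped identically, which simplifies deduplication and later merging.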
- #### Second stage

The bot should be able to iterate through all the listing pages, collect the listings again, and remove duplicates.
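A minimal deduplication sketch for the second stage: listings collected across pages are keyed by `otodom_id`, so repeats (e.g. promoted listings that reappear on several pages) are dropped while order is preserved.

```python
def deduplicate(listings: list[dict]) -> list[dict]:
    """Drop listings whose otodom_id has already been seen."""
    seen: set[str] = set()
    unique = []
    for listing in listings:
        key = listing.get("otodom_id", "")
        if key and key in seen:
            continue  # duplicate from another page
        seen.add(key)
        unique.append(listing)
    return unique


pages = [
    {"otodom_id": "123", "title": "Mieszkanie A"},
    {"otodom_id": "456", "title": "Mieszkanie B"},
    {"otodom_id": "123", "title": "Mieszkanie A"},  # promoted repeat
]
print(len(deduplicate(pages)))  # -> 2
```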
### Task 2

Create a **settings.json** file. It should contain the settings that define what the bot is going to scrape. An example may look like:

```json
{
    "base_url": "str",
    "price_min": "str",
    "price_max": "str",
    "city": "str",
    "property_type": "str",
    "only_for_sale": "bool",
    "only_for_rent": "bool",
    ...
}
```

and so on. Anything that may be useful, **please try to include**. Depending on the settings, the URL should be generated. Look at how the URL changes according to the search parameters you apply on the site.
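The settings-to-URL step could look roughly like this. The query parameter names (`priceMin`, `priceMax`) are assumptions for illustration only; apply filters on otodom.pl in a browser, observe the resulting URL, and adjust the mapping accordingly.

```python
import json
from urllib.parse import urlencode


def build_url(settings: dict) -> str:
    """Build a search URL from settings.json values (parameter names are guesses)."""
    params = {}
    if settings.get("price_min"):
        params["priceMin"] = settings["price_min"]
    if settings.get("price_max"):
        params["priceMax"] = settings["price_max"]
    query = urlencode(params)
    base = settings["base_url"].rstrip("/")
    return f"{base}?{query}" if query else base


settings = json.loads("""{
    "base_url": "https://www.otodom.pl/pl/wyniki/sprzedaz/mieszkanie/mazowieckie/warszawa/warszawa",
    "price_min": "400000",
    "price_max": "800000"
}""")
print(build_url(settings))
```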
**Solutions** can be placed in **otodom/task_1/<your_name>**; then create a pull request.

pracuj/task_1/task_1.md

Lines changed: 36 additions & 0 deletions
# Pracuj

### Task 1

- #### First stage

Let the user provide a link to a listings page, for example [this one](https://www.pracuj.pl/praca/warszawa;wp?rd=30&cc=5016%2C5015&sal=1). **We only want to fetch listings within a given salary range.** You should be able to collect the information from the page and build a JSON (dict) from it as follows:
```json
{
    "url": "str",
    "pracuj_id": "str",
    "title": "str",
    "company": "str",
    "type_of_contract": "list[str]",
    "salary": "int",
    "specialization": "str",
    "operating_mode": "list[str]",
    "promoted": "bool"
}
```

If a field is missing, you can leave its value as an empty string.
- #### Second stage

The bot should be able to iterate through all the listing pages, collect the listings again, and remove duplicates.
### Task 2

Create a **settings.json** file. It should contain the settings that define what the bot is going to scrape. An example may look like:

```json
{
    "base_url": "str",
    "salary_min": "str",
    "salary_max": "str",
    "city": "str",
    "category": "str",
    ...
}
```

and so on. Anything that may be useful, **please try to include**; start with the most important things. Depending on the settings, the URL should be generated. Look at how the URL changes according to the search parameters you apply on the site.
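For pracuj.pl, the example link in the first stage already carries query parameters such as `rd` (radius), `cc` (category codes), and `sal` (salary filter). The mapping below is a guess based on that URL alone; verify each parameter against the live site before relying on it.

```python
import json
from urllib.parse import urlencode


def build_url(settings: dict) -> str:
    """Build a pracuj.pl search URL from settings.json (parameter mapping is a guess)."""
    params = {}
    if settings.get("category"):
        params["cc"] = settings["category"]
    if settings.get("salary_min"):
        params["sal"] = "1"  # the example URL uses sal=1 to enable the salary filter
    query = urlencode(params)
    base = settings["base_url"].rstrip("/")
    return f"{base}?{query}" if query else base


settings = json.loads("""{
    "base_url": "https://www.pracuj.pl/praca/warszawa;wp",
    "salary_min": "8000",
    "category": "5016"
}""")
print(build_url(settings))
```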
**Solutions** can be placed in **pracuj/task_1/<your_name>**; then create a pull request.
