autodata-crawler

A crawler for the content (car data and images) of http://auto-data.net

It crawls car data and car images. Car data is saved to structured XML files so that it can be loaded later, preferably into a database. Car images are saved into a separate folder for each car type.

It uses xidel to parse the pages and extract data with XPath expressions. The xidel executable is included in the repository.

The code was written in 2013. I last tested it on 2015-10-26, and it was working then.

How to run

Car data crawler

./cardata_starter.sh 
Usage: <from> (inclusive) <to> (exclusive) <per_process>
Example: 0 19000 100

Output folder: ./cardata

The numbers are car IDs, taken from the car data URLs, such as http://www.auto-data.net/tr/?f=showCar&car_id=xxxxx. cardata_starter.sh starts multiple cardata.sh processes: the example (0 19000 100) creates 190 processes, each handling 100 cars.
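This README does not show how cardata_starter.sh splits the range. As a minimal sketch, assuming each worker is invoked as `<worker> <start> <end>` (a hypothetical interface, not necessarily the one cardata.sh uses), splitting [from, to) into chunks and launching one background worker per chunk could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split the id range [from, to) into chunks of
# per_process ids and run one background worker per chunk.
# The worker interface "<worker> <start> <end>" is an assumption.

# Print one "start end" pair per chunk; the last chunk is cut off at <to>.
split_range() {
  local from=$1 to=$2 per=$3 start end
  (( per > 0 )) || return 1          # avoid an endless loop on per=0
  for (( start = from; start < to; start += per )); do
    end=$(( start + per ))
    if (( end > to )); then end=$to; fi
    echo "$start $end"
  done
}

# Launch one background worker per chunk, then wait for all of them.
run_chunks() {
  local from=$1 to=$2 per=$3 worker=$4 start end
  while read -r start end; do
    "$worker" "$start" "$end" &
  done < <(split_range "$from" "$to" "$per")
  wait                               # block until every worker has exited
}
```

For example, `run_chunks 0 19000 100 ./cardata.sh` would start 190 background workers, the last one covering ids 18900 to 18999 (the upper bound is exclusive). Reading the chunks from a process substitution rather than a pipe keeps the loop in the current shell, so `wait` sees all the background jobs.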

Car images crawler

./images.sh 
Usage: <from> (inclusive) <to> (exclusive)
Example: ./images.sh 1 10667 2>/dev/null

Output folder: ./images
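How images.sh derives each folder name is not documented here (the car name is parsed around images.sh:47). A hypothetical helper for turning a scraped car name into a filesystem-safe folder under ./images might look like the following; the naming scheme is an assumption for illustration, not the repository's:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a car name to a folder under ./images.
# Every byte other than A-Z, a-z, 0-9, '.', '_' and '-' becomes '_',
# runs of '_' are squeezed, and a leading/trailing '_' is trimmed.
car_dir() {
  local name=$1 safe
  safe=$(printf '%s' "$name" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_' | tr -s '_')
  safe=${safe#_}                     # drop a leading underscore
  safe=${safe%_}                     # drop a trailing underscore
  safe=${safe:-unknown}              # fall back for empty names
  printf './images/%s\n' "$safe"
}
```

For example, `mkdir -p "$(car_dir "Audi A4 (B8) 2.0 TDI")"` would create ./images/Audi_A4_B8_2.0_TDI. Non-ASCII letters such as the Turkish ş or ğ are also replaced by `_` under this scheme, which keeps paths ASCII at the cost of some readability.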

Limitations (please contribute!)

It uses the Turkish site (http://www.auto-data.net/tr/), so the data is in Turkish. You can change the language code in the URLs to your liking (even better, improve it to accept the language as a parameter and send a pull request!). You will probably also need to change images.sh:47 so that it parses the car name correctly.
It does not take advantage of HTTP/1.1 keep-alive (persistent connections), so it may not be very efficient.
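Both limitations above could be tackled together. The sketch below assumes that other language versions of the site follow the same URL pattern as the Turkish one (for example /en/ instead of /tr/), and it swaps in curl for downloading: a single curl invocation that receives many URLs through a config file (`-K`) can reuse its connection to the same host, and the saved pages could then be parsed locally with xidel. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch:
#  - the language code is a parameter (default "tr") instead of hard-coded;
#  - a curl config file lists many URLs, so one curl process can reuse its
#    connection (HTTP keep-alive) instead of reconnecting for every page.

# Build the car page URL for an id, following the URL pattern in this README.
car_url() {
  local id=$1 lang=${2:-tr}
  printf 'http://www.auto-data.net/%s/?f=showCar&car_id=%s\n' "$lang" "$id"
}

# Print a curl config (for "curl -K <file>") that saves ids [from, to)
# as <outdir>/<id>.html.
write_curl_config() {
  local from=$1 to=$2 lang=${3:-tr} outdir=$4 id
  for (( id = from; id < to; id++ )); do
    printf 'url = "%s"\n' "$(car_url "$id" "$lang")"
    printf 'output = "%s/%s.html"\n' "$outdir" "$id"
  done
}
```

A possible run: `mkdir -p pages && write_curl_config 0 100 en pages > batch.cfg && curl -s -K batch.cfg`. The output directory is created up front because curl does not create missing directories for `output` unless told to.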

bekce/autodata-crawler
