Advanced Scrapy scraper library, built for production scale. Automated using AWS and Celery under the hood.
- Scalable data extraction using Scrapy framework
- Automated task scheduling with Celery
- Distributed computing using AWS infrastructure
- Support for handling JavaScript-rendered websites (see the sketch after this list)
- Customizable scraping pipelines for data processing and storage
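
For the JavaScript-rendered sites mentioned above, one common approach is to render pages through a headless browser before parsing. The snippet below is a minimal sketch assuming the scrapy-playwright plugin; this repository may wire JavaScript support differently.

```python
# settings.py -- route requests through Playwright (assumes scrapy-playwright
# and the Playwright browsers are installed; this plugin is an illustration,
# not a confirmed dependency of this repository).
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

```python
# In a spider: opt a request into browser rendering via its meta dict.
import scrapy


class JsExampleSpider(scrapy.Spider):
    name = "js_example"

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",  # placeholder URL
            meta={"playwright": True},  # render with Playwright before parse()
        )

    def parse(self, response):
        yield {"title": response.css("title::text").get()}
```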
- Clone the repository:

  ```bash
  git clone https://github.com/chibuezedev/Scraprite.git
  ```
- Install the dependencies using pip:

  ```bash
  cd Scraprite
  pip install -r requirements.txt
  ```
- Configure the AWS credentials in `settings.py`:

  ```python
  AWS_ACCESS_KEY_ID = '<your-access-key-id>'
  AWS_SECRET_ACCESS_KEY = '<your-secret-access-key>'
  ```
- Set up the Celery task broker and result backend in `settings.py`:

  ```python
  CELERY_BROKER_URL = 'redis://localhost:6379/0'
  CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
  ```
- Start the Celery worker:

  ```bash
  celery -A scraper worker --loglevel=info
  ```
- Create a new spider by defining the scraping rules in `spiders/my_spider.py`. Refer to the Scrapy documentation for more information on defining spiders.
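  A minimal sketch of what such a spider can look like; the start URL, CSS selectors, and item fields below are placeholders for illustration, not part of this repository:

  ```python
  # spiders/my_spider.py -- illustrative example; adapt the URL, selectors,
  # and fields to the site you are scraping.
  import scrapy


  class MySpider(scrapy.Spider):
      name = "my_spider"
      start_urls = ["https://quotes.toscrape.com"]

      def parse(self, response):
          # Yield one structured item per quote block on the page.
          for quote in response.css("div.quote"):
              yield {
                  "text": quote.css("span.text::text").get(),
                  "author": quote.css("small.author::text").get(),
              }
          # Follow pagination links, if any.
          next_page = response.css("li.next a::attr(href)").get()
          if next_page:
              yield response.follow(next_page, callback=self.parse)
  ```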
- Customize the data processing and storage pipelines in `pipelines.py` according to your requirements.
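  As one possible storage pipeline, here is a sketch that uploads scraped items to S3 with boto3, reusing the AWS credentials from `settings.py`; the bucket name, JSON layout, and boto3 dependency are assumptions for illustration, not necessarily how this repository stores data:

  ```python
  # pipelines.py -- illustrative S3 storage pipeline (assumes boto3 is available).
  import json

  import boto3
  from itemadapter import ItemAdapter


  class S3StoragePipeline:
      @classmethod
      def from_crawler(cls, crawler):
          # Read the AWS credentials configured in settings.py.
          return cls(
              aws_key=crawler.settings.get("AWS_ACCESS_KEY_ID"),
              aws_secret=crawler.settings.get("AWS_SECRET_ACCESS_KEY"),
          )

      def __init__(self, aws_key, aws_secret):
          self.aws_key = aws_key
          self.aws_secret = aws_secret
          self.items = []

      def open_spider(self, spider):
          self.s3 = boto3.client(
              "s3",
              aws_access_key_id=self.aws_key,
              aws_secret_access_key=self.aws_secret,
          )

      def process_item(self, item, spider):
          # Collect items in memory; they are flushed when the spider closes.
          self.items.append(ItemAdapter(item).asdict())
          return item

      def close_spider(self, spider):
          self.s3.put_object(
              Bucket="my-scraper-bucket",  # placeholder bucket name
              Key=f"{spider.name}/items.json",
              Body=json.dumps(self.items),
          )
  ```

  Enable whichever pipelines you define through the `ITEM_PIPELINES` setting in `settings.py`.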
- Run the scraper using the following command, replacing `my_spider` with the name of your spider:

  ```bash
  scrapy crawl my_spider
  ```
- To schedule tasks automatically, use Celery's task scheduling mechanism (Celery beat). Refer to the Celery documentation for more information on scheduling tasks.
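  As a sketch, assuming the Celery app lives in `scraper/celery.py` (to match the `-A scraper` worker invocation above) and that crawls are triggered through a `run_spider` task; both of these are assumptions for this example rather than guarantees about the repository:

  ```python
  # scraper/celery.py -- illustrative periodic schedule using Celery beat.
  import subprocess

  from celery import Celery
  from celery.schedules import crontab

  app = Celery("scraper", broker="redis://localhost:6379/0")


  @app.task
  def run_spider(spider_name):
      # Run the crawl in a separate process so Scrapy's reactor does not
      # block (or clash with) the Celery worker process.
      subprocess.run(["scrapy", "crawl", spider_name], check=True)


  app.conf.beat_schedule = {
      "crawl-my-spider-hourly": {
          "task": "scraper.celery.run_spider",
          "schedule": crontab(minute=0),  # top of every hour
          "args": ("my_spider",),
      },
  }
  ```

  Run the scheduler alongside the worker with `celery -A scraper beat --loglevel=info`.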
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request. Please make sure to follow the code of conduct.
This project is licensed under the MIT License. See the LICENSE file for more details.