Scrapy price monitor update #14

Open · wants to merge 8 commits into master
Changes from 2 commits
57 changes: 48 additions & 9 deletions scrapy_price_monitor/.gitignore
@@ -8,7 +8,6 @@ __pycache__/

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
@@ -20,9 +19,13 @@ lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
@@ -37,13 +40,16 @@ pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
@@ -52,6 +58,8 @@ coverage.xml
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
@@ -66,27 +74,58 @@ docs/_build/
# PyBuilder
target/

# IPython Notebook
# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, when collaborating, if you have platform-specific dependencies or dependencies
# without cross-platform support, pipenv may install dependencies that don't work, or fail
# to install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# dotenv
.env
# SageMath parsed files
*.sage.py

# virtualenv
.venv/
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

.scrapy
# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

.idea
26 changes: 7 additions & 19 deletions scrapy_price_monitor/README.md
@@ -2,7 +2,8 @@ Scrapy Price Monitor
====================

This is a simple price monitor built with [Scrapy](https://github.com/scrapy/scrapy)
and [Scrapy Cloud](https://scrapinghub.com/scrapy-cloud).
and [Scrapy Cloud](https://www.zyte.com/scrapy-cloud/). It is an updated version of
[this sample](https://github.com/scrapinghub/sample-projects/tree/master/scrapy_price_monitor/_scrapy_price_monitor_OLD).

It is basically a Scrapy project with one spider for each online retailer whose
prices we want to monitor. In addition to the spiders, there's a Python
@@ -19,11 +20,6 @@ the already supported retailers, just add
the URL list as its value, such as:

{
"headsetlogitech": [
"https://www.amazon.com/.../B005GTO07O/",
"http://www.bestbuy.com/.../3436118.p",
"http://www.ebay.com/.../110985874014"
],
"NewProduct": [
"http://url.for.retailer.x",
"http://url.for.retailer.y",
@@ -34,16 +30,8 @@ the URL list as its value, such as:

## Supporting Further Retailers

This project currently only works with 3 online retailers, and you can list them
by running:

$ scrapy list
amazon.com
bestbuy.com
ebay.com

If the retailer that you want to monitor is not yet supported, just create a spider
to handle the product pages from it. To include a spider for samsclub.com, you
To add a retailer, just create a spider to handle its product pages.
To include a spider for samsclub.com, you
could run:

$ scrapy genspider samsclub.com samsclub.com
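The generated file is only a stub, so you still need to fill in the parsing
logic. As a rough sketch (this is not the project's actual spider structure;
the URL and the CSS selectors below are made-up placeholders), a price-scraping
spider could look like:

    import scrapy


    class SamsclubSpider(scrapy.Spider):
        name = "samsclub.com"
        allowed_domains = ["samsclub.com"]
        # Hypothetical URL; in this project the URLs to visit would come
        # from the products JSON shown earlier.
        start_urls = ["https://www.samsclub.com/p/example-product"]

        def parse(self, response):
            # Placeholder selectors: inspect the retailer's real product
            # pages to find the correct ones.
            yield {
                "retailer": self.name,
                "url": response.url,
                "title": response.css("h1::text").get(),
                "price": response.css(".price::text").get(),
            }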
@@ -74,7 +62,7 @@ later when showing how to schedule the project on Scrapy Cloud.

1. Clone this repo:

$ git clone git@github.com:stummjr/scrapy_price_monitor.git

$ git clone git@github.com:further-reading/price-monitoring-sample.git

2. Enter the folder and install the project dependencies:

@@ -141,9 +129,9 @@ To do that, first add your Scrapy Cloud project id to [settings.py `SHUB_PROJ_ID
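The `SHUB_PROJ_ID` value is just a module-level constant. A minimal
illustration (the id below is a placeholder, not a real project id):

    # settings.py (excerpt)
    SHUB_PROJ_ID = "123456"  # placeholder: use your own Scrapy Cloud project id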

Then run the spiders via command line:

$ scrapy crawl bestbuy.com
$ scrapy crawl books.toscrape.com

This will run the spider named as `bestbuy.com` and store the scraped data into
This will run the spider named `books.toscrape.com` and store the scraped data into
a Scrapy Cloud collection, under the project you set in the last step.
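For reference, here is a hedged sketch of how an item pipeline might write
items into such a collection using the `scrapinghub` client library. The store
name, the key scheme, and the pipeline itself are assumptions for
illustration, not confirmed details of this project:

    import time

    from scrapinghub import ScrapinghubClient


    class CollectionStoragePipeline:
        # Sketch: store each scraped item in a Scrapy Cloud collection.

        def open_spider(self, spider):
            # With no argument, ScrapinghubClient reads the API key from
            # the SH_APIKEY environment variable.
            client = ScrapinghubClient()
            project = client.get_project(spider.settings.get("SHUB_PROJ_ID"))
            # "price_data" is an assumed store name.
            self.store = project.collections.get_store("price_data")

        def process_item(self, item, spider):
            # Every collection entry needs a unique _key.
            key = "{}-{}".format(spider.name, int(time.time() * 1000))
            self.store.set({"_key": key, "value": dict(item)})
            return item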

You can also run the price monitor via command line:
92 changes: 92 additions & 0 deletions scrapy_price_monitor/_scrapy_price_monitor_OLD/.gitignore
@@ -0,0 +1,92 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# IPython Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
.venv/
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject

.scrapy