A middleware that helps implement login for Scrapy spiders
- Load the middleware in settings.py:
  SPIDER_MIDDLEWARES = { [...], 'scrapy_login.LoginMiddleware': 200, }
- Implement do_login(response, username, password) in your spider class. The response argument is the response to the first start request. This method can return a Request, or a Deferred that resolves to a Request.
- Implement a check_login(response) method in your spider. It must inspect the response received after logging in for login indicators (e.g. a logout button, or elements that are not available without logging in) and return True if the login succeeded; otherwise it returns False, or a str containing an error message.
- Run your spider with the username and password arguments, for example:
  scrapy crawl -a username=johndoe -a password=mysecret dmoz.com