Part 1: Building the Basic Scraping Functionality
-
Fetch the target website in the Scrapy shell >> fetch('https://www.chocolate.co.uk/collections/all')
-
Check if the request was successful >> response
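A minimal sketch of these first two steps inside the Scrapy shell (started with "scrapy shell" from the terminal):

    # Fetch the page; the shell stores the result in a `response` object
    fetch('https://www.chocolate.co.uk/collections/all')

    # A 200 status code means the request succeeded
    response          # e.g. <200 https://www.chocolate.co.uk/collections/all>
    response.status   # 200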
-
Select all the product items on the page >> products = response.css('product-item')
-
Get the first product-item >> response.css('product-item').get()
-
Get the number of returned products >> len(products)
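Putting the three selector steps above together in the shell:

    # Select every product-item element on the page
    products = response.css('product-item')

    # Raw HTML of the first match
    first_product = response.css('product-item').get()

    # How many products the selector found
    len(products)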
-
Get the price of the product from a span >> product.css('span.price').get().replace('\n Sale price','').replace('</span>','')
-
Note: this approach is used when the selector returns nested HTML rather than the bare value, so the surrounding markup has to be stripped out manually.
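A sketch of that cleanup, assuming the span wraps a "Sale price" label ahead of the amount (the exact whitespace inside the label depends on the live page, so the replace strings are illustrative):

    for product in products:
        # .get() returns the full HTML of the span, e.g.
        # '<span class="price">\n Sale price</span>£8.50'
        raw_price = product.css('span.price').get()

        # Strip the label text and leftover tags, then trim whitespace
        price = (raw_price
                 .replace('\n Sale price', '')
                 .replace('<span class="price">', '')
                 .replace('</span>', '')
                 .strip())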
-
Pagination: get the next-page URL >> response.css('[rel="next"] ::attr(href)').get()
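In a spider, that next-page URL feeds straight back into the parse method; a sketch (the callback name is a placeholder):

    # Inside the spider's parse() method, after yielding the items:
    next_page = response.css('[rel="next"] ::attr(href)').get()
    if next_page is not None:
        # response.follow resolves relative URLs automatically
        yield response.follow(next_page, callback=self.parse)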
Part 2: Cleaning Dirty Data & Dealing With Edge Cases
-
Save data to an S3 bucket (requires botocore to be installed) >> scrapy crawl chocolatespider -O s3://aws_key:aws_secret@mybucket/path/to/myscrapeddata.csv:csv
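The same export can live in settings.py via the FEEDS setting instead of the -O flag; a sketch with placeholder credentials and bucket name:

    # settings.py -- S3 feed export (pip install botocore first)
    AWS_ACCESS_KEY_ID = 'aws_key'
    AWS_SECRET_ACCESS_KEY = 'aws_secret'

    FEEDS = {
        's3://mybucket/path/to/myscrapeddata.csv': {
            'format': 'csv',
            'overwrite': True,  # -O overwrites the file; -o appends
        },
    }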
-
Create a requirements file >> pip freeze > requirements.txt