Downloads paper filings from the Companies House API.
Uses Scrapy to manage downloading account filings from the Companies House API.
NOTE: This project was created as a prototype.
scrapy crawl latest_paper_filing
Assumptions:
- Conda package manager installed
Steps:
Clone the project:
git clone git@github.com:ONSBigData/companies_house_filing_fetcher.git
Set up the Python environment:
cd companies_house_filing_fetcher
conda env create -f environment.yml
conda activate pdf_downloader
Download a BasicCompanyDataAsOneFile csv from http://download.companieshouse.gov.uk/en_output.html.
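To sanity-check the downloaded csv before crawling, a minimal sketch like the one below reads the first few company numbers from it. The function name `read_company_numbers` is hypothetical and not part of this project, and the column name `CompanyNumber` is an assumption about the file's header row; header names are stripped of surrounding whitespace in case they carry stray spaces.

```python
import csv
from itertools import islice


def read_company_numbers(csv_path, limit=5):
    """Return up to `limit` company numbers from the downloaded csv.

    Hypothetical helper for inspection only; not part of this project.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        # Normalise header names, since some may have leading/trailing spaces.
        header = [name.strip() for name in next(reader)]
        # Assumption: the company number column is called "CompanyNumber".
        index = header.index("CompanyNumber")
        # Skip blank or truncated rows that have no value in that column.
        rows = (row for row in reader if len(row) > index)
        return [row[index].strip() for row in islice(rows, limit)]
```

Calling `read_company_numbers("path/to/the/downloaded.csv")` with the path of the extracted file should return a short list of company numbers; a `ValueError` from `header.index` would suggest the column is named differently than assumed here.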
Copy the config files to ~/config and edit their contents:
mkdir -p ~/config
cp ch_api_key.example.ini ~/config/ch_api_key_example.ini
cp filing_fetcher_config.example.yml ~/config/filing_fetcher_config.yml
Review config values in spiders/settings.py.
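Before running the crawl, it can help to confirm that the config files are where the copy commands above put them. The sketch below is a hypothetical helper, not part of this project; the file names are taken from the copy commands as written, and the default directory `~/config` is the one created in the same step.

```python
from pathlib import Path

# File names as they appear in the copy commands above.
EXPECTED_CONFIG_FILES = ("ch_api_key_example.ini", "filing_fetcher_config.yml")


def missing_config_files(config_dir=None, names=EXPECTED_CONFIG_FILES):
    """Return the names of expected config files absent from config_dir.

    Hypothetical helper; defaults to ~/config when no directory is given.
    """
    directory = Path(config_dir) if config_dir is not None else Path.home() / "config"
    return [name for name in names if not (directory / name).is_file()]
```

An empty list from `missing_config_files()` means both files were found; any names returned still need to be copied and edited.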
Run the downloader:
scrapy crawl latest_paper_filing