Adding GAO reports (not GAO IG) #302
Amazing, thank you! I'll take a look at this in a few days.
Hell yeah! I'll let @divergentdave review and merge, but this is super solid work, thank you.
I pushed some miscellaneous changes throughout for error handling, style, etc. I ran it on a year and things look good, going to run it over the full archive next. Note to self: need to add this to |
Okay, this looks good. I saw a light dusting of 404 errors and duplicate report IDs, mostly in older years. I'm going to merge this, do an archive run on the production server, index everything, and then add it to |
The scraping is done, the reports are ingested, and I've added it to |
Awesome, thank you!
Much belatedly, here is a scraper for GAO reports and restricted reports per #269. It gathers 52,000 reports (90 GB) dating back to 1940, though the default start year for archiving is set to 1970 here.
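For anyone skimming the thread, here is a rough sketch of the behavior discussed above: a default start year of 1970 over an archive that reaches back to 1940, plus skipping the 404s and duplicate report IDs seen in older years. It is not the code in this PR; `year_range`, `collect_reports`, and the `fetch` callable are hypothetical names made up for illustration.

```python
from datetime import date

DEFAULT_START_YEAR = 1970  # default mentioned in the PR description
EARLIEST_YEAR = 1940       # oldest reports mentioned in the PR description


def year_range(since=None, until=None):
    """Return the years to scrape, clamped to the archive's earliest year."""
    start = since if since is not None else DEFAULT_START_YEAR
    start = max(start, EARLIEST_YEAR)
    end = until if until is not None else date.today().year
    return list(range(start, end + 1))


def collect_reports(candidates, fetch):
    """Keep the first report seen for each ID and skip missing pages.

    `candidates` is an iterable of (report_id, url) pairs. `fetch` is a
    hypothetical callable that returns the page text, or None on a 404.
    """
    seen = set()
    kept, skipped = [], []
    for report_id, url in candidates:
        if report_id in seen:
            # Duplicate report ID: keep the first copy only.
            skipped.append((report_id, "duplicate"))
            continue
        body = fetch(url)
        if body is None:
            # Page is gone (404): record it and move on.
            skipped.append((report_id, "404"))
            continue
        seen.add(report_id)
        kept.append((report_id, body))
    return kept, skipped
```

Passing `since=1940` would cover the full archive, while leaving it unset starts at 1970. The `skipped` list makes it easy to eyeball how many 404s and duplicates turn up per year.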