Adding GAO reports (not GAO IG) #302

Merged: 11 commits, Jul 19, 2017

Conversation

lukerosiak
Contributor

Much belatedly, here is a scraper for GAO reports and restricted reports per #269. It gathers 52,000 reports (90GB) dating back to 1940, though the default earliest year for archiving is set to 1970 here.
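
For illustration only, here is a minimal sketch of how a default archive year like the one described above could be applied when choosing which years to scrape. It is not code from this PR: `resolve_years`, `ARCHIVE_START`, `EARLIEST_AVAILABLE`, and the `since` parameter are hypothetical names, and the clamping to 1940 is an assumption based on the oldest reports mentioned.

```python
import datetime

# Hypothetical constants, taken from the figures quoted in the comment above.
ARCHIVE_START = 1970       # default earliest year when nothing is requested
EARLIEST_AVAILABLE = 1940  # oldest year for which reports are said to exist


def resolve_years(since=None, today=None):
    """Return the list of years to scrape, oldest first.

    `since` overrides the default start year; values older than
    EARLIEST_AVAILABLE are clamped, since nothing exists before it.
    """
    today = today or datetime.date.today()
    start = ARCHIVE_START if since is None else int(since)
    start = max(start, EARLIEST_AVAILABLE)
    return list(range(start, today.year + 1))
```

Calling `resolve_years()` with no arguments would start at 1970, while `resolve_years(since=1940)` would cover the full archive.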

@divergentdave
Contributor

Amazing, thank you! I'll take a look at this in a few days.

@konklone
Member

konklone commented Jul 2, 2017

Hell yeah! I'll let @divergentdave review and merge, but this is super solid work, thank you.

@divergentdave
Contributor

I pushed some miscellaneous changes throughout for error handling, style, etc. I ran it on one year and things look good. I'm going to run it over the full archive next.

Note to self: need to add this to safe.yml

@divergentdave
Contributor

Okay, this looks good. I saw a light dusting of 404 errors and duplicate report IDs, mostly in older years. I'm going to merge this, do an archive run on the production server, index everything, and then add it to safe.yml.
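
As a rough illustration of the two conditions mentioned above (404 errors and duplicate report IDs), the following sketch shows one way a scraper loop could tolerate both without aborting a run. It is not the code merged in this PR: `collect_reports` and the `fetch` callable are hypothetical, and treating only HTTP 404 as skippable is an assumption.

```python
import urllib.error


def collect_reports(candidates, fetch):
    """Download reports, skipping duplicate IDs and missing (404) files.

    candidates: iterable of (report_id, url) pairs.
    fetch: callable taking a URL and returning the body; it is expected
           to raise urllib.error.HTTPError on HTTP failures.
    Returns (saved, duplicates, missing).
    """
    seen = set()
    saved, duplicates, missing = {}, [], []
    for report_id, url in candidates:
        if report_id in seen:
            # Keep the first occurrence and record the repeat.
            duplicates.append(report_id)
            continue
        seen.add(report_id)
        try:
            saved[report_id] = fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # anything other than "not found" is a real failure
            missing.append(report_id)
    return saved, duplicates, missing
```

Returning the duplicate and missing IDs separately makes it easy to log them per year and check whether they really do cluster in the older reports.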

divergentdave merged commit 0bcb50d into unitedstates:master on Jul 19, 2017
divergentdave added a commit that referenced this pull request on Jul 21, 2017
@divergentdave
Contributor

The scraping is done, the reports are ingested, and I've added it to safe.yml going forward. All done here, thanks again @lukerosiak!

@lukerosiak
Contributor Author

Awesome, thank you!
