Periodic and event-driven processes follow paths through components and persistent data stores.
Batch sets are used approximately once per week.
- Run the batch enqueue with the script openaddr-enqueue-sources. This requires a current Github access token and a connection to the machine database:
    openaddr-enqueue-sources -t <Github Token> -d <Database URL>
- Complete sources are read from Github’s API using the current master branch of the OpenAddresses repository.
- A new empty set is created in the sets table, and becomes visible at results.openaddresses.io/sets.
- New runs are slowly drip-fed into the tasks queue. New items are only enqueued when the queue length is zero, to prevent Worker auto-scale costs from ballooning.
- Worker processes runs from the queue, storing results in S3 and passing completed runs to the done queue.
- Completed run information is handled by Dequeuer.
- When all runs are finished, new coverage maps are rendered and openaddr-enqueue-sources exits successfully.
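The drip-feed rule in the batch steps above can be sketched roughly as follows. This is a minimal illustration, not the actual openaddr-enqueue-sources code: the function name drip_feed, its callback parameters, the in-memory queue, and the source paths are all assumptions made for the example.

```python
from collections import deque


def drip_feed(sources, queue_length, enqueue, wait, batch_size=1):
    """Enqueue sources a few at a time, but only while the queue is empty.

    queue_length, enqueue and wait are hypothetical callbacks standing in
    for the real tasks queue and for a sleep between checks.
    """
    pending = deque(sources)
    while pending:
        if queue_length() == 0:
            # The queue has drained, so it is safe to add more work.
            for _ in range(min(batch_size, len(pending))):
                enqueue(pending.popleft())
        else:
            # Work is still waiting; do not add more yet.
            wait()


# In-memory stand-ins for the tasks queue and a Worker.
tasks = deque()
enqueued_order = []


def enqueue(source):
    tasks.append(source)
    enqueued_order.append(source)


def fake_worker():
    # Instead of sleeping, pretend a Worker took one task off the queue.
    if tasks:
        tasks.popleft()


drip_feed(
    ["us/ca/alameda", "us/ca/berkeley", "us/ca/oakland"],  # hypothetical sources
    queue_length=lambda: len(tasks),
    enqueue=enqueue,
    wait=fake_worker,
)
```

Because new items are added only when the queue length is zero, the backlog never grows beyond one batch, which is what keeps an auto-scaling Worker pool from growing with the size of the full source list.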
Continuous integration jobs are used each time an OpenAddresses contributor modifies the main repository with a pull request.
- A contributor issues a pull request.
- Github posts a blob of JSON data describing the edits to the Webhook /hook endpoint.
- Webhook immediately attempts to create a new empty job in the jobs table and enqueues any new source runs found in the edits.
  - If this step fails, an error status is posted back to the Github status API, and no job or run is created.
  - If this step succeeds, a pending status is posted back to the Github status API, and the job becomes visible at results.openaddresses.io/jobs.
- Worker processes runs from the queue, storing results in S3 and passing completed runs to the done queue.
- Completed run information is handled by Dequeuer.
- When all runs are finished, a final success or failure status is posted back to the Github status API.
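The status transitions in the continuous integration steps above can be sketched as below. The state names error, pending, success and failure are the ones the Github status API accepts. Everything else is an assumption for illustration: the function names, the create_job and post_status callbacks, and the rule that a single failed run makes the whole job fail.

```python
def handle_hook(payload, create_job, post_status):
    """Try to create a job for a pull request and report the first status.

    create_job and post_status are hypothetical callbacks standing in for
    the jobs table and the Github status API.
    """
    try:
        job_id = create_job(payload)
    except Exception:
        # Nothing was created, so report an error and stop.
        post_status("error")
        return None
    post_status("pending")
    return job_id


def final_status(run_results):
    """Reduce per-run outcomes to one job status.

    run_results maps a source path to True (succeeded), False (failed)
    or None (not finished yet).
    """
    if any(result is None for result in run_results.values()):
        return "pending"
    # Assumption: one failed run is enough to fail the job.
    return "success" if all(run_results.values()) else "failure"


# Usage with in-memory stand-ins.
posted = []
job = handle_hook({"pull_request": {}}, lambda payload: "job-1", posted.append)


def failing_create(payload):
    raise RuntimeError("database unavailable")


failed_posted = []
failed_job = handle_hook({}, failing_create, failed_posted.append)
```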
New Zip collections are generated every other night.
- Run the collection with the script openaddr-collect-extracts. This requires a connection to the machine database and S3 access credentials in environment variables:
    openaddr-collect-extracts -d <Database URL>
- Current data is read from the sets and runs tables, using the most recent successful run for each source listed in the most recent set. This will include older successful runs for sources that have since failed.
- New Zip archives are created for geographic regions of the world.
- Zip archives are uploaded to S3 in predictable locations, overwriting previous archives, and are immediately available from results.openaddresses.io.
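The run-selection rule used for collections can be sketched as below. This is only an illustration of "most recent successful run per source in the latest set": the function name and the record fields source, finished and success are assumptions, not the real columns of the sets and runs tables.

```python
def latest_successful_runs(set_sources, runs):
    """Pick the newest successful run for each source in the latest set.

    set_sources is the collection of source paths in the most recent set.
    runs is a list of dicts with hypothetical keys 'source', 'finished'
    (an ISO date string, so plain comparison orders by time) and 'success'.
    """
    best = {}
    for run in runs:
        if run["source"] not in set_sources or not run["success"]:
            continue
        current = best.get(run["source"])
        if current is None or run["finished"] > current["finished"]:
            best[run["source"]] = run
    return best


# Hypothetical data: source "a" failed in its latest run, so its older
# successful run is used instead. Source "c" is not in the latest set.
runs = [
    {"source": "a", "finished": "2016-01-01", "success": True},
    {"source": "a", "finished": "2016-01-08", "success": False},
    {"source": "b", "finished": "2016-01-01", "success": True},
    {"source": "b", "finished": "2016-01-08", "success": True},
    {"source": "c", "finished": "2016-01-08", "success": True},
]
selected = latest_successful_runs({"a", "b"}, runs)
```

Skipping failed runs rather than failed sources is what lets a source that has since broken keep contributing its last good data to the archives.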