Costs

Machine’s components and processes can be tuned to raise or lower costs and efficiency, depending on tradeoffs and goals. The main options are:

Adjust frequencies of expensive scheduled tasks
Modify run-reuse timeout for previously-calculated results
Change types and workloads of worker instances
Change types and numbers of webhook instances

Scheduled Tasks

Large tasks that use the entire OpenAddresses dataset are scheduled with AWS CloudWatch Events. Event rules are updated with details found in update-scheduled-tasks.py, and typically trigger task-specific, single-use EC2 instances via AWS Lambda code found in run-ec2-command.py.

Tasks:

Calculate Coverage
Collect Extracts
Enqueue Sources
Index Tiles
Update Dotmap

Each task has a variable run frequency, instance type, and time limit.
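As a rough illustration of how those three settings interact with cost, the sketch below models a task and computes an upper bound on its weekly instance hours. The class name, field names, instance type strings, and every number are hypothetical placeholders, not values taken from update-scheduled-tasks.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledTask:
    """Hypothetical model of one scheduled task's tunable settings."""
    name: str
    runs_per_week: float      # run frequency (assumed unit: runs per week)
    instance_type: str        # EC2 instance type used for the single-use instance
    time_limit_hours: float   # maximum time one run may take

    def max_weekly_instance_hours(self) -> float:
        # Worst case: every run uses its full time limit.
        return self.runs_per_week * self.time_limit_hours


# Placeholder values for illustration only.
tasks = [
    ScheduledTask("Calculate Coverage", runs_per_week=1, instance_type="example.small", time_limit_hours=3),
    ScheduledTask("Index Tiles", runs_per_week=2, instance_type="example.large", time_limit_hours=16),
]

total_hours = sum(task.max_weekly_instance_hours() for task in tasks)
```

Lowering a task's frequency or time limit shrinks this upper bound directly, while changing the instance type changes the price paid per hour.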

Worker

The worker does the actual work of running a source and producing output files.

Workers are defined in the CI Workers 6.x AutoScaling Group, with a variable target count of m3.medium instances. When there are new jobs available via the queue, workers are added. After a quiet period, they are terminated.

Each worker instance has two parallel worker processes, set in the CI Workers 6.x Launch Configuration:

honcho -f /usr/local/src/openaddr/ops/Procfile-worker start -c worker=2
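To reason about how many instances a given backlog might need, the following sketch estimates a target instance count from the queue length and the two processes per instance. The function name, the max_instances cap, and the assumption that each process handles one job at a time are all hypothetical; the actual scaling policy of the AutoScaling Group is not described here.

```python
import math

# Matches the "-c worker=2" setting: two worker processes per instance.
PROCESSES_PER_INSTANCE = 2


def desired_worker_instances(queued_jobs: int,
                             processes_per_instance: int = PROCESSES_PER_INSTANCE,
                             max_instances: int = 10) -> int:
    """Estimate how many instances would cover the queued jobs.

    Assumes one job per worker process at a time; max_instances is a
    hypothetical cap, not a documented setting.
    """
    if queued_jobs <= 0:
        return 0  # quiet queue: no workers needed
    needed = math.ceil(queued_jobs / processes_per_instance)
    return min(needed, max_instances)
```

For example, under these assumptions five queued jobs would call for three instances, and raising the per-instance process count would lower that number at the cost of more load per instance.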

Cached results of previous runs can be reused as long as they fall within the RUN_REUSE_TIMEOUT period, currently set to 10 days.
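A minimal sketch of such a reuse check is shown below. Only the RUN_REUSE_TIMEOUT name and its 10-day value come from the text above; the function name, its parameters, and the inclusive comparison are assumptions made for illustration, not Machine's actual code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RUN_REUSE_TIMEOUT = timedelta(days=10)


def can_reuse_run(completed_at: datetime,
                  now: Optional[datetime] = None,
                  timeout: timedelta = RUN_REUSE_TIMEOUT) -> bool:
    """Return True if a previous run is recent enough to reuse.

    Hypothetical helper: treats a run exactly at the timeout as reusable.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - completed_at) <= timeout
```

A longer timeout means more cached results are reused and fewer sources are rerun, which lowers worker cost but lets results grow staler.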

Webhook

This Python + Flask application is the center of the OpenAddresses Machine. Webhook maintains a connection to the database and queue, listens for new CI jobs from GitHub event hooks on the OpenAddresses repository, queues new source runs, and displays results of batch sets over time.

It’s defined in the CI Webhooks 6.x AutoScaling Group, with a target instance count of one t2.small EC2 instance running a Gunicorn process with multiple workers.
