The Machine’s components and processes can be tuned to raise or lower cost and efficiency, depending on tradeoffs and goals. The main levers are:
- Adjust frequencies of expensive scheduled tasks
- Modify run-reuse timeout for previously-calculated results
- Change types and workloads of worker instances
- Change types and numbers of webhook instances
Large tasks that use the entire OpenAddresses dataset are scheduled with AWS CloudWatch Events. Event rules are updated with details found in update-scheduled-tasks.py, and typically trigger task-specific, single-use EC2 instances via AWS Lambda code found in run-ec2-command.py.
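Changing a task's frequency comes down to changing its rule's schedule expression. The sketch below does not reproduce update-scheduled-tasks.py. It assumes the rules use CloudWatch `rate()` expressions (they may use `cron()` instead) and builds the keyword arguments for boto3's `events` client `put_rule()` call without contacting AWS. The rule name in the usage note is hypothetical.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build a CloudWatch Events rate() schedule expression, e.g. "rate(7 days)"."""
    units = {"minute", "hour", "day"}
    if unit not in units:
        raise ValueError(f"unit must be one of {sorted(units)}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # CloudWatch requires the singular unit for a value of 1 and the plural otherwise.
    suffix = unit if value == 1 else unit + "s"
    return f"rate({value} {suffix})"


def put_rule_kwargs(rule_name: str, every_days: int) -> dict:
    """Keyword arguments for boto3.client("events").put_rule(**kwargs).

    rule_name is whatever name the scheduled task's rule already uses;
    calling put_rule with an existing name updates that rule in place.
    """
    return {
        "Name": rule_name,
        "ScheduleExpression": rate_expression(every_days, "day"),
        "State": "ENABLED",
    }
```

For example, `boto3.client("events").put_rule(**put_rule_kwargs("hypothetical-task-rule", 14))` would halve the frequency of a task that previously ran weekly, and it would roughly halve that task's EC2 spend.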
Tasks:
Each task has a variable run frequency, instance type, and time limit.
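Because each task has a run frequency, an instance type, and a time limit, a rough worst-case cost follows from simple arithmetic: average runs per period × time limit × hourly price of the instance type. The sketch below is a minimal illustration. The class, its field names, and any prices you feed it are hypothetical and are not taken from the Machine's code or from AWS pricing.

```python
from dataclasses import dataclass


@dataclass
class ScheduledTask:
    """Hypothetical description of one scheduled task's tunable settings."""
    name: str
    every_days: float        # run frequency: one run every N days
    instance_type: str       # EC2 instance type launched for each run
    time_limit_hours: float  # cap on how long a single run may take
    hourly_price: float      # price per hour for instance_type, supplied by you

    def average_runs(self, period_days: float = 30.0) -> float:
        """Average number of runs in a period of period_days."""
        return period_days / self.every_days

    def max_cost(self, period_days: float = 30.0) -> float:
        """Worst-case spend over the period, if every run hits its time limit."""
        return self.average_runs(period_days) * self.time_limit_hours * self.hourly_price
```

Halving the frequency halves `max_cost`. A larger instance type raises `hourly_price`, but it may finish sooner, which lets you tighten `time_limit_hours`.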
Workers do the actual work of running a source and producing output files. They are defined in the CI Workers 6.x AutoScaling Group, with a variable target count of m3.medium instances. When new jobs are available via the queue, workers are added; after a quiet period, they are terminated.
Each worker instance has two parallel worker processes, set in the CI Workers 6.x Launch Configuration:
honcho -f /usr/local/src/openaddr/ops/Procfile-worker start -c worker=2
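The source does not state the scaling thresholds, and in practice an AutoScaling Group is driven by CloudWatch alarms and scaling policies rather than by application code. The function below only models the described behavior: add workers when jobs are queued, and drop to zero after a quiet period. Its purpose is to make the tuning knobs visible. The two processes per instance come from the launch configuration above, so N instances can run up to 2N sources at once. `max_instances` and `quiet_period_minutes` are hypothetical defaults.

```python
import math


def desired_workers(queued_jobs: int,
                    current_instances: int,
                    idle_minutes: float,
                    processes_per_instance: int = 2,    # from "-c worker=2"
                    max_instances: int = 10,            # hypothetical cap
                    quiet_period_minutes: float = 30.0  # hypothetical
                    ) -> int:
    """Hypothetical model of how many worker instances the group should run."""
    if queued_jobs > 0:
        # Enough instances for every queued job to get its own worker process,
        # never shrinking while work remains, and never above the cap.
        wanted = math.ceil(queued_jobs / processes_per_instance)
        return min(max_instances, max(current_instances, wanted))
    if idle_minutes >= quiet_period_minutes:
        return 0  # the queue has been quiet long enough: terminate workers
    return current_instances  # quiet, but not for long enough yet
```

Raising `max_instances` clears a large backlog faster at a higher peak cost. Shortening the quiet period saves idle instance-hours but risks relaunching instances for jobs that arrive shortly afterward.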
Cached results of previous runs can be reused as long as they fall within the time period defined by RUN_REUSE_TIMEOUT, currently 10 days.
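The reuse rule can be written as a single comparison. This is a minimal sketch that assumes the timeout is expressed as a `timedelta` and that a result exactly at the limit still counts as reusable. The 10-day value mirrors the text above. `can_reuse` is a hypothetical helper, not a function from the Machine's code.

```python
from datetime import datetime, timedelta

# Mirrors the 10-day value stated above; representing it as a timedelta is an assumption.
RUN_REUSE_TIMEOUT = timedelta(days=10)


def can_reuse(previous_run_finished: datetime, now: datetime,
              timeout: timedelta = RUN_REUSE_TIMEOUT) -> bool:
    """True if a cached result is recent enough to reuse instead of re-running."""
    return now - previous_run_finished <= timeout
```

A longer timeout means fewer re-runs and lower worker cost, but it can serve staler output. A shorter timeout refreshes data more often and keeps workers busier.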
Webhook, a Python + Flask application, is the center of the OpenAddresses Machine. It maintains a connection to the database and queue, listens for new CI jobs from GitHub event hooks on the OpenAddresses repository, queues new source runs, and displays results of batch sets over time.
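The source does not describe how Webhook decides which sources a GitHub event should run. One plausible step is sketched below. It reads a GitHub push event payload, whose `commits` entries list `added` and `modified` file paths, and keeps JSON files under a `sources/` directory. Both the directory prefix and the filtering rule are assumptions, not the Machine's actual logic.

```python
def changed_sources(payload: dict, prefix: str = "sources/") -> list[str]:
    """Collect added or modified source files named in a GitHub push event payload.

    Assumes sources are JSON files under `prefix`; removed files are ignored
    because there is nothing left to run for them.
    """
    paths = set()
    for commit in payload.get("commits", []):
        for key in ("added", "modified"):
            for path in commit.get(key, []):
                if path.startswith(prefix) and path.endswith(".json"):
                    paths.add(path)
    return sorted(paths)
```

Inside a Flask view, the payload dict would come from `request.get_json()`, and each returned path would become one queued source run.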
It’s defined in the CI Webhooks 6.x AutoScaling Group, with a target instance count of one t2.small EC2 instance running a Gunicorn process with multiple workers.
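Gunicorn reads its settings from a Python file passed with `-c`, so the worker count is a one-line tuning knob. The file below is hypothetical: the Machine's actual worker count, port, and timeout are not given in the source. It follows Gunicorn's documented starting point of (2 × cores) + 1 workers, which gives 3 on a single-vCPU t2.small.

```python
# gunicorn.conf.py: a hypothetical configuration, not the Machine's own.
import multiprocessing

bind = "0.0.0.0:8000"  # hypothetical listen address and port
# Gunicorn's docs suggest (2 x num_cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1
timeout = 60  # seconds before an unresponsive worker is killed and restarted
```

It would be used as `gunicorn -c gunicorn.conf.py <module>:<app>`. More workers raise concurrency at the cost of memory, which is the main constraint on a small instance type such as t2.small.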