This repository has been archived by the owner on May 5, 2022. It is now read-only.

58 lines (42 loc) · 3.22 KB

costs.md

Costs

Machine’s components and processes can be tuned to trade cost against efficiency, depending on current goals. The main levers are:

  • Adjust frequencies of expensive scheduled tasks
  • Modify run-reuse timeout for previously-calculated results
  • Change types and workloads of worker instances
  • Change types and numbers of webhook instances

Scheduled Tasks

Large tasks that use the entire OpenAddresses dataset are scheduled with AWS CloudWatch events. Event rules are updated with details found in update-scheduled-tasks.py, and typically trigger task-specific, single-use EC2 instances via AWS Lambda code found in run-ec2-command.py.

Tasks:

Each task has a variable run frequency, instance type, and time limit.
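To see how these three settings drive cost, the sketch below multiplies run frequency by time limit to get a worst-case number of instance-hours per month for each instance type. It is only an illustration: the ScheduledTask class, the task names, the instance type labels, and all numbers are placeholders, not values taken from update-scheduled-tasks.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledTask:
    """One scheduled task, described by the three tunable settings."""
    name: str                # hypothetical task name
    runs_per_month: float    # run frequency
    instance_type: str       # type of the single-use EC2 instance
    time_limit_hours: float  # longest a single run may last

    def max_instance_hours_per_month(self) -> float:
        # Worst case: every run uses its full time limit.
        return self.runs_per_month * self.time_limit_hours


def monthly_ceiling(tasks):
    """Sum worst-case instance-hours per month, grouped by instance type."""
    totals = {}
    for task in tasks:
        hours = task.max_instance_hours_per_month()
        totals[task.instance_type] = totals.get(task.instance_type, 0.0) + hours
    return totals


# Placeholder values only; the real schedule is not reproduced here.
EXAMPLE_TASKS = [
    ScheduledTask("example-weekly-task", runs_per_month=4,
                  instance_type="example.large", time_limit_hours=12),
    ScheduledTask("example-daily-task", runs_per_month=30,
                  instance_type="example.small", time_limit_hours=1),
]

if __name__ == "__main__":
    for instance_type, hours in monthly_ceiling(EXAMPLE_TASKS).items():
        print(f"{instance_type}: at most {hours:g} instance-hours per month")
```

Halving a task's frequency or time limit halves its ceiling, while changing its instance type moves those hours to a different hourly rate.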

Worker

The worker does the actual work of running a source and producing output files.

Workers are defined in the CI Workers 6.x AutoScaling Group, with a variable target count of m3.medium instances. When there are new jobs available via the queue, workers are added. After a quiet period, they are terminated.

Each worker instance has two parallel worker processes, set in the CI Workers 6.x Launch Configuration:

honcho -f /usr/local/src/openaddr/ops/Procfile-worker start -c worker=2
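As a rough illustration of how the per-instance process count affects the number of instances, the sketch below computes how many instances would be needed to give every queued job its own worker process. This is plain capacity arithmetic under assumed names (instances_needed, max_instances); it is not the scaling policy of the AutoScaling Group, which is not described here.

```python
import math

# Matches `-c worker=2` in the command above.
PROCESSES_PER_INSTANCE = 2


def instances_needed(queued_jobs,
                     processes_per_instance=PROCESSES_PER_INSTANCE,
                     max_instances=None):
    """Instances required so each queued job gets a worker process.

    `max_instances` is a hypothetical upper bound on the group size.
    """
    if queued_jobs <= 0:
        return 0
    needed = math.ceil(queued_jobs / processes_per_instance)
    if max_instances is not None:
        needed = min(needed, max_instances)
    return needed


if __name__ == "__main__":
    for jobs in (0, 1, 5, 40):
        print(jobs, "queued jobs ->", instances_needed(jobs), "instances")
```

Raising the process count per instance lowers the instance count for the same queue, at the price of each process sharing the instance's CPU and memory with more neighbors.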

Cached results of previous runs can be reused as long as they fall within the RUN_REUSE_TIMEOUT period, currently set to 10 days.
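The reuse rule can be sketched as a simple age check. Only the name RUN_REUSE_TIMEOUT and the 10-day value come from the text above; the function can_reuse_run and its parameters are hypothetical and may not match how Machine actually performs the comparison.

```python
from datetime import datetime, timedelta, timezone

# 10 days, as stated above.
RUN_REUSE_TIMEOUT = timedelta(days=10)


def can_reuse_run(run_finished_at, now=None, timeout=RUN_REUSE_TIMEOUT):
    """Return True if a previous run is recent enough to be reused.

    `run_finished_at` and `now` are timezone-aware datetimes;
    `now` defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - run_finished_at <= timeout


if __name__ == "__main__":
    now = datetime(2020, 1, 20, tzinfo=timezone.utc)
    recent = datetime(2020, 1, 15, tzinfo=timezone.utc)  # 5 days old
    stale = datetime(2020, 1, 5, tzinfo=timezone.utc)    # 15 days old
    print(can_reuse_run(recent, now))
    print(can_reuse_run(stale, now))
```

A longer timeout means more runs are served from cache and fewer sources are re-run, which should lower worker usage but lets results grow staler.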

Webhook

This Python + Flask application is the center of the OpenAddresses Machine. Webhook maintains a connection to the database and queue, listens for new CI jobs from GitHub event hooks on the OpenAddresses repository, queues new source runs, and displays results of batch sets over time.

It’s defined in the CI Webhooks 6.x AutoScaling Group, with a target instance count of one t2.small EC2 instance running a Gunicorn process with multiple workers.
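The number of Gunicorn workers is not stated here. As a starting point when changing the webhook instance type, the sketch below applies the rule of thumb from the Gunicorn documentation, (2 x number of cores) + 1, in the style of a Gunicorn Python config file. The helper suggested_workers and the bind address are assumptions for illustration, not Machine's actual configuration.

```python
import multiprocessing


def suggested_workers(cpu_count):
    """Gunicorn's documented rule of thumb: (2 x cores) + 1."""
    return 2 * cpu_count + 1


# Settings as they might appear in a Gunicorn config file.
workers = suggested_workers(multiprocessing.cpu_count())
bind = "127.0.0.1:8000"  # placeholder address, not the real one

if __name__ == "__main__":
    print("workers =", workers)
```

A larger instance type with more cores would support more workers under this rule, so instance type and worker count are best adjusted together.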