Long queue monitoring and alerting #501

@danwetherald

Hello everyone,

I ran into an incident where we had an extremely large backlog of jobs in our queue: processing kept falling further and further behind and simply could not catch up.

It would have been nice to have some monitoring set up to alert me when job queues start to grow too long.
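
As a rough sketch of the kind of check I mean (Python purely for illustration): alert only when the queue depth stays above a threshold for several consecutive samples, so a short burst does not page anyone. The class name, the threshold, and the sample count are all hypothetical, and how the queue depth is actually measured is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class BacklogAlert:
    """Signals an alert once queue depth stays above a threshold."""

    threshold: int         # hypothetical: depth considered "too long"
    sustained_checks: int  # consecutive breaches required before alerting
    _breaches: int = 0     # running count of consecutive breaches

    def observe(self, queue_depth: int) -> bool:
        """Record one depth sample; return True when an alert should fire."""
        if queue_depth > self.threshold:
            self._breaches += 1
        else:
            # Any sample at or below the threshold resets the streak.
            self._breaches = 0
        return self._breaches >= self.sustained_checks
```

Something would call `observe` on a schedule (say once a minute) with the current number of pending jobs and notify whenever it returns `True`.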

Another fun idea is to use the Fly.io Machines API to automatically start more worker machines until the job queue is back down to a normal length.

https://fly.io/docs/machines/api/machines-resource/
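
A minimal sketch of that idea, again in Python for illustration only. It assumes the worker machines already exist in a stopped state and that each one can work through roughly `jobs_per_machine` queued jobs in an acceptable time. Both function names and all the numbers are hypothetical. The endpoint is the machine `start` call as I understand it from the docs linked above, so please double-check it before relying on it.

```python
import math
import urllib.request

# Base URL of the public Machines API, as I understand it from the Fly.io docs.
FLY_API = "https://api.machines.dev/v1"


def machines_to_start(queue_depth, jobs_per_machine, running, max_machines):
    """Return how many extra workers to start for the current backlog.

    The target is capped at max_machines and the result is never negative.
    """
    target = min(math.ceil(queue_depth / jobs_per_machine), max_machines)
    return max(0, target - running)


def build_start_request(app_name, machine_id, token):
    """Build (but do not send) the request that starts one stopped machine."""
    url = f"{FLY_API}/apps/{app_name}/machines/{machine_id}/start"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

A periodic task could compute `machines_to_start(...)`, pick that many stopped machines, and pass each request to `urllib.request.urlopen`. Scaling back down once the queue recovers is not covered here.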

@rosa, I wanted to see if you had a recommended best practice for setting up this kind of monitoring.

Thanks!
