Postgres + Workers production connection deadlock #3514

Open

Cherry opened this issue Feb 10, 2025 · 0 comments

Cherry (Contributor) commented Feb 10, 2025

Recently, I ran into a weird issue in production where sometimes our Worker would just explode, throwing obscure CONNECT_TIMEOUT errors from Postgres despite having worked for months. Our production environment was down, but with assistance from @WalshyDev and hours of debugging, we found that massively reducing our number of PG connections resolved the problem.

However, take this example worker:

import postgres from 'postgres';

export default {
    async fetch(request, env, ctx): Promise<Response> {
        const connection = postgres('postgres://reader:[email protected]:5432/pfmegrnargs');

        const results = await Promise.all([
            connection`SELECT 1`,
            connection`SELECT 2`,
            connection`SELECT 3`,
            connection`SELECT 4`,
            connection`SELECT 5`,
            connection`SELECT 6`,
            connection`SELECT 7`,
            connection`SELECT 8`,
            connection`SELECT 9`,
            connection`SELECT 10`,
        ]);
        ctx.waitUntil(connection.end());

        return Response.json(results);
    },
} satisfies ExportedHandler<Env>;

Locally, this works without issue. However, if you deploy it to production Workers, it ends up throwing an exception, as can be seen at https://postgres-deadlock-example.jross.workers.dev/, with CONNECT_TIMEOUT errors in the logs.

I suspect this has something to do with the 6 concurrent connections that Workers allow before subsequent ones are queued: since these DB connections aren't closed until the end of the request, they hang open and cause a deadlock. This is a pretty common pattern in production apps that use Hono, Next.js, etc. - you have a function that either returns a db instance, or creates one and then sets it on some reusable storage like context, ALS, etc. (see the sketch below), so you don't constantly spin up and tear down new connections to the DB.
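
For reference, here's a minimal sketch of the kind of pattern I mean. The getDb helper, the store object, and the connection string are illustrative placeholders, not code from our app:

import postgres from 'postgres';

type Sql = ReturnType<typeof postgres>;

// Hypothetical helper: create the client on first use, then reuse it for the
// rest of the request instead of opening a new one per query.
function getDb(store: { db?: Sql }): Sql {
    if (!store.db) {
        store.db = postgres('postgres://user:pass@db.example.com:5432/app');
    }
    return store.db;
}

export default {
    async fetch(request, env, ctx): Promise<Response> {
        const store: { db?: Sql } = {};

        // Every call to getDb(store) after the first returns the same client,
        // so its internal pool (default max: 10) is shared across the request.
        const [users, posts] = await Promise.all([
            getDb(store)`SELECT * FROM users LIMIT 5`,
            getDb(store)`SELECT * FROM posts LIMIT 5`,
        ]);

        ctx.waitUntil(getDb(store).end());
        return Response.json({ users, posts });
    },
} satisfies ExportedHandler<Env>;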

The default postgres max connections is 10, which is what I've been using in my Workers up to this point, since the Cloudflare docs don't specify any options in their examples, and I believe I've seen folks mention that the defaults are fine. If I drop max to 3, with something like postgres(..., { max: 3 }), the problem doesn't happen because the driver only spins up 3 connections.
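
For completeness, here's the example worker from above with that workaround applied. This is just a sketch of the config change (the connection string is a placeholder), not an official recommendation:

import postgres from 'postgres';

export default {
    async fetch(request, env, ctx): Promise<Response> {
        // Cap the driver's pool below the Workers limit of 6 concurrent
        // outbound connections so queries queue inside the driver instead
        // of deadlocking.
        const connection = postgres('postgres://user:pass@db.example.com:5432/app', {
            max: 3,
        });

        const results = await Promise.all([
            connection`SELECT 1`,
            connection`SELECT 2`,
            connection`SELECT 3`,
            connection`SELECT 4`,
            connection`SELECT 5`,
        ]);
        ctx.waitUntil(connection.end());

        return Response.json(results);
    },
} satisfies ExportedHandler<Env>;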


I reported this to a few folks, and there are now a couple of changes in the docs and in postgres at:

Ideally, with TCP connections, deadlocks would be logged in some way that's much easier to debug. Chasing obscure timeout errors led to the most confusion here.
