Skip to content

Make Pool safe for use by multiple goroutines #498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

CodyDWJones
Copy link
Contributor

@CodyDWJones CodyDWJones commented Sep 15, 2021

Reason for the change
Fixes #497

Description
WARNING: this might be a breaking change for some applications.
To be specific, if an application calls Query() from N goroutines
but doesn't close the returned cursors, and is using a pool with
fewer than N entries, the last call will block forever.
Needless to say README.md already recommends closing cursors to
prevent leaking connections, so developers were warned...

Pool entries are acquired with "non blocking mutexes" to
promote concurrency.

Remote queries are not fast compared to the instructions on the
local processor. It would be preferable to use sync.Cond so
goroutines waiting for a connection could sleep until one becomes
available. However, that introduces a race condition I wasn't able
to resolve without a (blocking) mutex. If every connection is busy,
the pool waits before trying again.

The wait logic can be overridden by the library user, since
applications have different workloads; batch loading is quite
different from individual queries serving a UI. A developer can
implement a fixed delay, exponential delay, etc. in tandem with
delay constants appropriate for their application.

Code examples

Checklist

References

@CodyDWJones CodyDWJones force-pushed the fix-connection-pool branch 2 times, most recently from c797538 to 72df7aa Compare September 15, 2021 05:29
@CodyDWJones CodyDWJones changed the title Fix for #497 - Make Pool safe for use by multiple goroutines Make Pool safe for use by multiple goroutines Sep 15, 2021
Cody Jones added 2 commits September 15, 2021 16:36
WARNING: this might be a breaking change for some applications.
To be specific, if an application calls Query() from N goroutines
but doesn't close the returned cursors, and is using a pool with
fewer than N entries, the last call will block forever.
Needless to say README.md already recommends closing cursors to
prevent leaking connections, so developers were warned...

Pool entries are acquired with "non blocking mutexes" to
promote concurrency.

Remote queries are not fast compared to the instructions on the
local processor.  It would be preferable to use sync.Cond so
goroutines waiting for a connection could sleep until one becomes
available.  However, that introduces a race condition I wasn't able
to resolve without a (blocking) mutex.  If every connection is busy,
the pool waits before trying again.

The wait logic can be overridden by the library user, since
applications have different workloads; batch loading is quite
different from individual queries serving a UI.  A developer can
implement a fixed delay, exponential delay, etc. in tandem with
delay constants appropriate for their application.
@CodyDWJones CodyDWJones force-pushed the fix-connection-pool branch 4 times, most recently from 85005ba to 28be3f2 Compare September 16, 2021 16:26
At present only the generated changefeed tests are affected, and
running gen_tests produces many distracting changes, so I manually
altered TearDownSuite() in only the changefeed tests.

(Ultimately I'd like to see rethinkdb#495 and rethinkdb#496 merged, after which
gen_tests can be run to fix TearDownSuite() in the other files
for consistency.)

This change also closes some cursors that were previously leaked in
hand-written tests.  In most cases this isn't necessary.  But like
the changefeed tests, sometimes the test framework uses multiple
goroutines, in which case the default single-connection pool will
block if any tests fail to close their cursors.

Finally, this includes a .travis.yml change previously submitted
in rethinkdb#494 so the build can be verified.
@CMogilko
Copy link
Member

Hello,
Thanks for your contribution,
But connections and pool are designed that connection can works with multiple queries (and cursors) simultaneously. Pools are used only for even balancing.
If application calls N=10 queries on a pool with M=2 connections, each connection creates N/M=5 cursors on the same socket with different query tokens.
It is very deprecated style to use a connection for a single query simultaneously, that used in first version of protocol.

So I suggest that blocking forever caused by something different. Can you provide stacktrace of blocked goroutines?

@CodyDWJones
Copy link
Contributor Author

This was a misunderstanding on my part which I regret. I'll close the issue. Thank you for clarifying!

Would you accept a PR removing the "not thread safe" comments from Connection and Cursor? It seems those were added years ago when locking was removed from Cursor (f485b19) but it was later restored (72a4a26). I think they are quite misleading. I could also expand Pool's comment so it's clear to readers it's not a traditional pool from which connections are briefly borrowed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connection pool is not safe
2 participants