-
Notifications
You must be signed in to change notification settings - Fork 46
Close pending statements on connection close #170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bebd337
to
394e3f9
Compare
} | ||
// Closing remaining statements is not required by JDBC spec, | ||
// but it is a reasonable expectation from clients point of view. | ||
List<DuckDBPreparedStatement> stmtList = new ArrayList<>(statements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why creating a new ArrayList
to iterate over statements
?
What if statements
is already empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! This set is being modified by statements themselves, when they are closed. So we are getting the local copy to iterate over. If it is empty - then there is nothing to close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use a concurrent set (like ConcurrentHashMap.newKeySet()) instead of using a non-concurrent one and making a defensive copy for iteration ? (Assuming order of iteration is not required to be maintained)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, another pair of eyes on concurrency topics is always appreciated! ConcurrentHashSet
was considered, but without additional locking it is not "synchronized enough" (we don't want new elements added when removal is running, though this scenario handling seems to be incomplete now - needs to be improved). And with additional locking we don't need additional "synchronization" that happens inside the ConcurrentHashMap
. Also the order of destruction of statements is a nice property (perhaps needs to be reversed to follow "last created - first deleted" convention from C++), another list will still be needed for it with ConcurrentHashSet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Can we perhaps add some tests with multi-threading as well? I'm not sure how this works in the Java world but I can imagine there being some potential problems when one thread is using a prepared statement and the other closes the connection.
Thanks for the review! While, in general, client code is not expected to use |
The big one that we use (primarily) for motherduck, is In one thread we use the standard JDBC flow for running a query.
In another thread we may need to kill the query:
The blocking nature of the JDBC API make this frustratingly tough to make reasonably threadsafe. It is nice if the statement close also cancels queries, but that isn't the case with all JDBC drivers 😭 |
if (conn_ref == null) { | ||
return true; | ||
} | ||
synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of "synchronized" might cause thread-pinning when used with virtual threads (atleast for java versions from 19 to 23). Would it be better to use a ReentrantLock instead ?
Similar discussion in pgjdbc: pgjdbc/pgjdbc#1951
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
M, thread pinning here is either very short (when synchronized
block only flips the flag) or unavoidable, when it goes to a native call. Unlike Postgres, there is no IO done from Java in DuckDB - the pinned thread is used to do the actual work in DB engine code. At the same time, ReentrantLock
s (with a few volatile
fields) should not be in any way worse than synchronized
blocks. So perhaps it makes sense to use them consistently instead of synchronized
.
} | ||
// Closing remaining statements is not required by JDBC spec, | ||
// but it is a reasonable expectation from clients point of view. | ||
List<DuckDBPreparedStatement> stmtList = new ArrayList<>(statements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use a concurrent set (like ConcurrentHashMap.newKeySet()) instead of using a non-concurrent one and making a defensive copy for iteration ? (Assuming order of iteration is not required to be maintained)
Thanks for the details! With concurrent closing, despite no synchronization at all between I now have a SIGSEGV reproducer and going to add synchronization in JNI (perhaps moving all synchronization from Java there too). And cancelling of queries seems to work reliably, so going to add cancelling before closing the statements in a connection cleanup. The hanging is also reproducible, when connection is closed while some query is still running, it happens even when corresponding statement is closed beforehands. Cancelling queries before closing statements seems to be solving the hanging as well. |
897e47f
to
a2fc047
Compare
I've added concurrent tests and implemented synchronization in the native part to make these tests to not crash or hang. Now all operations on connections, statements and results are only performed while holding a lock specific to this object. I've ended up using global registries to keep the locks for objects shared with Java part (added longer description on registries and their usage to holders.hpp). These registries are clunky, but I was unable to get anything more elegant (like Locking is done as straightforward as possible, only scoped
This is very verbose, but at least should be straightforward to maintain if used consistently. Another thing, is that the long queries execution is done while holding a statement lock. I was thinking on releasing the lock while query is running (and re-locking to prepare the result to pass it to Java), but decided to keep this part simple (at least for now). Query interrupt seems to be effective to quickly stop running queries, this interrupt is used on connection when it is closed. Also, I did not touch synchronization in Appender and in Arrow - going to address these parts separately. PS: |
c9501be
to
2c789d0
Compare
a45f49c
to
bc6b233
Compare
bc6b233
to
9d5cf66
Compare
This is a version 3 of this PR that attempts to safely close pending statements when the connection is closed. In it all tracking and locking logic is moved from C++ to Java: - `Connection`, `Statement` and `ResultSet` instances use their own locks; the check that corresponding reference is still alive is performed before every native call after obtaining the lock. - `Statement` lock is held during the query execution, note, this lock is NOT requied to call `statement#cancel()` because this operation is implemented on a `Connection` level. - When a `Connection` is being closed, pending query is cancelled first, and then all active statements are closed in a reverse creation order. Note, thread safety for Appender and Arrow interfaces is going to be addressed in subsequent PRs. Testing: new tests added for various sequential and concurrent closure scenarios. Fixes: duckdb#101
9d5cf66
to
8562602
Compare
This is a version 3 of this PR that attempts to safely close pending
statements when the connection is closed. In it all tracking and
locking logic is moved from C++ to Java:
Connection
,Statement
andResultSet
instances use their ownlocks; the check that corresponding reference is still alive is
performed before every native call after obtaining the lock.
Statement
lock is held during the query execution, note, this lockis NOT requied to call
statement#cancel()
because this operation isimplemented on a
Connection
level.Connection
is being closed, pending query is cancelledfirst, and then all active statements are closed in a reverse creation
order.
Note, thread safety for Appender and Arrow interfaces is going to be
addressed in subsequent PRs.
Testing: new tests added for various sequential and concurrent closure
scenarios.
Fixes: #101
Edit: description is updated to match updated impl.