-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close pending statements on connection close #170
base: main
Are you sure you want to change the base?
Conversation
bebd337
to
394e3f9
Compare
} | ||
// Closing remaining statements is not required by JDBC spec, | ||
// but it is a reasonable expectation from clients point of view. | ||
List<DuckDBPreparedStatement> stmtList = new ArrayList<>(statements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why creating a new ArrayList
to iterate over statements
?
What if statements
is already empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! This set is being modified by statements themselves, when they are closed. So we are getting the local copy to iterate over. If it is empty - then there is nothing to close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use a concurrent set (like ConcurrentHashMap.newKeySet()) instead of using a non-concurrent one and making a defensive copy for iteration ? (Assuming order of iteration is not required to be maintained)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, another pair of eyes on concurrency topics is always appreciated! ConcurrentHashSet
was considered, but without additional locking it is not "synchronized enough" (we don't want new elements added when removal is running, though this scenario handling seems to be incomplete now - needs to be improved). And with additional locking we don't need additional "synchronization" that happens inside the ConcurrentHashMap
. Also the order of destruction of statements is a nice property (perhaps needs to be reversed to follow "last created - first deleted" convention from C++), another list will still be needed for it with ConcurrentHashSet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Can we perhaps add some tests with multi-threading as well? I'm not sure how this works in the Java world but I can imagine there being some potential problems when one thread is using a prepared statement and the other closes the connection.
Thanks for the review! While, in general, client code is not expected to use |
The big one that we use (primarily) for motherduck, is In one thread we use the standard JDBC flow for running a query.
In another thread we may need to kill the query:
The blocking nature of the JDBC API make this frustratingly tough to make reasonably threadsafe. It is nice if the statement close also cancels queries, but that isn't the case with all JDBC drivers 😭 |
if (conn_ref == null) { | ||
return true; | ||
} | ||
synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of "synchronized" might cause thread-pinning when used with virtual threads (atleast for java versions from 19 to 23). Would it be better to use a ReentrantLock instead ?
Similar discussion in pgjdbc: pgjdbc/pgjdbc#1951
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
M, thread pinning here is either very short (when synchronized
block only flips the flag) or unavoidable, when it goes to a native call. Unlike Postgres, there is no IO done from Java in DuckDB - the pinned thread is used to do the actual work in DB engine code. At the same time, ReentrantLock
s (with a few volatile
fields) should not be in any way worse than synchronized
blocks. So perhaps it makes sense to use them consistently instead of synchronized
.
} | ||
// Closing remaining statements is not required by JDBC spec, | ||
// but it is a reasonable expectation from clients point of view. | ||
List<DuckDBPreparedStatement> stmtList = new ArrayList<>(statements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use a concurrent set (like ConcurrentHashMap.newKeySet()) instead of using a non-concurrent one and making a defensive copy for iteration ? (Assuming order of iteration is not required to be maintained)
Thanks for the details! With concurrent closing, despite no synchronization at all between I now have a SIGSEGV reproducer and going to add synchronization in JNI (perhaps moving all synchronization from Java there too). And cancelling of queries seems to work reliably, so going to add cancelling before closing the statements in a connection cleanup. The hanging is also reproducible, when connection is closed while some query is still running, it happens even when corresponding statement is closed beforehands. Cancelling queries before closing statements seems to be solving the hanging as well. |
897e47f
to
a2fc047
Compare
I've added concurrent tests and implemented synchronization in the native part to make these tests to not crash or hang. Now all operations on connections, statements and results are only performed while holding a lock specific to this object. I've ended up using global registries to keep the locks for objects shared with Java part (added longer description on registries and their usage to holders.hpp). These registries are clunky, but I was unable to get anything more elegant (like Locking is done as straightforward as possible, only scoped
This is very verbose, but at least should be straightforward to maintain if used consistently. Another thing, is that the long queries execution is done while holding a statement lock. I was thinking on releasing the lock while query is running (and re-locking to prepare the result to pass it to Java), but decided to keep this part simple (at least for now). Query interrupt seems to be effective to quickly stop running queries, this interrupt is used on connection when it is closed. Also, I did not touch synchronization in Appender and in Arrow - going to address these parts separately. PS: |
a2fc047
to
c9501be
Compare
This change adds synchronization between accessing and deleting underlying native objects for `Connection`, `Statement` and `ResultSet`. All synchronization is done in JNI part, Java-level synchronization is removed from a few places where it was used. `volatile` fields are added when checking whether the object is closed or not. `Connection`'s underlying native object maintains a list of `Statement`s currently open on this `Connection`. These statements are closed when the connection is closed. Running queries are cancelled (interrupted) automatically when the `Connection` is closed. Note: `Statement.close()` is blocked if a long query, that was started from this statement, is still running. `Statement.cancel()` must be called manually before calling `close()` to perform the closing promptly. This cannot be done automatically, because `cancel()` is implemented in the engine as a `Connection`-level operation, thus calling `cancel()` on a `Statement` can interrupt the query running on another `Statement` on the same `Connection`. Synchronization for `Appender` is going to be added in a separate PR. Testing: new tests added for various sequential and concurrent closure scenarios. Fixes: duckdb#101
c9501be
to
2c789d0
Compare
This change adds synchronization between accessing and deleting
underlying native objects for
Connection
,Statement
andResultSet
. All synchronization is done in JNI part, Java-levelsynchronization is removed from a few places where it was used.
volatile
fields are added when checking whether the object is closedor not.
Connection
's underlying native object maintains a list ofStatement
scurrently open on this
Connection
. These statements are closed whenthe connection is closed. Running queries are cancelled (interrupted)
automatically when the
Connection
is closed.Note:
Statement.close()
is blocked if a long query, that was startedfrom this statement, is still running.
Statement.cancel()
must becalled manually before calling
close()
to perform the closingpromptly. This cannot be done automatically, because
cancel()
isimplemented in the engine as a
Connection
-level operation, thuscalling
cancel()
on aStatement
can interrupt the query runningon another
Statement
on the sameConnection
.Synchronization for
Appender
is going to be added in a separate PR.Testing: new tests added for various sequential and concurrent closure
scenarios.
Fixes: #101
Edit: description is updated to match updated impl.