
Close pending statements on connection close #170

Open · wants to merge 1 commit into `main`

Conversation


@staticlibs staticlibs commented Mar 20, 2025

This change adds synchronization between accessing and deleting the
underlying native objects for `Connection`, `Statement` and
`ResultSet`. All synchronization is done in the JNI part; Java-level
synchronization is removed from the few places where it was used.
`volatile` fields are used to check whether an object is closed.

`Connection`'s underlying native object maintains a list of `Statement`s
currently open on this `Connection`. These statements are closed when
the connection is closed. Running queries are cancelled (interrupted)
automatically when the `Connection` is closed.

Note: `Statement.close()` blocks if a long-running query started from
this statement is still executing. `Statement.cancel()` must be called
manually before `close()` for the close to complete promptly. This
cannot be done automatically, because `cancel()` is implemented in the
engine as a `Connection`-level operation, so calling `cancel()` on a
`Statement` can interrupt a query running on another `Statement` on the
same `Connection`.
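As a client-side illustration of the note above, shutdown code might cancel before closing. This is a minimal sketch; `StatementCloser.cancelAndClose` is a hypothetical helper, not part of the driver:

```java
import java.sql.SQLException;
import java.sql.Statement;

class StatementCloser {
    // Cancels any in-flight query on the statement, then closes it,
    // so that close() does not block on a long-running query.
    // Caveat (per the note above): cancel() is connection-level in the
    // engine, so this can interrupt queries on sibling statements.
    static void cancelAndClose(Statement stmt) {
        try {
            stmt.cancel();  // interrupt the running query, if any
        } catch (SQLException ignored) {
            // cancel() may fail if nothing is running; proceed to close
        }
        try {
            stmt.close();   // now returns promptly
        } catch (SQLException ignored) {
        }
    }
}
```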

Synchronization for Appender is going to be added in a separate PR.

Testing: new tests added for various sequential and concurrent closure
scenarios.

Fixes: #101

Edit: the description has been updated to match the updated implementation.

}
// Closing remaining statements is not required by the JDBC spec,
// but it is a reasonable expectation from the client's point of view.
List<DuckDBPreparedStatement> stmtList = new ArrayList<>(statements);


Why create a new ArrayList to iterate over statements?
What if statements is already empty?

Contributor Author


Thanks for the review! The set is modified by the statements themselves when they are closed, so we take a local copy to iterate over. If it is empty, there is nothing to close.
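For illustration, the defensive-copy pattern described here could look like the following minimal sketch (class and field names are hypothetical, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NativeConnection {
    // Insertion-ordered set; statements remove themselves on close().
    private final Set<NativeStatement> openStatements = new LinkedHashSet<>();

    synchronized void register(NativeStatement s) { openStatements.add(s); }
    synchronized void unregister(NativeStatement s) { openStatements.remove(s); }

    void close() {
        // Copy first: each stmt.close() below calls back into unregister(),
        // which would throw ConcurrentModificationException if we were
        // iterating over the live set directly.
        List<NativeStatement> copy;
        synchronized (this) {
            copy = new ArrayList<>(openStatements);
        }
        for (NativeStatement s : copy) {
            s.close();
        }
    }
}

class NativeStatement {
    private final NativeConnection conn;
    private boolean closed;

    NativeStatement(NativeConnection conn) {
        this.conn = conn;
        conn.register(this);
    }

    void close() {
        if (!closed) {
            closed = true;
            conn.unregister(this); // mutates the connection's set
        }
    }

    boolean isClosed() { return closed; }
}
```

An empty set simply yields an empty copy, so the loop does nothing.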



Would it be better to use a concurrent set (like ConcurrentHashMap.newKeySet()) instead of using a non-concurrent one and making a defensive copy for iteration? (Assuming iteration order does not need to be maintained.)

Contributor Author


Thanks for the review, another pair of eyes on concurrency topics is always appreciated! A concurrent set was considered, but without additional locking it is not "synchronized enough" (we don't want new elements added while removal is running; handling of this scenario currently seems to be incomplete and needs to be improved). And with additional locking we don't need the extra synchronization that happens inside the ConcurrentHashMap. Also, the ordered destruction of statements is a nice property (perhaps it should be reversed to follow the "last created, first deleted" convention from C++); a separate list would still be needed for that with a concurrent set.


@Mytherin Mytherin left a comment


Thanks for the PR!

Can we perhaps add some tests with multi-threading as well? I'm not sure how this works in the Java world but I can imagine there being some potential problems when one thread is using a prepared statement and the other closes the connection.

@staticlibs

> Can we perhaps add some tests with multi-threading as well?

Thanks for the review! While, in general, client code is not expected to use Connection or Statement instances concurrently from different threads (the common case is taking a connection from a pool, using it in a single thread, and then returning it to the pool), close() calls can realistically happen from other threads (for example, in shutdown cleanup code). So in this change only the closing logic is synchronized for potential concurrent usage. The behaviour of such a concurrent call is a valid concern: per the JDBC spec it is "implementation-defined" what happens when an active connection is closed while queries are running. At minimum we should not crash when a native statement is deleted while still in use. Will add the concurrent closure test coverage.

@jonathanswenson

The big one that we use (primarily for MotherDuck) is statement.cancel() from a different thread for query cancellation.

In one thread we use the standard JDBC flow for running a query.

  • create connection
  • create statement (and stash it somewhere)
  • executeQuery on statement
  • Iterate through results.

In another thread we may need to kill the query:

  • detect that the query needs to be killed, grab the stashed statement
  • call cancel on that statement to cancel the inflight query
  • [optionally] close the statement to try to prevent new queries from starting -- we have this disabled for duckdb / motherduck now due to causing a variety of SIGSEGVs.

The blocking nature of the JDBC API makes this frustratingly tough to make reasonably threadsafe. It is nice if the statement close also cancels queries, but that isn't the case with all JDBC drivers 😭
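The watchdog side of this flow might be sketched as follows, against the plain `java.sql.Statement` API (the stash and class names are hypothetical; closing is deliberately skipped, mirroring the SIGSEGV caveat above):

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicReference;

class QueryKiller {
    // The worker thread stashes its statement here before executeQuery().
    private final AtomicReference<Statement> stashed = new AtomicReference<>();

    void stash(Statement stmt) {
        stashed.set(stmt);
    }

    // Called from a watchdog thread: cancel the in-flight query.
    // close() is intentionally not called here, per the comment above
    // about SIGSEGVs when close() races with a running query.
    void kill() {
        Statement stmt = stashed.getAndSet(null); // at most one kill per stash
        if (stmt != null) {
            try {
                stmt.cancel();
            } catch (SQLException ignored) {
            }
        }
    }
}
```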

if (conn_ref == null) {
    return true;
}
synchronized (this) {


The use of "synchronized" might cause thread-pinning when used with virtual threads (at least for Java versions 19 to 23). Would it be better to use a ReentrantLock instead?

Similar discussion in pgjdbc: pgjdbc/pgjdbc#1951


@staticlibs staticlibs Mar 22, 2025


Hm, thread pinning here is either very short (when the synchronized block only flips a flag) or unavoidable, when it goes into a native call. Unlike Postgres, there is no I/O done from Java in DuckDB: the pinned thread is used to do the actual work in the DB engine code. At the same time, ReentrantLocks (with a few volatile fields) should not be in any way worse than synchronized blocks. So perhaps it makes sense to use them consistently instead of synchronized.
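A minimal sketch of this suggestion, assuming a `ReentrantLock` plus a `volatile` closed flag (class and field names are hypothetical):

```java
import java.util.concurrent.locks.ReentrantLock;

class CloseGuard {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile boolean closed; // allows cheap unlocked reads of the flag

    boolean isClosed() { return closed; }

    void close() {
        lock.lock(); // a waiting virtual thread parks instead of pinning its carrier
        try {
            if (closed) {
                return; // idempotent: a second close() is a no-op
            }
            closed = true;
            // ... release native resources here ...
        } finally {
            lock.unlock();
        }
    }
}
```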


@staticlibs

@jonathanswenson

Thanks for the details!

With concurrent closing, despite there being no synchronization at all between close() and other operations, it turned out to be non-trivial to trigger a SIGSEGV at will. The only long operation, execute(), is carefully written to gather all required data at the beginning, and it does not touch the statement state at all after execution begins. So to get a SIGSEGV it is necessary to call close() on a statement after the JNI execute() call has been entered, but before execution begins in the engine, which is a pretty narrow window.

I now have a SIGSEGV reproducer and am going to add synchronization in JNI (perhaps moving all synchronization from Java there too). Cancelling of queries seems to work reliably, so I am going to add cancellation before closing the statements in the connection cleanup.

The hang is also reproducible: when a connection is closed while some query is still running, it happens even if the corresponding statement was closed beforehand. Cancelling queries before closing statements seems to solve the hang as well.

@staticlibs staticlibs mentioned this pull request Mar 25, 2025
@staticlibs staticlibs force-pushed the statement_close branch 2 times, most recently from 897e47f to a2fc047 Compare March 27, 2025 01:06

staticlibs commented Mar 27, 2025

@Mytherin

I've added concurrent tests and implemented synchronization in the native part so that these tests neither crash nor hang. Now all operations on connections, statements and results are performed only while holding a lock specific to that object. I've ended up using global registries to keep the locks for objects shared with the Java part (a longer description of the registries and their usage was added to holders.hpp). These registries are clunky, but I was unable to come up with anything more elegant (like weak_ptr): the main problem is that bare pointers come from the Java side, and the contents behind these pointers can be deleted concurrently, so some external lock is required to dereference such a pointer.

Locking is done as straightforwardly as possible; only scoped std::lock_guards are used (no passing locks between calls, no recursive locks, no atomics, etc.). Their usage requires a multi-step dance on every access to a connection/statement/result set:

  • check that object is alive
  • get its shared_ptr mutex into a local var
  • lock this mutex
  • re-check that object is still alive
  • dereference the object and do the work

This is very verbose, but at least should be straightforward to maintain if used consistently.
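The actual dance lives in the JNI C++ code using scoped `std::lock_guard`s over the registry mutexes; purely as an illustration, the same check/lock/re-check shape can be sketched in Java (all names here are hypothetical):

```java
import java.util.concurrent.locks.ReentrantLock;

class GuardedHandle {
    private volatile boolean alive = true;
    private final ReentrantLock lock = new ReentrantLock();
    private long nativePtr = 0x1L; // stand-in for the native pointer

    long use() {
        // 1. check that the object is alive (cheap fast-fail)
        if (!alive) {
            throw new IllegalStateException("already destroyed");
        }
        // 2-3. take and lock the object's mutex
        lock.lock();
        try {
            // 4. re-check under the lock: another thread may have
            // destroyed the object between the first check and lock()
            if (!alive) {
                throw new IllegalStateException("already destroyed");
            }
            // 5. dereference the object and do the work
            return nativePtr;
        } finally {
            lock.unlock();
        }
    }

    void destroy() {
        lock.lock();
        try {
            alive = false;
            nativePtr = 0;
        } finally {
            lock.unlock();
        }
    }
}
```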

Another thing is that long query execution is done while holding the statement lock. I considered releasing the lock while the query is running (and re-locking to prepare the result to pass to Java), but decided to keep this part simple (at least for now). Query interrupt seems effective at quickly stopping running queries; this interrupt is used on the connection when it is closed.

Also, I did not touch synchronization in Appender and in Arrow - going to address these parts separately.

PS: the CI run is failing on a linking/symbols problem that seems to be unrelated to this change; will look at it tomorrow.
Edit: fixed, it was a missing .cpp entry in CMakeLists.txt.in.

Successfully merging this pull request may close these issues.

Deadlock when opening a 2nd connection with an unclosed statement from previous connection
5 participants