-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SqlClient. Inappropriate behavior after a commit timeout #130
Comments
I responded in corefx and with initial triage thoughts, copied here for continuity.
Can you close the corefx issue if you want to track this here so that the other issues doesn't get moved and become a duplicate please. |
@Wraith2 Yeah, you responded. But:
That is not a problem. I can close that issue or I close this one. However you say. What is important is that this bug is fixed. At least recognized and a real workaround is provided until the fix is ready. |
I'm a contributor and I focus on SqlClient quite a lot but I don't work for Microsoft. Nothing I say is in any sense official. I looked at the details you provided and based on my experience I think it's the same or a similar issue to the one linked so I provided that information for others. If I get chance I'll investigate. The request to have one issue is just so we don't end up with fragmented conversations. |
Okay, this means that there is no official reaction. And that's what bothers me here. I have presented a critical bug along with reproduction and i'm getting zero official reaction. This is not how i expect this to happen. And this is precisely why I opened this issue again here. To get some attention from the appropriate team who can schedule and resolve this. If you can investigate - this is fine. I even can do it myself, i'm pretty sure the fix is not that complex. The question is if this fix will be accepted or not. For this to be accepted, it should be first recognized as a bug. And that's not what happening here. For the workaround, I've already found one. It is to catch the timeout exception from Commit and call SqlConnection.ClearPool(https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.clearpool?view=netframework-4.8). However, this method "Empties the connection pool associated with the specified connection". This may have undesirable performance effects in a highly parallel server app with hundreds of connections. So, for now i've applied this workaround in our apps, but by no means it is a permanent solution. |
Thanks for looking into it @Wraith2. I agree it looks similar to the other issue. CC: @cheenamalhotra |
@David-Engel Thanks for prioritizing it. If I wanted to look, what would be the right place to do that and submit a PR? This repository does not have any code as far as i can see. |
The code in the process of being forked. For the moment the version in corefx can be used. This is likely to be a very complex change and not a quick fix. |
okay, i will leave it to someone who knows what they are doing then |
@dmytro-gokun I'd like to clarify that GitHub is not replacement for support. If you have immediate problem and you need immediate support, we recommend to contact official Microsoft support - also linked here - product support. |
@karelz Thanks for the clarification. I have workaround for now. So, for the company i work at, everything is under control (hopefuly). But think about this. These days, there are thousands of companies running their stuff on SQL Azure. This means, they hit these commit timeouts from time to time (I described in my start post why). Now, even if they implemented proper transient exception handling, they will hit this super ugly thing which causes SqlClient to return empty sets when there is some data in the DB. That is super-crazy and is super hard to find the reason of if you face it. No respectful DB provider has to behave this way. I caught it almost by accident. This is why my perception was that the bug is critical and has to be fixed as soon as possible. People may run into it all the time, they just do not realize that. But they will face consequences later - in form or corrupt data, wrong decision by the code and so on. That is scary as hell. I hope this explains my reaction and my surprise for the lack of attention. In any case, I might be over-dramatizing here. The ball is on you side of the field and you decide what to do with it. |
Triggering a Commit Timeout with errorcode The problem is I don't see the issue happening with this scenario so it might not be related to Commit Timeout exception with errorcode I would request you to try setting |
This will cause a command timeout and not a commit timeout. Those are different things and are handled differently by SqlClient. Have you run my code? Were you able to reproduce the error? If you are not able to reproduce it and my description is not clear enough - please let me know and I will try to explain it better. |
You might be able to trigger a commit timeout by suspending the You also can set up mirroring or AG and pause the network. Or, put the transaction log on storage that you can pause. You can pause an SMB file share by interrupting the network. |
@GSPP Thanks. These are very clever ideas (Y). |
We added a change to mark active connection as not poolable and doomed when an exception is captured during transaction commit, ensuring corrupted connection is dropped on Dispose. The fix will be available as part of GA release of 1.0. |
Stable version 1.0.19239.1 for Microsoft.Data.SqlClient 1.0 is now available to download that contains the fix. Let us know if you need any more support. |
@cheenamalhotra Thanks a lot! I've started using this release in our production environment. If I see the problem happening again, i will open a new issue. |
I've initially created this bug in the CoreFX repository: https://github.com/dotnet/corefx/issues/38900, but it does not look like anyone is looking there. So I'm repeating it here. A small addition: I've now tested this using both System.Data.SqlClient (4.6.1) and Microsoft.Data.SqlClient (1.0.19128.1-Preview), compiled for both .NET Core 2.2 & .NET Framework 4.8. In each of these four cases I was able to reproduce this bug.
=============================================================
I have a SQL Azure database and I use System.Data.SqlClient (4.6.1) to talk to it. From time to time (sometimes several times a day, sometimes no incidents for weeks), it would produce a SqlException with Number = -2 (Timeout) during a SqlTransaction.Commit call. I assume that's happening because SQL Azure databases have mirroring set up and there is some transient network condition happening around Commit() which prevents that from finishing. That's fine with me - i know those are transient exceptions and would retry them after some interval. The problem is that the connection which caused that exception is returned to the pool. The next time it's got picked from the pool, it seems to be in some invalid state. Then, when i do "SELECT * FROM MyTable WHERE Id = @id" on that connection, it would return an empty set, even though that record is 100% there in the table. That is one really disturbing bug.
I've tried to build a small reproduction for this issue. The hardest thing was to have SqlTransation.Commit() to throw a timeout exception. I invented a trick to do that (see below, ForceCommitTimeout() function). It's a bit clumsy one and does not always reproduce. If it does not repoduce in your environment, try increasing RowNumber and running the code several times. From time to time it would produce an exception. Note, that in the connection string I've set "Connection Timeout=1". That timeout value is also used as commit timeout by SqlTransaction. In any case, if someone knows a better way to force a timeout from SqlTransation.Commit() - please share it with me.
The second important point to build a stable reproduction was to set "Max Pool Size=1". This is to make sure that the same connection is returned from the pool when you open a new connection after a commit timeout has happened.
Before you run the code, you need to create a database named "TestBed" on your DB server and then create a test table in it:
Now, when you are all set up, here's the code to reproduce the problem:
When i run it and i'm lucky to hit a timeout, i get output like this:
Notice, that the second call to SelectOne() outputs "0". Even though it does "SELECT 1"! And that is the essence of this (really weird) bug. I've been able to reproduce it on both SQL Azure and my local SQL Server 2017 Dev Edition.
The text was updated successfully, but these errors were encountered: