-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Errors after upgrading to 5.2.0 from 5.1.5 on Linux #2378
Comments
@alex-jitbit is this happening on a regular connection? I mean there is no AAD included? can you provide a sample repro please? |
Linux uses managed SNI and I think the improvements were done mostly on the native side, which is windows only. Which change did you mean? |
No, no AAD, my connection string uses explicit username/password combo
A simple repro would be:
Compile on .NET 8, run on Ubuntu 22.04 (AWS) connecting to external SQL Server (also Ubuntu 22 on AWS). Reverting to 5.1.5 fixed the problem immediately. P.S. Can't repro on WSL Ubuntu connecting to Windows-hosted MS SQL Server, I assume the issue is with connecting to a linux-hosted SQL Server OR it happens under heavy load only. |
@alex-jitbit I will test it today and will update you after. |
I had the same problem, I downgrade to 5.1.5 and it worked again SQL 2019 - Windows Server - .NET 8 |
I was not able to repro the issue on Ubuntu 22.04 as a local server, but I will test it with a remote server. If there is any issue it should be related to #1029 Update: I tested with an azure SQL server at East US (adding more latency), but was not able to repro the issue. |
here is my test setup: csproj: <Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Data.SqlClient" Version="5.2.0" />
</ItemGroup>
</Project> Program.cs using Microsoft.Data.SqlClient;
SqlConnectionStringBuilder builder = new(){
DataSource ="*******.database.windows.net",
UserID = "******",
Password = "*****",
InitialCatalog = "Northwind",
MaxPoolSize = 250
};
using SqlConnection conn = new(builder.ConnectionString);
conn.Open();
Console.WriteLine(conn.State); I will test with a remote on premises server later today. |
Also seeing the same. Downgrading to Additional debug info if it's helpful.
|
If you are on Linux/macOS and specify both port and instance name in the connection string (like server,12345\instance), that might be the source issue. There appears to have been a regression in 5.2.0 on non-Windows where it isn't ignoring the instance name when both it and the port are specified. |
Some obervation from me hoping it helps investigation: We started getting this problem in alpine for a test that starts multiple threads connecting some same database in parallel. Other tests work fine. Our connection strings do not specify instance or port. Tried adding some delays in the code inside the different tasks to affect timing and then we got a different error instead of the one mentioned in this ticket: System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached. Here's how threads are created in the test: Reverting back to v5.1.5 fixed this, so we are not updating to v5.2.0 until we know more. |
I can confirm that we too experience this error under a heavy load with multiple threads (not sure if this is the culprit) |
can you guys test with this package and see if the issue is resolved? just change the extension to |
@JRahnama Any chance you could add it to nuget.org so our build/test system can find it? |
@sturledahl this package is not officially signed and is not suitable for production use. I just wanted to confirm that the fix has resolved the issue for users before proceeding with a hotfix release. |
any update on this? |
Were you able to test with the sample package?
|
@JRahnama well no. We haven't updated to 5.2 yet because of this issue. We are currently using version 5.1.5 and running into #449, but it is only happening in our production environment (with a lot of traffic) and even there only once every few weeks. We have no controlled environment to test this. Maybe @alex-jitbit has a way to reproduce it and see if the 6.0.0 version resolves it |
Same issue on Windows! |
Unfortunately this bug reproducible in production only (under high load) and frankly I'm too afraid to try beta fixes on my prod. |
@alex-jitbit is it possible to test with 5.2.0-preview2 and 5.2.0-preview5 versions to identify what changed caused the issue? |
We have same issue as people above after upgrading to the 5.2.0. Issue is reproducible from either Alpine containers or VM with Amazon Linux 2023. SQL Server is running on Windows Server and connection string contains named instance and port. @JRahnama I've tested with different versions and here's the outcome:
|
Unfortunatly, I couldn't wait till release, But I found out that, one other cause of this error can be due to the compatibility level of your sql-server. For example Linq-Queries with a collection filter like so
|
@JRahnama any update on a release date for 5.2.1 yet? It's been a couple of weeks now and still no hotfix |
The provided workarounds aren't mitigating the issue completely in our case. The only "solution" is actually to downgrade to 5.1.5 We're load testing our service with k6 using the ramping-vus executor. We're ramping it up to 1000 vus and let the test run for 3min. For one of our scenario we tested using different connection timeouts with the 5.2.1 version as suggested workaround 1)
The higher we put the timeout the better it went. Going to 180s we got zero errors, but we started to get client timeouts. Which is not acceptable as well. The suggested workaround 2) to increase the Threadpool minimum size did not provide any better results. Honestly, we didn't try out the suggested workaround 3) as it sounds just too scary.. So please a proper fix would be really appreciated. Is there in the meantime an ETA for the fix? |
@ErikEJ @JRahnama would you know more about an ETA for when the connection pooling/handling is fixed and released in a 5.1.6 (hotfix) or 5.2.2/3 (on top of HEAD) version ? From https://www.nuget.org/packages/Microsoft.Data.SqlClient... |
So, have I read this wrong or does the latest stable release (5.2.1) in nuget have this issue? |
@robs The latest version 5.2.1 on nuget still has this issue, it has not been fixed yet. |
Oh wow, surely the "stable" flag should be removed then? We've not updated to 5.2.x yet but we're running Linux clients to Windows MS SQL. Is everyone just waiting in the hope there's a 5.1.x release that has the updated dependencies? |
@cbi-at-varian Great work showing what we are experiencing |
5.1.x will be released soon ( in a week or so) with updated dependencies. |
And what about 5.2.x? Will it also be released soon? |
Could someone please test the package below to see if the issue is resolved? Simply change the extension to Update: Our test pipelines passed with the changes, and the provided repro is working well with this set of modifications. The fix appears promising. I’ll wait for the impacted users' test results before proceeding with the PR. Also I will look into adding some tests if possible to prevent this from happening in future. |
If the fix works it will be backported to 5.2 and the hot fix will be released in second half of August. |
The issue was at TCPHanlde.Connect function. Having those async tasks in the middle of the primary sync flow was causing thread pool starvation under load. The proposed solution was to remove that as we are using socket.Select() later with the timeout to honor the timeout. Which was also mentioned at here by @roji |
I integrated the package provided by @JRahnama and run the load tests. It seems to fix the issue, thanks for that! Here the numbers to compare against:
Please note the numbers do not imply a certain quality of the fix, the load test runs aren't deterministic. Important for me is that we keep the expected thresholds and that there are no failures. It feels the fix is slightly slower, but only a benchmark would reveal that. |
@cbi-at-varian what does the |
Here are benchmarks after the fix: BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3) Job=.NET 8.0 Runtime=.NET 8.0
below are results from 5.1.5:: BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3) Job=.NET 8.0 Runtime=.NET 8.0
My sample app was the repro provided from this issue with ConnectTimeout set to 10 seconds. |
There are a couple of PRs to be bacported to 5.2 and the we will proceed with the release of hotfix 5.2.2 |
@saurabh500 the measurements are on the client side of the load tests. Those numbers therefore include the web stack, some small business logic including accessing the database. |
We are still seeing the same issue on our side with this version. open MS ticket 2406250040008194 Is there someone here from MS that can help with this issue? |
Can a preview nuget be pushed? |
M.D.SqlClient v6.0.0-preview1 and the v5.2.2 hotfix release are set to be released within the next day or two. |
Hotfix v5.2.2 is released. |
We're still getting this under very heavy load
Rolling back to 5.1.5 fixes the issue. The errors are intermittent & occasional. |
@alex-jitbit or @patrickg-hchb or someone else, do you still see this problem with latest version v6.0.1? We have locked all our apps to v5.1.5 until now due to this issue, but want to go to latest. Just not sure if we should risk it.. |
@sturledahl we are on 5.2.2 all good. Looks safe. P.S. Had to rewrite hot path code to ease the load on connection pool (tighten "open/close" timing and eliminate multiple connections when several methods call each other in a chain). |
We've had 5.2.2 in prod for a number of months now, no issues to report |
After upgrading 5.1.5 to 5.2.0 on Linux (Ubuntu) .NET 8 I'm getting thousands of errors:
Some requests work fine, but about 50% throw this error. Reverting back to 5.1.5 solves the problem.
Further technical details
Microsoft.Data.SqlClient version: 5.2
.NET target: NET 8
SQL Server version: SQL 2017 on Linux
Operating system: Ubuntu 22
The text was updated successfully, but these errors were encountered: