
SqlDataReader.Close blocks in SNIReadSyncOverAsync after early breaking from reading large dataset #34

Closed
JanKrivanek opened this issue Apr 18, 2018 · 17 comments
Labels
🎨 By Design Issues due to driver feature design and will not be fixed.

@JanKrivanek
Member

SqlDataReader takes a disproportionately long time to Dispose if you break out of the enumeration early while reading a large data set.

I can observe an order-of-magnitude difference in Dispose time depending on the number in the TOP clause of the SELECT statement, regardless of the number of rows actually read by the client.

Let's have the following code:

    public static void Test()
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();
        IEnumerable<int> en1 = EnumerateLargeData(1000);
        foreach (int i in en1)
        { }
        Console.WriteLine($"First case time: {stopwatch.Elapsed}");

        stopwatch.Restart();
        IEnumerable<int> en2 = EnumerateLargeData(2000000000);
        foreach (int i in en2)
        { }
        Console.WriteLine($"Second case time: {stopwatch.Elapsed}");
    }

    public static IEnumerable<int> EnumerateLargeData(int maxCount)
    {
        using (SqlConnection sqlConnection = new SqlConnection(connectionString))
        {
            sqlConnection.Open();
            using (SqlCommand getDataCommand = new SqlCommand("[dbo].[GetLargeData_SP]", sqlConnection))
            {
                getDataCommand.CommandTimeout = 0;
                getDataCommand.CommandType = CommandType.StoredProcedure;
                getDataCommand.Parameters.AddWithValue("@MaxCount", maxCount);

                using (var reader = getDataCommand.ExecuteReader())
                {
                    int recordsCount = 0;
                    while (reader.Read())
                    {
                        yield return 1;
                        if (recordsCount++ > 100)
                        {
                            // Here is where the issue happens (breaking before the reader is drained)
                            break;
                        }
                    }
                }
            }
        }
    }

where the stored procedure selects data from a very large table:

CREATE procedure [dbo].[GetLargeData_SP]
	@MaxCount INT
AS
BEGIN
	SELECT TOP (@MaxCount)
		[DataPoint]
	FROM 
		[dbo].[LargeDataTable] WITH(NOLOCK)
END

You will see a very large difference in Dispose time depending on the maxCount argument, especially when pulling data over a slower network.
In my scenario I'm done fetching data in a few hundred milliseconds but then stuck in Dispose for 2 minutes, exactly in this stack:

[Screenshot: call stack blocked in SNIReadSyncOverAsync during SqlDataReader.Close]

@AfsanehR-zz AfsanehR-zz self-assigned this Apr 24, 2018
@AfsanehR-zz

Hi @jakrivan, I tried with a maximum of 73840823 rows but I don't see it hanging. Could you try different max sizes and share at which points you see the delays happening? It could be that the server is overloaded and is not signalling the client application to release its resources, so it's getting timeouts. Thanks.

@JanKrivanek
Member Author

Hi @afsanehr ,

I've posted a full repro sample here: https://jankrivanek456274688.wordpress.com/2018/05/14/beware-of-early-breaking-from-sqldatareader-reading-over-large-dataset/ (I tried to post it to my MSDN blog, but somehow it's stuck in a 'Pending' state and I no longer have publish rights).

The number of originally requested rows and the latency between server and client are the main factors influencing this behavior (from the reader code published on GitHub it appears that Dispose continues reading through all the remaining rows). So you may be able to repro with your code, just increase the number of TOP rows at least 100-fold.

Please let me know if you are still not able to repro.

Thanks
Jan

@AfsanehR-zz

Hi @jakrivan, I tested with 131229530 rows, enumerating through the data as in your sample, but I'm still unable to repro. Could you kindly share what version of .NET Core you are using, and whether this happens with .NET Core 2.1 RC?
It would also be great to share the output of dotnet --info.
Thanks!

@JanKrivanek
Member Author

Hi @afsanehr

Sorry for the confusion on my side - the repro is from .NET Framework (4.6.2). I will also try on .NET Core 2.1 RC and let you know. If it's reproducible only on 4.7.2 I'll move this item to the dotnet/standard repo.

>dotnet --info
.NET Command Line Tools (1.0.3)

Product Information:
Version: 1.0.3
Commit SHA-1 hash: 37224c9917

Runtime Environment:
OS Name: Windows
OS Version: 6.3.9600
OS Platform: Windows
RID: win81-x64
Base Path: C:\Program Files\dotnet\sdk\1.0.3

Thanks
Jan

@JanKrivanek
Member Author

JanKrivanek commented May 17, 2018

Hi @afsanehr

So I tried on .NET Core 2.1 with System.Data.SqlClient 4.5.0-rc against Microsoft SQL Server 2016 (SP1-CU8) (KB4077064) – 13.0.4474.0 (X64) on a remote server with network RT latency of about 55 ms.

Here is the info output:

>dotnet --info
.NET Command Line Tools (2.1.200)

Product Information:
Version: 2.1.200
Commit SHA-1 hash: 2edba8d7f1

Runtime Environment:
OS Name: Windows
OS Version: 6.3.9600
OS Platform: Windows
RID: win81-x64
Base Path: C:\Program Files\dotnet\sdk\2.1.200\

Microsoft .NET Core Shared Framework Host

Version : 2.0.7
Build : 2d61d0b043915bc948ebf98836fefe9ba942be11

I can still see a significant difference in disposal time based on the TOP count:

C:\Program Files\dotnet>dotnet run --project myApp
First case time: 00:00:00.9159470
Second case time: 00:00:07.9673436

It's fully in line with my original measurements on .NET Framework 4.6.2 (available here).

I ran the code that I linked above (you can simply copy-paste it with no need to tweak anything but the connection string).

Thanks
Jan

EDIT:
The network latency is likely one of the crucial factors for the repro.
I tried with a local SQL server and didn't get a repro:

C:\Program Files\dotnet>dotnet run --project myApp
First case time: 00:00:00.8501625
Second case time: 00:00:00.9905979

So it seems that with some nontrivial network latency, buffers (on the SQL Server side?) build up and the dispose code tries to traverse them instead of simply throwing them away (guessing from the stack trace, especially functions like 'TryReadNetworkPacket').
Please use a more distant SQL server, or simply forward your traffic via some proxy server.

@AfsanehR-zz

@jakrivan ok, thanks for confirming. I was also using a local server, which is why I did not see this issue happening. Could you confirm at which point of enumerating the data you see the delays happening?

@JanKrivanek
Copy link
Member Author

@afsanehr the delay happens in the Dispose code (so after breaking out of the enumeration) - the stack trace at the end of the first post shows the exact stack where Dispose consumes the majority of the time.

All that's needed is a remote SQL server, or traffic forwarded via whatever remote box to your devbox (e.g. via 'netsh interface portproxy add v4tov4 listenport=1433 connectport=1433 connectaddress=<devbox_ip>' on the remote server) and you'll be able to repro.

@divega

divega commented May 16, 2019

As recently announced on the .NET Blog, focus on new SqlClient features and improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDb, and general ADO.NET and .NET data access issues.

@divega divega transferred this issue from dotnet/corefx May 16, 2019
@divega divega added this to the 1.0.0 milestone May 17, 2019
@WillFM

WillFM commented Jul 22, 2019

I've run into the exact same issue. In my case, I'm using the LINQ Take() extension method (the reader is wrapped in a yielded IEnumerable that returns a type built from the rows being read). Once the Take() call finishes its enumeration, I have to wait several minutes before Reader.Dispose() finally returns.

Owning assembly: v4.5.2 (.NET Framework library project)
Calling assembly: net462 (ASP.NET Core 2.2 project)

Any updates on this?

Also, has anyone found any potential workaround(s)? I suppose I could implement TOP in my stored procedure, but I was hoping to avoid schema changes to the database.

@JanKrivanek
Member Author

@WillFM

I did two things to successfully work around this issue (I use them in combination to be on the safe side):

  • Explicitly calling SqlCommand.Cancel when I want to stop enumerating the data
  • Decreasing the size of the batch that a single command call can return and iteratively calling the backend SP (this requires a unique, ordered column that you can use to distinguish what data was already returned and what still needs to be returned)

Both approaches, however, require you to refactor your code and write a method that explicitly handles the iteration over the data (so that it can call Cancel, or so that it calls the backend iteratively). The second workaround also requires a change in your SP.
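The first workaround can be sketched roughly like this (a non-authoritative sketch reusing the connectionString and [dbo].[GetLargeData_SP] names from the repro above; the finally-block placement is the key idea):

```csharp
// Sketch: cancel the command before the reader is disposed, so Dispose does
// not drain the remaining result set over the wire.
public static IEnumerable<int> EnumerateLargeDataWithCancel(int maxCount)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var command = new SqlCommand("[dbo].[GetLargeData_SP]", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@MaxCount", maxCount);

            using (var reader = command.ExecuteReader())
            {
                bool drained = false;
                try
                {
                    while (reader.Read())
                    {
                        yield return reader.GetInt32(0);
                    }
                    drained = true;
                }
                finally
                {
                    // If the caller stopped enumerating early, this finally
                    // runs when the iterator is disposed, before the reader's
                    // using-block. Cancelling here tells the server to abandon
                    // the rest of the result set, so reader.Dispose() returns
                    // quickly instead of reading all remaining rows.
                    if (!drained)
                        command.Cancel();
                }
            }
        }
    }
}
```

Because iterator finally blocks run innermost-first on Dispose, the Cancel is guaranteed to happen before the reader is closed, which is exactly the ordering this workaround depends on.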

@David-Engel
Contributor

This is actually by design. The correct behavior of a client talking to SQL Server over TDS is to either Cancel the command or consume the results. In the repro, the driver is cleaning up for the application by consuming the results, which is the safe option since it does not know what the executed statement is doing. Failing to consume the results is unsafe and can result in inconsistent application behavior. Calling Cancel would be a choice the application has to make, if it is safe to do so.

This old blog post talks about failing to consume results from SQL Server with example scenarios:
https://blogs.msdn.microsoft.com/psssql/2008/07/23/how-it-works-attention-attention-or-should-i-say-cancel-the-query-and-be-sure-to-process-your-results/

@jakrivan @WillFM

@David-Engel David-Engel added the 🎨 By Design Issues due to driver feature design and will not be fixed. label Jul 23, 2019
@divega

divega commented Jul 24, 2019

@David-Engel new CloseAsync and DisposeAsync APIs are being added to the ADO.NET provider model in .NET Core 3.0. I think these could help avoid blocking in a future version of SqlClient, once you are able to target .NET Standard 2.1. Perhaps having a general issue in the backlog for implementing the new async surface would be a good idea (unless one already exists).

cc @roji

@roji
Member

roji commented Jul 24, 2019

The correct behavior of a client talking to SQL Server over TDS is to either Cancel the command or consume the results. In the repro, the driver is cleaning up for the application by consuming the results, which is the safe option since it does not know what the executed statement is doing. Failing to consume the results is unsafe and can result in inconsistent application behavior. Calling Cancel would be a choice the application has to make, if it is safe to do so.

FWIW the same happens in Npgsql, and I'd expect it to be somewhat standard behavior across database drivers.

@divega we have #113 to track implementation of the new async APIs. However, while async close/dispose of the reader could help avoid blocking the thread while results are consumed, it would still let the query run to completion, possibly using server and network resources, which isn't ideal. The only way around that would be to trigger cancellation when the reader is disposed, but as @David-Engel wrote, it's not right for the driver to do this since we don't know what's running and whether it's safe to cancel.

My general response to this would be to not request a huge amount of results if one isn't sure they're going to be needed. SQL paging, cursors and similar mechanisms can be used to fetch results in chunks, providing natural "exit" points. Otherwise the application can trigger cancellation itself as mentioned above.
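The chunked-fetching idea can be sketched as keyset paging (a hedged sketch, not the driver's API: the [Id] key column and the reuse of [dbo].[LargeDataTable] and connectionString from the thread are assumptions):

```csharp
// Sketch: fetch rows in fixed-size batches keyed on a unique ordered column,
// so every command's result set is fully consumed and each batch boundary is
// a natural "exit" point. Stopping between batches never leaves undrained
// rows behind.
public static IEnumerable<int> EnumerateInBatches(int batchSize)
{
    long lastId = 0;
    while (true)
    {
        var batch = new List<int>();
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var command = new SqlCommand(
                @"SELECT TOP (@BatchSize) [Id], [DataPoint]
                  FROM [dbo].[LargeDataTable]
                  WHERE [Id] > @LastId
                  ORDER BY [Id]", connection))
            {
                command.Parameters.AddWithValue("@BatchSize", batchSize);
                command.Parameters.AddWithValue("@LastId", lastId);
                using (var reader = command.ExecuteReader())
                {
                    // Always drained: the server returns at most batchSize rows.
                    while (reader.Read())
                    {
                        lastId = reader.GetInt64(0);
                        batch.Add(reader.GetInt32(1));
                    }
                }
            }
        }
        if (batch.Count == 0) yield break; // no more rows
        foreach (int value in batch) yield return value;
    }
}
```

The trade-off is one round trip per batch, so batchSize should be tuned against the network latency described earlier in the thread.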

@FixRM

FixRM commented Jul 28, 2021

Hi @jakrivan, hi @David-Engel. We have the same issue with a Microsoft D365 installation. Should we upgrade the .NET Framework to the latest version, or has this issue not been resolved?

@David-Engel
Contributor

Should we upgrade the .NET Framework to the latest version, or has this issue not been resolved?

The driver behavior described in this issue is By Design and has not changed. Upgrading won't change anything for you. It's up to the application calling the driver to safely call cancel before closing or the driver will do the only safe thing it can: Consume all results to ensure the server executes everything the application sent in the last command.

@FixRM

FixRM commented Jul 29, 2021

Thank you @David-Engel. I will contact MS D365 support.

@jnm2

jnm2 commented Sep 29, 2022

Even if you do cancel appropriately using the cancellation token passed to SqlDataReader.ReadAsync, this problem still stands. When ReadAsync completes asynchronously and you get to the SqlDataReader.Dispose() at the end of your using scope, the Dispose call blocks until the whole result set finishes.

Repro at #1065 (comment).
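The scenario jnm2 describes, plus the Cancel mitigation from earlier in the thread, can be sketched in async form (a hedged sketch: connectionString and the query text are placeholders, and the early-exit threshold mirrors the original repro):

```csharp
// Sketch: even when ReadAsync observes a cancellation token, disposing the
// reader still drains the result set synchronously. Calling SqlCommand.Cancel
// before leaving the using scope avoids that drain.
public static async Task<int> ReadSomeRowsAsync(CancellationToken token)
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync(token);
        using (var command = new SqlCommand(
            "SELECT [DataPoint] FROM [dbo].[LargeDataTable]", connection))
        using (var reader = await command.ExecuteReaderAsync(token))
        {
            int count = 0;
            bool drained = false;
            try
            {
                while (await reader.ReadAsync(token))
                {
                    if (++count > 100) break; // early exit: result set not drained
                }
                drained = true;
            }
            finally
            {
                // Abandon the remaining rows so reader.Dispose() (and the
                // synchronous Close it performs) returns promptly instead of
                // blocking in SNIReadSyncOverAsync.
                if (!drained)
                    command.Cancel();
            }
            return count;
        }
    }
}
```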

8 participants