Add async snapshot continue capability for multipacket fields #3161

base: main
Conversation
/azp run

Commenter does not have sufficient privileges for PR 3161 in repo dotnet/SqlClient

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Unfortunately, new security requirements restrict our ability to run CI for non-maintainers. Feel free to ping the team alias (@dotnet/sqlclientdevteam) or any of us directly and we'll do our best to trigger a pipeline run ASAP. See also: #3152
Based on the feedback provided, I have added a new compatibility switch called …
@Wraith2 Is this the "final" PR? (I lost track a few years ago)

Yup, part 3 of 3 of #2608, so this build should give faster, stable async strings.
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Some comments for discussion, and a couple of typos.
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
CI is broken. Can you re-kick it, please?

I restarted the failing jobs.

The jobs are failing because the manual tests are timing out. The tests are timing out because stream operations are hitting a path that I wasn't aware of (and didn't break the last two times I reimplemented this), so more work is needed.

I've made many changes to get streams working. Most things seem to work now, but it's hard to tell given how poor the local test experience is. Can someone on the @dotnet/sqlclientdevteam run the CI, please?
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report — Attention: patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3161      +/-   ##
==========================================
- Coverage   72.93%   66.47%    -6.47%
==========================================
  Files         287      282        -5
  Lines       59173    59259      +86
==========================================
- Hits        43160    39393     -3767
- Misses      16013    19866     +3853
```

Flags with carried forward coverage won't be shown.
I'll try to look later. Rebasing this set of changes is not fun.

Force-pushed from 3e8eb21 to 0fbae86.

Rebased, kick CI please.
use as instead of casts for safety
I recovered the perf for the compatibility path and kept the extra speed on the new path.

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

CI is slowly working through the stages. The package was built and is here: https://sqlclientdrivers.visualstudio.com/904996cc-6198-4d39-8540-eca72bdf0b7b/_apis/build/builds/110519/artifacts?artifactName=Artifacts&api-version=7.1&%24format=zip
@Wraith2 So I ran the EF Core 9 (release/9.0 branch) tests, and then updated to the package above. The tests run in around the same amount of time as previously, but two tests (really one) are failing! I have attached the results. These are the queries against Northwind; you can find a .sql script with the database here in the EF Core repo:
```csharp
[ConditionalTheory]
[MemberData(nameof(IsAsyncData))]
public virtual Task FromSqlRaw_queryable_with_null_parameter(bool async)
{
    uint? reportsTo = null;

    return AssertQuery(
        async,
        ss => ((DbSet<Employee>)ss.Set<Employee>()).FromSqlRaw(
            NormalizeDelimitersInRawString(
                // ReSharper disable once ExpressionIsAlwaysNull
                "SELECT * FROM [Employees] WHERE [ReportsTo] = {0} OR ([ReportsTo] IS NULL AND {0} IS NULL)"), reportsTo),
        ss => ss.Set<Employee>().Where(x => x.ReportsTo == reportsTo));
}
```
Thanks, it's a real new bug. I'll investigate. |
@Wraith2 Cool, let me know if you need additional repro info. |
Fixed. It looks like the existing tests don't cover the case of a binary non-plp column, so the image column in the query needed tweaking to be able to deal with the continue state. I've added covering tests based on the test provided. @dotnet/sqlclientdevteam, can you kick the CI please?
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
All green; artifacts here: https://sqlclientdrivers.visualstudio.com/904996cc-6198-4d39-8540-eca72bdf0b7b/_apis/build/builds/110607/artifacts?artifactName=Artifacts&api-version=7.1&%24format=zip If anyone can think of a way to get wider testing of the PR artifacts, it might be interesting to try it. It's clear that the CI doesn't cover all code paths, and there's no reason to assume that EF Core tests cover regions that SqlClient ones don't. The more we find and fix before it reaches the unwilling public, the better.
@Wraith2 All EF Core 9 tests passing with latest package, and run duration around the same as with 5.1.6 |
@Wraith2 A way to get wider testing is to include this in a 7.0 preview |
All EF tests passing is good news. Unless the tests have a lot of long strings, I wouldn't expect to see any large time differences; in terms of testing, reading 1 MiB is much the same as reading 100 MiB. It's been marked for inclusion in the next preview, but I don't know how much uptake previews get. If previews are widely used then that will be useful.
I was able to replicate your performance numbers with the new internal perf tests we're building out. Very exciting! |
Good news. Thanks. Let me know if anything needs explaining; I can either try to write it up or arrange another call to go through it.
```diff
@@ -159,6 +159,16 @@ public static bool UseCompatibilityProcessSni
         return (bool)switchesType.GetProperty(nameof(UseCompatibilityProcessSni), BindingFlags.Public | BindingFlags.Static).GetValue(null);
     }
 }
+
+public static bool UseCompatibilityAsyncBehaviour
```
Was this reflection layer added because LocalAppContextSwitches is internal?
cc @benrr101 another internal class testing challenge. We can't actually easily test our compatibility flags :(
Yes. I'm reusing one of the files from the main project to make sure the test and real implementations can't get out of sync, but that means that various classes and namespaces have to be put in exactly the right places.
```diff
@@ -37,18 +37,34 @@ private SqlCachedBuffer(List<byte[]> cachedBytes)
 /// </summary>
 internal static TdsOperationStatus TryCreate(SqlMetaDataPriv metadata, TdsParser parser, TdsParserStateObject stateObj, out SqlCachedBuffer buffer)
```
Note to self: this class is XML-specific. It's not used for other data types. Naming could use an update for clarity.
@Wraith2 checking my understanding of this class: PLP bytes are read from the network in chunks (when the overall length is unknown) or in their entirety (when the overall length is known). If chunked, each chunk ends up in a separate byte[], and each byte[] is added to the overall list of byte[]s: _cachedBytes. Consumers can stream the PLP data by streaming through the byte arrays.
That matches what I found. It's the first time I've encountered this particular class and it was a bit surprising.
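The chunked layout described above can be sketched in a few lines. This is a hypothetical model, not SqlClient code; only the idea of `_cachedBytes` being a `List<byte[]>` of per-chunk buffers comes from the discussion, and the names `PlpChunkSketch`, `Accumulate` and `TotalLength` are invented for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: each chunk read from the network keeps its own
// byte[]; the cached buffer is just the ordered list of those arrays, and
// a consumer can walk them without concatenating into one large array.
public static class PlpChunkSketch
{
    public static List<byte[]> Accumulate(IEnumerable<byte[]> networkChunks)
    {
        var cachedBytes = new List<byte[]>();
        foreach (byte[] chunk in networkChunks)
        {
            cachedBytes.Add(chunk); // one byte[] per chunk, as with _cachedBytes
        }
        return cachedBytes;
    }

    public static long TotalLength(List<byte[]> cachedBytes)
    {
        long total = 0;
        foreach (byte[] chunk in cachedBytes)
        {
            total += chunk.Length;
        }
        return total;
    }
}
```

Keeping the chunks separate is what makes streaming consumption possible: no single allocation ever has to hold the whole PLP value.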
```csharp
List<byte[]> cachedBytes = null;
if (isAvailable)
{
    cachedBytes = stateObj.TryTakeSnapshotStorage() as List<byte[]>;
```
In what situation would we need to restore the cached bytes list from the snapshot? Isn't the point of this class that they are already cached here? Is it because this TryCreate method may be called multiple times? And if so, is there some way we can avoid that?

It is confusing to me that the snapshot storage may contain either byte[] or List<byte[]>.
The snapshot storage can contain byte[], char[] or List<byte[]> depending on whether we're reading binary, strings or xml. I hadn't realised that there was the List<byte[]> option until recently. The snapshot field used to be called _plpBuffer, but I renamed it to be more general as I found that it needed to contain multiple types.
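That single polymorphic slot can be modelled like this. The class and method names here are hypothetical; only the pattern of storing one buffer of varying type and recovering it with an `as` cast (à la `TryTakeSnapshotStorage`) is taken from the PR.

```csharp
// Hypothetical sketch: one object slot holds whichever buffer shape the
// current read needs (byte[], char[] or List<byte[]>). Callers recover it
// with a type test, and the slot is cleared on a successful take so no
// stale buffer survives into the next operation.
public sealed class SnapshotStorageSketch
{
    private object _storage;

    public void Store(object buffer) => _storage = buffer;

    // Returns the stored buffer if it has the requested type (clearing the
    // slot), otherwise null — mirroring the `as` casts in the PR.
    public T TryTake<T>() where T : class
    {
        T typed = _storage as T;
        if (typed != null)
        {
            _storage = null;
        }
        return typed;
    }
}
```

Using `as` rather than a hard cast means a type mismatch yields null instead of an InvalidCastException, which matches the "use as instead of casts for safety" commit in this PR.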
```diff
-byteArr = new byte[cb];
-result = stateObj.TryReadPlpBytes(ref byteArr, 0, cb, out cb);
+byte[] byteArr = new byte[cb];
+// pass false for the writeDataSizeToSnapshot parameter because we want to only take data
```
Is this required due to the snapshot behavior above (using the snapshot to store the List<byte[]>)?
Yes. In this situation in the cached buffer each read is self-contained, so we don't want any of the continue logic inside the callee to kick in.
```diff
-return result;
+if (result == TdsOperationStatus.NeedMoreData && isAvailable && cb == byteArr.Length)
+{
+    // succeeded in getting the data but failed to find the next plp length
```
Do you have a concrete example of how this might happen? Is this if plplength is greater than the max chunk size?
It happens at the end of each packet. We've read the bytes that we know were in the packet and we need more data, but when we asked for another packet we got need-more-data back.
```diff
@@ -2063,7 +2157,10 @@ internal TdsOperationStatus TryReadNetworkPacket()
 internal void PrepareReplaySnapshot()
 {
     _networkPacketTaskSource = null;
-    _snapshot.MoveToStart();
+    if (!_snapshot.MoveToContinue())
```
This is the key line for the performance improvements. If we have a continuation point, we can start the packet replay from there instead of the start of the snapshot. This method is invoked whenever we continue an async read operation.
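The replay decision can be modelled with a small sketch. The `SnapshotSketch` type and its members are hypothetical; only the MoveToContinue/MoveToStart fallback shape is taken from the diff above.

```csharp
using System.Collections.Generic;

// Hypothetical model: a snapshot is a list of packets plus an optional
// continue point. Replay prefers the continue point, falling back to the
// start when none was captured (e.g. when the feature is disabled).
public sealed class SnapshotSketch
{
    private readonly List<byte[]> _packets = new List<byte[]>();
    private int _continueIndex = -1; // -1: no continue point captured
    public int ReplayIndex { get; private set; }

    public void Append(byte[] packet) => _packets.Add(packet);

    // Called when a read has fully consumed everything up to the latest packet.
    public void CaptureContinuePoint() => _continueIndex = _packets.Count - 1;

    public void MoveToStart() => ReplayIndex = 0;

    public bool MoveToContinue()
    {
        if (_continueIndex < 0)
        {
            return false;
        }
        ReplayIndex = _continueIndex;
        return true;
    }

    // Same shape as PrepareReplaySnapshot in the diff: continue if possible.
    public void PrepareReplay()
    {
        if (!MoveToContinue())
        {
            MoveToStart();
        }
    }
}
```

With a continue point, the replay index lands on the last captured packet instead of packet zero, which is exactly where the speed win over re-parsing the whole snapshot comes from.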
{ | ||
length = (ulong)stateObj.GetSnapshotStorageLength<byte>(); | ||
isNull = length == 0; | ||
return TdsOperationStatus.Done; |
In this case, we can skip because we'll only have a continuing snapshot if we already read the header?
Yes, sort of.

In this case we must skip it because the length is only present in the first packet, before we start reading data. If we aren't on the first packet in the snapshot then a read here is incorrect.

Reading the length from the storage array avoids needing to keep another field in the snapshot to store the length of non-plp fields.
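The trick of recovering the length from the storage array itself might look like the following. This is a hypothetical helper, not the real `GetSnapshotStorageLength<T>`; it only illustrates the idea that, for non-plp fields, the buffer allocated on the first packet already encodes the field length.

```csharp
// Hypothetical sketch: for non-plp fields the buffer created on the first
// packet is allocated with the full field length, so a continuing read can
// derive that length from the stored array instead of a dedicated field.
public static class SnapshotLengthSketch
{
    public static ulong GetStorageLength<T>(object storage)
    {
        return storage is T[] array ? (ulong)array.Length : 0UL;
    }
}
```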
```csharp
(bool isAvailable, bool isStarting, bool isContinuing) = stateObj.GetSnapshotStatuses();
if (isAvailable)
{
    if (isContinuing || isStarting)
```
Lamenting the fact that we have to check the snapshot state in so many different places 😢
Not to be addressed in this PR.
I can appreciate the sentiment.
Consider that there was the possibility that it would need changes to the relationship between the SqlDataReader and the parser, which would be much more complicated. I've done my best to minimize the amount of code that needs to be changed to accommodate the new functionality. There may be future opportunities for improvement.

I found that the complexity in this PR is mostly not in the code; it's in understanding the sequence of events during async callbacks. In particular, the nuance between the isAvailable, isStarting and isContinuing flags isn't big in terms of code but matters immensely for correctness and for the ability to turn off the continue capability.
```csharp
if (isContinuing || isStarting)
{
    temp = stateObj.TryTakeSnapshotStorage() as byte[];
    Debug.Assert(bytes == null || bytes.Length == length, "stored buffer length must be null or must have been created with the correct length");
```
Why do we expect the snapshot storage to have this length? It doesn't seem to me that this would be true in general.
It coordinates with the other part of the code, which makes sure that the array has been written back in the need-more-data state.

The only way to arrive at this line of code is if we are continuing, which means we have reached at least the second packet and know that we hit a packet transition where needs-more-data was returned; in that situation, the array we created to copy the data into while in the isStarting state will have been written back to the snapshot storage.

It is also logically incorrect to end an async read with a non-null _storage buffer in the snapshot. That means you can never find a buffer that some previous operation left hanging around: an async operation must either succeed and take the array cleanly, or the cleanup process for snapshots will have nulled the array.
```csharp
while (charsLeft > 0)
{
    if (partialReadInProgress)
    {
        goto resumePartialRead;
```
I'd prefer to avoid using goto even if it means another layer of nesting.
Why? It's a supported and useful language construct which does exactly what we need in this case.

I know there are people who have ideological objections to goto, but in this case it's the cleanest way to get the logic we need. Unfortunately, the language's definite-assignment rules prevent it from being used to jump directly to the place we really want.
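A compilable sketch of the pattern under discussion (the method and its counters are hypothetical; the real code resumes a partial multi-packet char read). Note that a C# label must sit in a scope enclosing the goto, which is part of why the jump can't land exactly where one might want.

```csharp
// Hypothetical sketch of the goto-based resume: on re-entry after an async
// wait, skip the setup work that already ran and jump to the resume label.
public static class PartialReadSketch
{
    public static int CountSetupRuns(int iterations, bool resumeFirst)
    {
        int setupRuns = 0;
        bool partialReadInProgress = resumeFirst; // would be restored from snapshot state
        int charsLeft = iterations;
        while (charsLeft > 0)
        {
            if (partialReadInProgress)
            {
                // setup below must not be repeated when resuming
                goto resumePartialRead;
            }
            setupRuns++;
        resumePartialRead:
            partialReadInProgress = false;
            charsLeft--;
        }
        return setupRuns;
    }
}
```

The alternative of an extra `if`/`else` layer around the setup work is equivalent; the goto simply keeps the resumed and non-resumed paths on one indentation level.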
Split out from #2608 per discussion detailed in #2608 (comment)

Adds infrastructure to the async reader snapshot to allow the last packet of the snapshot to be captured as a continue point. On replay, if a continue point is available it is used, which skips re-doing the previous work and improves speed.

The name and behaviour of the appcontext switch being used will need agreement and work. This continue capability requires the partial packet capability, so legacy mode enabled on partial packets will force this feature to be disabled. I expect there to be some discussion around the names and exact details, so the current PR is in dev form: easy to see and change.

Benchmarks are for reading a 20 MiB string from a local SQL server. As payload sizes and latency increase, speed differences should become more noticeable. I invite and request people to do some benchmarking to make sure we're aware of the performance characteristics of this change.
/cc @MichelZ and @ErikEJ for interest