Increase in lost lock after upgrading to .NET 7, EF Core 7 and Microsoft.Data.SqlClient 5.0.1 #160
Have you tried with Microsoft.Data.SqlClient 5.1.0-preview2? https://github.com/dotnet/SqlClient/releases/tag/v5.1.0-preview2 |
No, I haven't. For some reason we don't lose locks in our test env, but we do in our acceptance test env (and in production, which the graph is from). I'll deploy the two worst offenders with the preview version to the acceptance test env and we'll see in a few days if it makes any difference. |
Can you share more detail? |
Confirm that you are using the latest DistributedLock.SqlServer

How are you creating the lock?

```csharp
private async Task<IDistributedSynchronizationHandle?> GetHandleAsync(CancellationToken cancellationToken)
{
    using var _ = _logger.BeginScope(_state);
    while (!cancellationToken.IsCancellationRequested)
    {
        var failure = true;
        try
        {
            _logger.LogInformation("Trying to acquire lock for {JobName}", _jobName);
            var handle = await _distributedLockProvider.AcquireLockAsync(
                _lockName,
                timeout: null,
                cancellationToken: cancellationToken);
            failure = false;
            return handle;
        }
        // Multiple catch clauses
        finally
        {
            if (failure)
            {
                await Delay.TryWait(TimeSpan.FromSeconds(5), cancellationToken);
            }
        }
    }

    return null;
}
```

How are you detecting lost locks? HandleLostToken I assume?

```csharp
cancellationTokenRegistration = handle.HandleLostToken.Register(() =>
{
    // HandleLostToken.Register is too slow to use for anything other than logging.
    _logger.LogError("Lost lock for job {JobName}.", _jobName);
    _telemetryClient.TrackEvent(new($"Lost lock for {_jobName}")
    {
        Properties =
        {
            { "Job Name", _jobName },
            { "Lock State", "Lost" }
        },
    });
});
```

What exceptions are you seeing?

I made this repo for a previous bug but that's pretty much still how we are using the library. |
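For reference, since HandleLostToken is a plain CancellationToken, one pattern is to link it with the job's own token so the protected work is cancelled when the lock is reported lost; given the comment above about Register being slow to fire, this complements the logging rather than replacing it. A minimal sketch (DoWorkAsync is a hypothetical job body, not from the thread):

```csharp
// Sketch: cancel the protected work when the lock is reported lost.
// HandleLostToken is a CancellationToken, so it can be linked with the
// job's own token. DoWorkAsync is hypothetical.
await using var handle = await _distributedLockProvider.AcquireLockAsync(_lockName);

using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
    handle.HandleLostToken,
    cancellationToken);

try
{
    await DoWorkAsync(linkedCts.Token);
}
catch (OperationCanceledException) when (handle.HandleLostToken.IsCancellationRequested)
{
    _logger.LogError("Lock for job {JobName} was lost while working.", _jobName);
}
```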
I noticed something really interesting/weird. 2/3 of the losses for the two services that lose the most locks occur around midnight. In one env it's 23:45-00:00 and in another it's 00:00-00:05. I have no clue if it's related to this issue though, probably not. I wonder if Azure is doing something around midnight... |
If the connection is broken, I would expect disposing the handle to throw an exception since it will try to execute the release sproc and that will fail. Are you seeing any exceptions there?
Are you able to repro this at all locally? I assume not since you said just in acceptance test...
Is it possible for you to try and repro this without subscribing to HandleLostToken? |
I made a mistake and didn't realize that there could be an exception thrown from there. I accidentally suppress any exceptions thrown during dispose. I'll have to fix that and deploy the services.
Unfortunately, no. Maybe if I leave a job running for a few days on a computer, but I'm hoping I can get some exceptions that make sense from production instead.
Sure, I think I could just use |
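For anyone following along, surfacing the release failure instead of suppressing it could look like this minimal sketch (the logging is illustrative): disposing the handle runs the release procedure on the original connection, so a dead connection should throw here.

```csharp
// Sketch: don't swallow exceptions from releasing the lock. Disposing the
// handle executes the release on the same connection, so a broken connection
// is expected to surface as an exception here.
try
{
    await handle.DisposeAsync();
}
catch (Exception ex)
{
    // Log instead of suppressing: this is the signal that the release
    // (and likely the underlying connection) failed.
    _logger.LogError(ex, "Failed to release lock for job {JobName}.", _jobName);
    throw;
}
```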
Sounds good @OskarKlintrot . FWIW I ran the unit test suite with the 5.0.1 upgrade and everything passes, so no luck there. Another thing that would be interesting to test on your end is to try different versions of Microsoft.Data.SqlClient and see which one introduced the behavior. There are a lot of different suggestions flying around here so I thought it might make sense to track with a list:
|
@OskarKlintrot We did see some weird SQL connectivity issues in one of our microservices after upgrading to .NET 7 |
I let it run for two days and on day 2 the behaviour came back. So no luck there, I would say.
Since I can't go any lower than the version required by EF Core 7, I don't think this is feasible, unfortunately.
There don't seem to be any exceptions from
Stack trace:
Do you still think this is worth doing? (see below)
On hold (see below) Given that the exceptions (there are some others as well, from another occasion, all related to DB connection issues; Azure admits the DB became unavailable at that time) point to some hiccup with the DB connection, I doubt that this issue has anything to do with DistributedLock. |
@hoerup Oh, that's very interesting! Are you also using Azure SQL, perhaps? When I upgraded to the latest preview everything seemed fine for a day until the issues came back. Please let me know if it works for you! If it does, we will probably also upgrade to 5.1.0. |
This issue seems somewhat similar to what I am seeing; not sure if it's related. Is there any retry logic in DistributedLock? |
I might have been too fast there - just saw some connection errors again :( We are using SQL Server on-prem |
Too bad!
Sounds like the issue is with Microsoft.Data.SqlClient |
There isn't any retry logic. Especially once we grab the lock, if the connection dies we can't really retry at that point. Even if we reacquire the lock, someone else could still have held it during the blip, so we want to report it as a lost handle.
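Under that constraint, caller-side "retry" amounts to reacquiring and re-running the protected section from the start. A rough sketch (RunJobAsync is a hypothetical entry point that observes the lost-lock token):

```csharp
// Sketch: on a lost handle, reacquire and start over rather than resume,
// since another process may have held the lock during the blip.
while (!cancellationToken.IsCancellationRequested)
{
    await using var handle = await _distributedLockProvider.AcquireLockAsync(
        _lockName, cancellationToken: cancellationToken);

    try
    {
        await RunJobAsync(handle.HandleLostToken); // hypothetical job body
        return; // completed while the lock was continuously held
    }
    catch (OperationCanceledException) when (handle.HandleLostToken.IsCancellationRequested)
    {
        // Lock lost mid-run: loop around, reacquire, and re-run from the start.
    }
}
```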
@OskarKlintrot that does seem likely if you're encountering a spike across uses of Microsoft.Data.SqlClient and not just in distributed lock As a next step, I think it makes sense to report an issue on the Microsoft.Data.SqlClient github and link it here. Do you agree? Then we can close this for now while that plays out. The issue you already linked does seem related but that was closed without action. If we were able to pin down the appearance of the problem to a specific release of the client that might help it gain traction, as would creating a standalone repro that can run against Azure. |
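On the standalone-repro idea: DistributedLock.SqlServer is built on SQL Server's sp_getapplock, so a repro could cut the library out entirely by holding an app lock on a raw SqlConnection and polling it. A minimal sketch, assuming a valid Azure SQL connection string is passed in:

```csharp
// Sketch: hold a session-scoped app lock directly (the same primitive the
// library uses) and poll until the lock or the connection is lost.
using Microsoft.Data.SqlClient;

var connectionString = args[0]; // Azure SQL connection string (placeholder)

await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

await using (var acquire = connection.CreateCommand())
{
    // -1 = wait forever; uncontended, this returns immediately.
    acquire.CommandText =
        "EXEC sp_getapplock @Resource = 'repro-lock', @LockMode = 'Exclusive', " +
        "@LockOwner = 'Session', @LockTimeout = -1";
    await acquire.ExecuteNonQueryAsync();
}

while (true)
{
    await Task.Delay(TimeSpan.FromSeconds(10));
    try
    {
        await using var check = connection.CreateCommand();
        check.CommandText = "SELECT APPLOCK_MODE('public', 'repro-lock', 'Session')";
        var mode = (string?)await check.ExecuteScalarAsync();
        if (mode is null or "NoLock")
        {
            Console.WriteLine($"{DateTimeOffset.UtcNow:O}: app lock no longer held");
            break;
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"{DateTimeOffset.UtcNow:O}: connection failed: {ex.Message}");
        break;
    }
}
```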
@OskarKlintrot any updates here? Did you end up filing an issue with Microsoft? |
Still working on it, got sidetracked with a man cold :) |
@OskarKlintrot sorry to hear that; hope you feel better soon :-) |
Sorry, got buried in other work. Closing this for now and tracking the issue here. Might re-open this issue if the Microsoft.Data.SqlClient team doesn't think it has anything to do with them. |
Thanks! |
@hoerup will you be able to provide a minimal sample that could repro the issue? |
We've seen a dramatic increase in the number of lost locks since we upgraded to .NET 7 and EF Core 7. EF Core 7 uses Microsoft.Data.SqlClient (>= 5.0.1), but EF Core 6 used Microsoft.Data.SqlClient (>= 2.1.4) (DistributedLock.SqlServer has a dependency on Microsoft.Data.SqlClient (>= 2.1.1)). We started upgrading our services before Christmas, and around the same time the number of lost locks started to increase from a few a week to 10-50 a day. The only change we have made that we figure might have an impact is upgrading from .NET 6 to .NET 7 and Microsoft.Data.SqlClient from 4.1.0 to 5.0.1.