-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent EF Core/Azure SQL Connection Issues #1527
Comments
Hi @jrpharis, can you provide your connection string options and the query that the failure happens on mostly? Also version of the driver you are using for |
Platform is Windows Not pulling in Here is the connection string we have set in our API's config
I'm not seeing any logs around the error that say what query it was that failed. The error will happen on a number of different APIs, which all hit the DB with different queries. So I don't think it's a single query resulting in this problem. |
This one isn't an sqlclient problem, it's a whole app problem you'll need to look into. The other one might has the same cause if there isn't enough memory available to create an instance of something. |
@Wraith2 Given that before every out-of-memory exception there is an error stating there was an issue trying to connect to the DB, seems to me it's likely the two are related. We don't ever see the out-of-memory problem outside of seeing issues connecting to the DB. So it doesn't really seem like it's a whole app problem. |
So we fail to connect and then you get an oom error and you think it's the connection failure causing the oom? Interesting perspective. |
@Wraith2 I'm not saying there isn't something I can investigate about the memory. But given I only see OOM exceptions when there is a DB connection issue, I find it hard to believe they aren't connected somehow. But what I'm really looking for is why I'm seeing the error to connect to the DB for 10 seconds to 4 minutes, then all of a sudden it's working again. All the while there are also other requests that are connecting to the DB just fine. The OOM exception is also not the only exception we're seeing after the log about not being able to connect. So it seems like there's some issue with how EF is connecting to SQL, or with how SQL Server is handling the connection. The crashes are infrequent, but I'll look into setting something up in our Azure App Service to capture the crash. |
@jrpharis by any chance, if the application structure allows it, is it possible to test with MARS set to false? |
Has any progress been made here? I am experiencing the same issue. |
Can you provide a reproduction? If not then there's still nowhere to start investigating from. |
Hello Can we get this issue addressed? Did the Triage Board ever happen? |
@creed-maxeta How similar is your issue? Do you see SNAT exhaustions as well? |
Sure, if can you provide a way to reproduce it. |
I am currently working with Azure Support through my company's Azure account on this issue. We continue to see random DB connection errors ("An error occurred using the connection to database '"*******"' on server '"***********"'." from Microsoft.EntityFrameworkCore.Database.Connection.ConnectionError) immediately followed by OOM exceptions usually thrown from within the We have seen app crashes in the past but have not seen it since I enabled the app crash monitoring. One current theory is SNAT port exhaustion, as we have seen errors for SNAT exhaustion that correlate to when we see the DB connection/OOM exception issues. However, more often than not we do not SNAT port exhaustion. Even still, I'm currently working on implementing VNET/NAT Gateway around our App Service/DB infrastructure to try to limit the SNAT exhaustion even (per Azure Support's suggestion) ** As @Wraith2 mentioned before, it's an interesting take to say the DB connection is causing the OOM exception. However, given we ONLY see the OOM when we see the DB connection issue, and that the stracktrace for the OOM exception is always deep into the |
Are you using Serilog? |
@ErikEJ Yes, we use Serilog |
Maybe this? RehanSaeed/Serilog.Exceptions#100 |
@ErikEJ I don't believe this would be related. We are not using the Serilog.Exceptions enricher from that ticket. And it doesn't seem like we're even hitting the DB due to the DB connection issue. However, the OOM exception does usually say it failed to iterate over the results. So I suppose the driving issue of the ticket linked above could possibly be at play. I've increased the level of logging for a few of the EF namespaces to see if I can get more context. |
Hello @ErikEJ Do you have other suggestions? It appears that issue occurs randomly. The only useful information I found was that certain options exist for connection issues. Here https://learn.microsoft.com/en-us/ef/core/miscellaneous/connection-resiliency. Which retries the connection after the failure. Do you have other docs that may help? |
I was researching on the problem I had today, where I started getting Login failed for user '' from my ef core db context and the error messages described in this post followed. I'm not sure if my issue is related to the one raised in this post but I'll contribute anyway just in case... In my DBContext I have this code: Just when I was about to try reboot my computer, I noticed I had a Remote Desktop Connection opened and connected to one of my other computers. I had a slight suspicion that this may be the cause of the issue so I closed the RD and tried to run my code again. It worked! The reason I had the suspicion is because I had noticed recently, when I have Remote Desktop Connection open, it affected YouTube video, in a way that it kept buffering the video eventhough I have a very high speed internet connection. I tried open the RD again but this time the code still runs without problem. So I think perhaps only when the RD is opened for a good amount of time then it would start affecting other programs that uses internet connection. May be total bull, but for a issue raised a year ago and still no solution, maybe worth to check when the issue occured, was an RD window opened? |
@ConaxLiu For the issue I raised, this was happening in an Azure Web App deployed to an Azure App Service Plan. So having a remote desktop connection open wouldn't be a possible culprit. |
Today, I had the same errors with .NET6 App on App Service for Windows Plan. |
@Wraith2 I'm having the same problem and it's hard to reproduce because it happens randomly. Also the exception itself is not much helpful: I only have another exception related to this request:
|
This is weird, I have the same problem in a simple CRUD API using EF core and SQL server .. :| |
This has to do with Task and async |
This makes sense, I enabled logging for Microsoft.EntityFrameworkCore.* and then these Exceptions started showing up, while my code ran just fine and no one was complaining about any errors. That Exception apparently happens when a task gets aborted via a CancellationToken. |
I have the sam problem on my MacOS.
Random error: need to wait 10 minutes.. then it works again... So probably some network issue? How can this even happen, if there is conneciton pool and 1000 connections are not so many. |
It was suggested on my original issue I move this here.
We have .NET 6 API using EF Core hosted in Azure App Services using Azure SQL DB hosted in the same Azure resource group and region. We're seeing intermittent connection issues where the API throws an exception not being able to connect to the DB. However, these issues last no more than a couple minutes at most and there are successful requests to the API that hit the DB immediately before and after, as well as during the issue (i.e. not all the requests into the API have this issue when we're seeing it). No action is needed to correct the issue, it resolves itself. However, we'd like to identify root cause given the fact its a production environment saying the database can't be reached.
We first see an error log (via Datadog) saying an error occurred trying to connect to our DB:
There will then be one of the following two errors in the logs for the request:
This error suggests making changes to the firewall rules to ensure the host can access the DB. The firewall rules are correct, but further more there are requests during the time period we're seeing this issue successfully connecting to the DB.
When this error occurs, we'll see the API crash and Azure will spin the instance back up.
It does not make any sense the DB connection issues pop up for some requests but not others, and then just simply stop sometimes in a couple seconds, but up to a couple minutes. Any direction to determine root cause of this would be helpful.
The text was updated successfully, but these errors were encountered: