Bloop server sometimes becomes unresponsive #2594
Comments
That looks like maybe we have a memory leak? Were you maybe able to check the Bloop JVM with VisualVM or similar? The behaviour is usually connected to having a lot of GC runs, with GC using most of the JVM processing time.
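For anyone wanting to run that check, attaching looks roughly like this (a sketch assuming VisualVM is installed and the server runs locally; the grep pattern and PID are placeholders):

```sh
# Find the Bloop server JVM, then open it in VisualVM.
jps -l | grep -i bloop
visualvm --openpid <bloop-pid>
```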
Maybe this is causing the memory leak, we might have never optimized it properly.
I don't see any changes that could cause it in Bloop. Is it possible that copying resources to output dirs could have caused it? 🤔
I'm continuing to poke at this. Just now I managed to repro again. Attaching something like visualvm is a bit annoying because of the environment in which this is running, but I was able to get a JFR profile with
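(The exact invocation isn't preserved here; as a sketch, a JFR recording can typically be captured from a running JVM with the JDK's jcmd tool, where the PID, duration and file name below are placeholders:)

```sh
# Start a time-boxed flight recording on the Bloop server JVM...
jcmd <bloop-pid> JFR.start duration=120s filename=bloop-profile.jfr
# ...or dump whatever is currently being recorded.
jcmd <bloop-pid> JFR.dump filename=bloop-profile.jfr
```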
Looking into that, memory use does not seem problematic - it actually seems pretty low. However, the last time I managed to repro this, I got a bit more info: I noticed all interactions with the Bloop server started hanging around 8:55. Metals got stuck importing a build, and at the CLI
That error lines up with the time at which Bloop was not responding to requests. Is there a way to increase the verbosity of this log file? (Is there anywhere else I can get better logs?)
So it looks like the server connection somehow stopped working and after the reset everything went back to normal 🤔 I wonder why it took so long for that exception to happen. Were you able to get any stack traces of Bloop during that time? Might be possible to see some problematic threads.
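For collecting that, a thread dump of the server can be captured with the JDK's jstack tool (the PID and output file below are placeholders):

```sh
# Capture a full thread dump of the Bloop server JVM, including lock information.
jstack -l <bloop-pid> > bloop-threads.txt
```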
I think I ran into the same issue... Bloop compile is very slow (both from Metals and from the CLI) and the process's memory usage goes up to 12GB (I guess that's all it can get, it's a 16GB machine). Eventually, when it finishes, memory usage goes down to 6GB. Clean + compile doesn't solve the issue. There are some compilation errors though (ambiguous implicits). Bloop v2.0.8. Output log is:
Hm, I turned off best effort compilation in Metals, killed all Java processes, and now things seem much quicker; memory usage went back to normal after the compilation finished. I'll keep monitoring the situation.
@voidcontext let us know if the issue comes up again. Maybe there is some memory leak for best effort 🤔
@tgodzik I haven't seen this issue since best effort compilation was turned off. I forgot to mention: I've seen this issue in Scala 3 projects (I don't work on any Scala 2 projects at the moment, so those might be affected too, I just can't tell).
Just flagging I'm still hitting this constantly. I've taken to programmatically detecting the bad state by polling a CLI call as a health check. I'm only seeing this with Scala 2.12.17 at the moment.
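The exact health check isn't shown above; a minimal sketch of this kind of poll, assuming it is something like bloop projects wrapped in a timeout, could look like:

```sh
# Treat the server as unhealthy if a cheap request doesn't answer within 10 seconds.
if ! timeout 10s bloop projects > /dev/null 2>&1; then
  echo "bloop server appears hung" >&2
  # ...restart the server or alert, depending on the environment.
fi
```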
Did you manage to get stack traces of Bloop when the hang occurs? I wasn't able to reproduce the issue.
@tgodzik Yeah, here's a jstack dump. It is very large - not sure if you have a way to cut to the interesting stuff.
One thing I noticed is that there are 18 CLI commands waiting, which is unexpected since it's rarely more than one. Is it possible that they are not cancelled when you cancel the command line request? I also have a couple on my local instance, but that's only 4 after a couple of days of running 🤔 There are also 12 processes waiting on Forker.scala, which is also unexpected, because that's usually tests or main classes being run that way. So that would make it 12 processes not finishing. Any idea if there are any child processes there?
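One way to answer that question is to list the server's direct children (a sketch assuming a Linux environment; on macOS the ps flags differ, and the PID is a placeholder):

```sh
# List direct children of the Bloop server process and how long they've been alive.
ps -o pid,ppid,etime,command --ppid <bloop-pid>
```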
Is there a way to list out those active requests? The only CLI invocations that should be made are:
The processes are probably mostly my own Scala programs that set up Bloop clients, but the programs themselves are spawned with
Here's another example with fewer processes running. Taken at the same time, here's the jps output:
Here, I expect all of
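For reference, that kind of process listing comes from the JDK's jps tool; the flags below are one illustrative choice, not necessarily the ones used above:

```sh
# List running JVMs with their main class/jar, JVM arguments, and program arguments.
jps -lvm
```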
Not entirely sure, but it seems there is a lot of IO activity forwarding outputs from those programs. It might be worth not running them and seeing whether that is responsible for the freezes.
That would explain the additional entries; they will most likely hang as well when there is an issue. Does the issue happen if that health check is not being run? Metals already sends requests to make sure Bloop is running and will change the status in case of any problems. Both of the above could be only symptoms, though; I can't find anything else going on aside from the BSP connection.
I think that does avoid the issue. However, it is quite annoying in that I now need to recreate a potentially pretty long CLI invocation myself. Given that, plus your comment about the IO activity: is there a good way to get Bloop to just dump out the invocation it would make instead of actually running it? Then I could just run it directly.
Yes, it does.
Ok, I wonder whether we can fix that, or whether the only option is not to run multiple apps. I will try to figure it out. Metals actually runs plain Java whenever only the run option is selected. We generate classpath jars in the .metals directory to avoid issues with too-long commands. I don't think that's easy to do from the command line.
I ended up generating manifest JARs (much like I think Metals does): a short-lived Bloop client makes a couple of queries to determine the runtime classpath and JVM options, then crafts the manifest JAR and the java command line. With that in place, I no longer get the hanging behaviour. Prior to this, I also confirmed where the hang occurred. Fingers crossed, this will put this issue to bed. If I don't run into issues in the next two weeks, I'll come back to close this (or feel free to close it for me). I do think there's something buggy somewhere, though.
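A rough sketch of the manifest-JAR part of that approach, assuming the classpath and JVM options have already been obtained from Bloop (the object name, paths and main class are made up for illustration):

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.jar.{Attributes, JarOutputStream, Manifest}

object ManifestJar {
  // Writes a jar containing nothing but a manifest whose Class-Path points at the
  // real classpath entries, keeping the java command line short.
  def write(classpath: Seq[Path], out: Path): Path = {
    val manifest = new Manifest()
    val attrs = manifest.getMainAttributes
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0")
    // Per the jar spec, Class-Path entries are space-separated; directories need a
    // trailing slash, and entries may need to be relative to the jar's own location.
    attrs.put(Attributes.Name.CLASS_PATH, classpath.map(_.toUri.toString).mkString(" "))
    val os = new JarOutputStream(Files.newOutputStream(out), manifest)
    os.close()
    out
  }
}

// Usage (the classpath would come from the Bloop queries mentioned above):
// val jar = ManifestJar.write(runtimeClasspath, Paths.get("/tmp/app-classpath.jar"))
// Then run: java <jvm options> -cp /tmp/app-classpath.jar com.example.Main
```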
Looks that way, but at least this should be somewhat reproducible, and we have a stack trace. I wonder whether there is maybe a limit on the number of threads somewhere?
It doesn't seem to be bounded, so it should not cause any issues as far as I can see: https://github.com/scalacenter/bloop/blob/main/frontend/src/main/scala/bloop/engine/ExecutionContext.scala#L50
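For context, the distinction being checked there, as a generic sketch rather than Bloop's actual code: a bounded pool can stall when every thread is blocked waiting on work that must run on the same pool, while an unbounded/cached pool keeps creating threads instead.

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

object PoolKinds {
  // Bounded: at most 4 threads; tasks that block waiting on other tasks can starve each other.
  val bounded: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

  // Unbounded: grows on demand, so blocked tasks don't cap progress (threads pile up instead).
  val unbounded: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newCachedThreadPool())
}
```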
Sorry for the very generic title - I'm just a bit at a loss to figure out what information would be most helpful to collect.
The failure

The failure mode is weird: the Bloop server becomes very slow to respond to requests, and at some point seems to not respond at all. bloop projects from the CLI becomes super slow or hangs indefinitely!

My setup that runs into this...
Some maybe useful context:
Some logs
(Anything else worth collecting here?)
Metals logs when I try to reconnect to Bloop as prompted
Bloop daemon logs