Potential infinite loop with canvas recording web worker #13743
Hi folks, I reported this issue via support and they advised to come here. Happy to provide more information as needed. To explain in a bit more detail:

We have an internal application built on Node.js with Vue.js, running the Vue Sentry package, version 8.25.0 (which we realise is a couple of versions behind current). The issue was reported by our internal users, who are seeing tabs in Chrome (latest Chrome version, running on Chromebooks with Intel i5s and 8GB of RAM) freeze when performing certain actions. Some of these actions trigger console errors, which might be contributing to the behaviour, but I'm not sure about this.

When looking at a frozen tab there isn't much we can diagnose, since devtools is locked up. Chrome Task Manager shows many, many dedicated workers running under the frozen Chrome process, and memory usage is significantly higher than normal. The tab remains frozen, with periodic dialogs from Chrome asking whether we want to wait or exit. I think waiting does nothing but spin up more dedicated workers, though it's hard to tell because the machine is unwieldy by that point and there are so many workers it's hard to see what is going on; the only recovery is to close the tab.

We made a little Chrome extension that overrides the `Worker` constructor, just to see if we could identify the issue, and captured the stack trace when the workers were created. It showed something like the following, repeated over and over again:
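For reference, a minimal sketch of the kind of `Worker` override the extension used (a hypothetical reconstruction; `workerCreations` and the wrapper are our own names, not Sentry or Chrome extension APIs):

```javascript
// Hypothetical reconstruction of the extension's approach: wrap the global
// Worker constructor so every worker creation records a stack trace.
// `workerCreations` is our own log array, not a Sentry API.
const OriginalWorker = globalThis.Worker;
const workerCreations = [];

globalThis.Worker = class extends (OriginalWorker ?? class {}) {
  constructor(scriptURL, options) {
    // Capture who is creating the worker, and when, before delegating.
    workerCreations.push({
      scriptURL: String(scriptURL),
      stack: new Error().stack,
      createdAt: Date.now(),
    });
    super(scriptURL, options);
  }
};
```

Inspecting `workerCreations[i].stack` after a burst shows which code path spawned each worker.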
I was able to reproduce this on my machine by:
Doing that would regularly trigger a burst of worker creation. On my more powerful laptop (i7 / 32GB) I triggered about 100 workers being created at once, though it didn't cause any noticeable performance issues. My guess is that on the lower spec machines, when a lot of workers are created it simply crawls to a halt and then crashes, and that there is a loop or race condition that is triggering endless worker creations in the Sentry Replay code, either as a direct result of something weird in our code or just a random bug somewhere. There are two things we have on our TODO to try here:
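To quantify these bursts from the extension's logs, something like the following sliding-window counter could flag them automatically (a hypothetical helper of our own, not part of the Sentry SDK):

```javascript
// Hypothetical burst detector: counts events in a sliding time window and
// fires a callback when the count crosses a threshold (here, worker creations).
function makeBurstDetector({ windowMs = 3000, threshold = 100, onBurst }) {
  const timestamps = [];
  return function recordEvent(now = Date.now()) {
    timestamps.push(now);
    // Drop events that have fallen out of the window.
    while (timestamps.length && timestamps[0] <= now - windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= threshold) {
      onBurst(timestamps.length);
      timestamps.length = 0; // reset so each burst is reported once
    }
  };
}
```

Calling the returned `recordEvent` from inside a `Worker` wrapper would log one warning per burst instead of thousands of individual stack traces.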
Open to any other suggestions as well if it helps zero in on the issue.
Thanks for the detailed description @trogau -- just want to clarify a few details:
I should note: I have not yet captured a stack trace from an actual crash; we haven't had one in the few days the extension has been running and logging data. The events we've been capturing so far, which again show up to around ~100 workers being created (which doesn't seem like enough to cause a crash, even on the Chromebooks), are happening relatively frequently though.
We captured a stack trace from a freeze this morning, and it seems to confirm that mass creation of workers causes the problem. Attached is a log snippet showing about 1008 workers created in ~3 seconds, which froze the browser tab. Not sure how helpful it is, but I thought I'd include it for reference.
@trogau thanks for the insights. Could you also specify which tasks you are running on the canvas? Is it a continuous animation or a static canvas – this might help in reproducing the issue.
@chargome: I'm double-checking with our team, but AFAIK the pages where we're seeing this happen do not have any canvas elements at all. We do have /some/ pages with canvas (a MapBox map component), but that isn't loaded on the pages where we're seeing the majority of these issues. We do have
FYI we've upgraded to v8.31.0 and still seeing large numbers of workers created (just had one instance of 730 created in a few seconds - not enough for it to crash the tab so the user didn't notice but we see it in the logging. The magic number seems to be about 1000 workers being enough to freeze the tab on these devices. |
Hi @billyvg - we don't do anything custom with the Replay integration - just set it up in init and that's it. v8.25.0 is what we were using initially that definitely did have the problem - happy to downgrade if there's something specific we can test, but I can confirm v8.25.0 was where we first experienced the issue. |
@billyvg : FYI just had our first freeze on v8.34.0 - can see it triggered ~1000 workers created in ~2 seconds which crashed the machine. |
@trogau ok, can you try two things:
@billyvg: we've just deployed this and have it logging now. I've got one sample from an unrelated error, but it contains a lot of info, including things that might be sensitive/internal, so I'm a bit reluctant to post it publicly - is there a way to send it to you directly? I assume the goal is to see whether an error is generated when we have another crash/freeze incident?
@trogau Yeah exactly, want to see if there are errors causing it to stop recording and freeze. Feel free to email me at billy at sentry.io |
Just sent the first log file through!
Thanks! I'll take a look |
@billyvg: we are seeing an increase in the frequency and spread of this issue among our staff. I'm not sure if it's due to code changes in the last couple of versions, but it's causing increasing disruption to their workflow, so unfortunately we might have to disable it for a while until there's some concrete progress. This will also help us confirm 100% that Replays are responsible - it seems pretty likely, but we haven't disabled them completely yet, so this will at least rule that out. If we do this, in the interests of testing in the most useful way: can we just set
@trogau yeah that seems reasonable, an alternative would be to only remove the canvas recording (though I don't know how much of your replays depend on that). Setting the sample rates to 0 will be enough to turn off (provided you don't have any custom SDK calls to start it). |
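Both options described here can be expressed in the `Sentry.init` config. A sketch using the Sentry v8 browser API (the `dsn` and surrounding options are placeholders for the app's existing config):

```javascript
import * as Sentry from "@sentry/vue";

Sentry.init({
  // ...existing options (dsn, app, etc.) unchanged...

  // Option A: turn Replay off entirely by sampling nothing.
  replaysSessionSampleRate: 0,
  replaysOnErrorSampleRate: 0,

  // Option B: keep Replay but drop only the canvas recording by
  // omitting replayCanvasIntegration() from the integrations list.
  integrations: [
    Sentry.replayIntegration(),
    // Sentry.replayCanvasIntegration(),  // removed
  ],
});
```

As noted above, the zero sample rates fully disable recording only if nothing calls the Replay integration's start methods manually.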
@trogau sorry for the delay, we've released 8.39.0 which does not re-create workers once recording stops. |
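For readers hitting this on older versions, the general shape of such a guard (a hypothetical sketch to illustrate the idea, not the actual SDK code) is roughly:

```javascript
// Hypothetical sketch: gate worker creation behind a "stopped" flag so a
// recorder that has stopped never spawns replacement workers in a loop.
class CanvasRecorder {
  constructor(createWorker) {
    this._createWorker = createWorker; // factory, e.g. () => new Worker(url)
    this._worker = null;
    this._stopped = false;
  }

  ensureWorker() {
    if (this._stopped) return null; // never re-create after stop
    if (!this._worker) this._worker = this._createWorker();
    return this._worker;
  }

  stop() {
    this._stopped = true;
    this._worker = null;
  }
}
```

Without the `_stopped` check, any code path that calls `ensureWorker()` after a stop (for example, on tab resume after a long idle) could keep creating fresh workers.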
Awesome, thanks so much, we'll look at upgrading the next time we have a window (with Xmas approaching it might not be until next year now unfortunately, but we'll see how we go). FWIW we disabled the canvas recording and, as suspected, that immediately made the problem go away. We actually don't really need the canvas recording, so we're likely to leave it disabled in any case, but we'll continue to upgrade. Thanks for the effort!
@trogau glad to hear. I'll go ahead and close this issue then, please feel free to open up another one if you encounter problems after upgrading. |
Seeing a potential cycle with canvas replay web worker. Stack trace looks something like this:
Then the part that seems to cycle:
Customer says this generally happens after a user comes back to the tab from a long period of idling.
Zendesk ticket