OOM blow up in the worker #888
step 482mb 🤔
We have a bunch of mitigations against this, chiefly per-thread memory limits that kill a thread when it allocates past them.
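For context, a minimal sketch of that per-thread limit mechanism, assuming it's done with `worker_threads` `resourceLimits` (the actual mitigation code may differ):

```ts
import { Worker } from 'node:worker_threads';

// Sketch: cap the thread's heap so a leaky job kills the thread,
// not the whole process. 50mb is the kind of low limit used in the
// repro discussed below.
const worker = new Worker('./job.js', {
  resourceLimits: {
    maxOldGenerationSizeMb: 50,
  },
});

worker.on('error', (err) => {
  // ERR_WORKER_OUT_OF_MEMORY surfaces here when the limit is breached
  console.error('thread exceeded its memory limit:', err);
});
```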
This isn't foolproof though: synchronous operations can consume memory way above the limit before the thread is killed, and stringifying a large object (as the runtime does with state) is exactly that kind of operation.

What I think we're seeing is that the parent process memory - the actual worker - is blowing up. And unfortunately, when that happens, there's not very much we can do: because the worker has no persistence layer, it can't send any updates to Lightning. Some thoughts:
There are several possible points of memory failure: in the job code running inside the worker thread, in the child process hosting that thread, or in the main worker process itself.

It's this third one that's happening now, I think. This is the absolute worst case, because if the worker itself blows up, all bets are off. I have a repro case here with a 20mb state object and a low 50mb limit on the main worker thread.
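The shape of that repro, roughly (a hypothetical sketch: I'm assuming the 50mb limit is applied to the parent with `--max-old-space-size`, and that `./job.js` just posts ~20mb of state straight back up):

```ts
// Run the parent with a deliberately tiny heap:
//   node --max-old-space-size=50 repro.js
import { Worker } from 'node:worker_threads';

// Hypothetical ./job.js - builds ~20mb of state and posts it back:
//   const state = { data: 'x'.repeat(20 * 1024 * 1024) };
//   parentPort.postMessage(state);
const worker = new Worker('./job.js');

worker.on('message', (state) => {
  // Receiving the structured clone of ~20mb into a 50mb heap is
  // where the parent can OOM before any limit check gets to run.
  console.log('received state of', state.data.length, 'chars');
});
```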
A couple of good questions to ask are:
EDIT: Oh, but hang on - the payload size is enforced by the worker, right at the last minute. The engine sends all payloads up to the worker; it doesn't enforce the limit at all. Thinking about it, this is what puts the worker in jeopardy: it blows up processing the payload before it has even decided whether to send it back to Lightning.
Yep, OK, zeroing in on this thing now. I've confirmed that if the payload is too big, it's possible to trigger an OOM exception in the main worker thread while receiving the data (not sure yet whether it's the worker thread or the child process causing the problem). So if I postMessage a payload that's too big, we get a blow-up, and there's no defence around this. Options as I see them right now:

1. Enforce the payload limit lower down, before the data is ever sent up to the worker (roughly as sketched below).
2. Change how the data moves - shared memory and/or streaming - so large payloads don't have to be held in the worker's memory all at once.
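A rough sketch of what 1 could look like - the cap value, event name and reject behaviour here are my assumptions, not decided behaviour:

```ts
import type { MessagePort } from 'node:worker_threads';

const MAX_PAYLOAD_BYTES = 10 * 1024 * 1024; // assumed 10mb cap

// Measure the serialized size before it ever crosses the boundary.
// Note the stringify is itself a large synchronous allocation, so this
// check wants to live inside the sandboxed thread, not in the worker.
const guardedPost = (port: MessagePort, payload: unknown) => {
  const json = JSON.stringify(payload);
  if (Buffer.byteLength(json, 'utf8') > MAX_PAYLOAD_BYTES) {
    port.postMessage({ event: 'payload_rejected', size: json.length });
    return;
  }
  port.postMessage(payload);
};
```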
For 2, I know I can use a shared memory buffer between the worker thread and its parent child process; it's the cross-process messaging I'm more worried about. I do think I can pass my own socket or server handle there, and presumably I can then use streaming to send messages more efficiently. Tricky one. I think we probably should explore 2) because it results in a more robust architecture. But there's also no point in emitting data that isn't going to be used downstream, which suggests 1) is the correct approach.
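For the shared memory leg, a minimal sketch of how that could work (buffer size, encoding and message shape are all my assumptions, not the actual design):

```ts
import { Worker } from 'node:worker_threads';

// A SharedArrayBuffer passed through workerData is shared, not cloned,
// so the result bytes exist once rather than twice.
const shared = new SharedArrayBuffer(64 * 1024 * 1024); // assumed 64mb cap

const worker = new Worker('./job.js', { workerData: { shared } });

// Inside ./job.js, the thread writes its serialized result into the
// buffer and posts only a tiny notification:
//
//   const bytes = new TextEncoder().encode(JSON.stringify(result));
//   new Uint8Array(workerData.shared).set(bytes);
//   parentPort.postMessage({ event: 'result', length: bytes.length });

worker.on('message', ({ length }) => {
  // Buffer.from over the shared memory avoids another full copy
  const json = Buffer.from(shared, 0, length).toString('utf8');
  const result = JSON.parse(json);
  console.log('got result without cloning the payload', result);
});
```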
Had a little look at 2. You can apparently pass your own socket to handle the IPC messaging (as I understand it) - see the Node.js docs. Perhaps I'm naive, but I hoped this would stream messages more efficiently and let me do more with less memory. Maybe I'm just missing a step and I don't get this for free. But I've got an implementation and it doesn't seem to help at all :( I think it's using the socket - I really don't get much feedback - but I still get the same sorts of blow-ups. This is on a branch.

So I'm gonna pause this and switch to approach 1, which to be fair we do need anyway. Trying this out makes me think that instead of using IPC for inter-process messaging, I should just create a local websocket server and have worker threads call out to it directly. That still assumes we can stream JSON payloads with a much lower memory overhead, which I think seems reasonable?
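Very roughly, the shape that local server could take (a sketch only: I'm using a plain `net` socket with newline-delimited JSON instead of websockets to keep it dependency-free, and the socket path is made up):

```ts
import net from 'node:net';

// Threads connect out to this directly, skipping child-process IPC.
const handle = (msg: unknown) => {
  /* route the event on to Lightning */
};

const server = net.createServer((socket) => {
  let buffered = '';
  socket.on('data', (chunk) => {
    buffered += chunk.toString('utf8');
    let newline: number;
    while ((newline = buffered.indexOf('\n')) >= 0) {
      handle(JSON.parse(buffered.slice(0, newline)));
      buffered = buffered.slice(newline + 1);
    }
  });
});

server.listen('/tmp/worker.sock'); // assumed unix domain socket path
```

The catch, which would match what I saw on the branch: a single huge JSON message still has to sit in `buffered` in full before it can be parsed, so streaming the transport alone doesn't cap peak memory.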
We've caught a case of a worker blowing up with OOM.