Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CPU spin in the JavaScript implementation side #567

Closed
wants to merge 7 commits into from

Conversation

lifengl
Copy link

@lifengl lifengl commented Dec 5, 2022

This PR is to fix problems that getBufferFrom keeps spinning and consumes lots of CPU, also leads product to hang.

This happens when the stream has partial data ready to read, because read(size) returns null (unless the stream is closed), but because there are unread data in the stream, the code will not block waiting on anything but immediately tries to read again. This would consume lots of CPU. On the other hand, because the length of pipeline can be limited. when the reader wants a larger block over the size of the pipeline buffer, it would never get the data, because the writer cannot write any more data until reader takes some bytes out of the stream.

This PR is intended to use readableLength to read partial data out, and joins them on the reader side when it is necessary. However, the PR turns out to be more complicated due to this state is not defined in the ReadableStream interface, but in the implementation (Readable). Not sure why an important property like this is not included in the contract. So the code lands to keep the old behavior unless readableLength is available. Also, i kept running into memory issues in unit tests. There were some event handler which can leak memory, and was fixed. But it turned out that the real reason is that reableEnded can be false when streamEnded event is fired. Because the earlier changed code depends on the state, it ends up spinning in the function. Interestingly, it leads out of memory.

@lifengl lifengl marked this pull request as ready for review December 6, 2022 07:24
@AArnott AArnott marked this pull request as draft December 6, 2022 13:27
@AArnott AArnott changed the title Draft: try to fix CPU spin in the JavaScript implementation side Fix CPU spin in the JavaScript implementation side Dec 6, 2022
@AArnott
Copy link
Collaborator

AArnott commented Dec 6, 2022

@lifengl I converted your PR to be a 'draft' to match your PR's title, and then I stripped 'draft: ' from the title since those tend to be accidentally left there when we merge.
Just let me know when you're ready for me to look at this (or convert the PR back to "Ready for review" state.

@lifengl lifengl marked this pull request as ready for review December 6, 2022 18:04
@lifengl
Copy link
Author

lifengl commented Dec 7, 2022

@AArnott : i think the PR is now ready to be reviewed. Not sure why C# unit tests failed, i didn't touch code there.

@AArnott
Copy link
Collaborator

AArnott commented Dec 8, 2022

Not sure why C# unit tests failed, i didn't touch code there

When I look at the pipeline failures, there are no C# unit test failures. The failures or in the tslint step, presumably due to your code changes.

https://dev.azure.com/andrewarnott/OSS/_build/results?buildId=7445&view=logs&jobId=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&j=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&t=dd17a03b-e94c-5c24-6448-258c75164f22

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to chat with you about how we can write a test that verifies the fix. Once we have that, I may try my hand at authoring a fix that doesn't triple the size of the function (and make it complicated enough that I am struggling to understand it).

@lifengl
Copy link
Author

lifengl commented Dec 9, 2022 via email

@AArnott
Copy link
Collaborator

AArnott commented Dec 9, 2022

Thanks for educating me on void 0 and more about ??. I've submitted a commit to revert my broken 'fix'.

@lifengl
Copy link
Author

lifengl commented Dec 11, 2022

@AArnott : I added an unit test to cover the fix. Somehow, I could not reproduce backpressure in the unit test, which might be the behavior of a PassThrough stream being used there. But it turned out that it was more straightforward to write a test for the CPU spin issue. It just need the write side to write block of data in two pieces with await delay in the middle. If the read side spins the CPU, it would not allow the write side to finish the work, which would lead the test to hang without the fix. (I verified it by taking the unit test to the main branch). With the fix, the same test passes.

I did some cleanup to remove some redundant condition checks.

@AArnott
Copy link
Collaborator

AArnott commented Dec 11, 2022

Thanks. I noticed your additional commits. Thank you very much for the test. I spent a good portion of Saturday working in and around your PR, with the goal to both understand and simplify the getBufferFrom method. I ended up adding a few more generally useful helper methods. I have almost everything working, but I discovered that Duplex.unshift doesn't work, which my new design depends on, so I'm looking for a solution to that before updating this PR and merging it.

@AArnott AArnott force-pushed the dev/lifengl/fixCPUSpin branch from 3121df7 to a075a7e Compare December 12, 2022 15:15
@AArnott AArnott added this to the v2.10 milestone Dec 12, 2022
Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I force-pushed to your branch to keep your test changes but rewrite the product changes (and add features too).
Can you confirm this still works for your scenario (as the changes are substantial) and then I'll merge the PR.

@lifengl
Copy link
Author

lifengl commented Dec 12, 2022

It doesn't work. The product would just hang.

@AArnott
Copy link
Collaborator

AArnott commented Dec 12, 2022

Rats. Any chance you can debug into it and at least write a test that fails?
I have been amazed at how difficult it is to understand and get node.js streams working, and all the tests passing at once.

@lifengl
Copy link
Author

lifengl commented Dec 12, 2022

BTW, using unshift is tricky, because it will fail once 'end' event is fired, in which case, it cannot unshift back a block of data for another reader to pick it up.

@AArnott
Copy link
Collaborator

AArnott commented Dec 12, 2022

oooh. That's a very interesting point. I wonder when the end event is fired, typically. Will it take one last read() call that returns null for end to fire? If so, unshift may be safe to call before that. But then, my own sliceStream reader will voluntarily throw in a push(null) if it knows nothing more is available later, which I guess would raise the end event and block unshift. But I guess the only time I use unshift is within sliceStream itself, so hopefully that wouldn't be a problem.

@lifengl
Copy link
Author

lifengl commented Dec 12, 2022

Rats. Any chance you can debug into it and at least write a test that fails? I have been amazed at how difficult it is to understand and get node.js streams working, and all the tests passing at once.

yes, with/without a later fix, there are still other problems. Basically after keep using for a while, connection will be suddenly closed. it is also hard to find out why.

@AArnott
Copy link
Collaborator

AArnott commented Dec 12, 2022

Do you recommend I bring back your version of the fix?

@lifengl
Copy link
Author

lifengl commented Dec 13, 2022

Do you recommend I bring back your version of the fix?

yes, originally I wanted to take a look whether I could spend more time on it, but our check point is couple days away. I agree we need do further cleaning up here, but i think the priority is to unblock other work to move forward. As the issue essentially blocks testing slightly bigger input, and made no meaningful way to see anything potentially affected by scale.

@lifengl lifengl closed this Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants