Fix CPU spin in the JavaScript implementation side #567

lifengl · 2022-12-05T02:32:16Z

This PR is to fix problems that getBufferFrom keeps spinning and consumes lots of CPU, also leads product to hang.

This happens when the stream has partial data ready to read, because read(size) returns null (unless the stream is closed), but because there are unread data in the stream, the code will not block waiting on anything but immediately tries to read again. This would consume lots of CPU. On the other hand, because the length of pipeline can be limited. when the reader wants a larger block over the size of the pipeline buffer, it would never get the data, because the writer cannot write any more data until reader takes some bytes out of the stream.

This PR is intended to use readableLength to read partial data out, and joins them on the reader side when it is necessary. However, the PR turns out to be more complicated due to this state is not defined in the ReadableStream interface, but in the implementation (Readable). Not sure why an important property like this is not included in the contract. So the code lands to keep the old behavior unless readableLength is available. Also, i kept running into memory issues in unit tests. There were some event handler which can leak memory, and was fixed. But it turned out that the real reason is that reableEnded can be false when streamEnded event is fired. Because the earlier changed code depends on the state, it ends up spinning in the function. Interestingly, it leads out of memory.

AArnott · 2022-12-06T13:28:23Z

@lifengl I converted your PR to be a 'draft' to match your PR's title, and then I stripped 'draft: ' from the title since those tend to be accidentally left there when we merge.
Just let me know when you're ready for me to look at this (or convert the PR back to "Ready for review" state.

lifengl · 2022-12-07T01:12:16Z

@AArnott : i think the PR is now ready to be reviewed. Not sure why C# unit tests failed, i didn't touch code there.

AArnott · 2022-12-08T02:04:34Z

Not sure why C# unit tests failed, i didn't touch code there

When I look at the pipeline failures, there are no C# unit test failures. The failures or in the tslint step, presumably due to your code changes.

https://dev.azure.com/andrewarnott/OSS/_build/results?buildId=7445&view=logs&jobId=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&j=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&t=dd17a03b-e94c-5c24-6448-258c75164f22

AArnott

I'd like to chat with you about how we can write a test that verifies the fix. Once we have that, I may try my hand at authoring a fix that doesn't triple the size of the function (and make it complicated enough that I am struggling to understand it).

src/nerdbank-streams/src/tests/MultiplexingStream.SeededChannels.spec.ts

src/nerdbank-streams/src/Utilities.ts

lifengl · 2022-12-09T08:00:37Z

Replacing with || will break the function here, because it will not keep 0 ?? 5 === 0 but 0 || 5 === 5. It would make the logic block to check availableSize === 0 never returns true, and I believe this could leave the function to spin when the stream is closed. Your statement that the behavior of '??' expression in TypeScript is different from JavaScript doesn't match documents I read earlier. It would be a real problem if TypeScript to break JavaScript behavior. void 0 means 'undefined' in JavaScript. It was written in this way, because undefined can be used as a variable name, and assigned with a different value.

AArnott · 2022-12-09T18:08:26Z

Thanks for educating me on void 0 and more about ??. I've submitted a commit to revert my broken 'fix'.

Reproduces dotnet#557

Also export new `sliceStream` and `readAsync` functions.

lifengl · 2022-12-11T19:36:45Z

@AArnott : I added an unit test to cover the fix. Somehow, I could not reproduce backpressure in the unit test, which might be the behavior of a PassThrough stream being used there. But it turned out that it was more straightforward to write a test for the CPU spin issue. It just need the write side to write block of data in two pieces with await delay in the middle. If the read side spins the CPU, it would not allow the write side to finish the work, which would lead the test to hang without the fix. (I verified it by taking the unit test to the main branch). With the fix, the same test passes.

I did some cleanup to remove some redundant condition checks.

AArnott · 2022-12-11T19:39:20Z

Thanks. I noticed your additional commits. Thank you very much for the test. I spent a good portion of Saturday working in and around your PR, with the goal to both understand and simplify the getBufferFrom method. I ended up adding a few more generally useful helper methods. I have almost everything working, but I discovered that Duplex.unshift doesn't work, which my new design depends on, so I'm looking for a solution to that before updating this PR and merging it.

AArnott

I force-pushed to your branch to keep your test changes but rewrite the product changes (and add features too).
Can you confirm this still works for your scenario (as the changes are substantial) and then I'll merge the PR.

lifengl · 2022-12-12T19:00:43Z

It doesn't work. The product would just hang.

AArnott · 2022-12-12T19:05:20Z

Rats. Any chance you can debug into it and at least write a test that fails?
I have been amazed at how difficult it is to understand and get node.js streams working, and all the tests passing at once.

lifengl · 2022-12-12T20:00:08Z

BTW, using unshift is tricky, because it will fail once 'end' event is fired, in which case, it cannot unshift back a block of data for another reader to pick it up.

AArnott · 2022-12-12T20:03:25Z

oooh. That's a very interesting point. I wonder when the end event is fired, typically. Will it take one last read() call that returns null for end to fire? If so, unshift may be safe to call before that. But then, my own sliceStream reader will voluntarily throw in a push(null) if it knows nothing more is available later, which I guess would raise the end event and block unshift. But I guess the only time I use unshift is within sliceStream itself, so hopefully that wouldn't be a problem.

lifengl · 2022-12-12T20:04:01Z

Rats. Any chance you can debug into it and at least write a test that fails? I have been amazed at how difficult it is to understand and get node.js streams working, and all the tests passing at once.

yes, with/without a later fix, there are still other problems. Basically after keep using for a while, connection will be suddenly closed. it is also hard to find out why.

AArnott · 2022-12-12T20:05:25Z

Do you recommend I bring back your version of the fix?

lifengl · 2022-12-13T08:15:31Z

Do you recommend I bring back your version of the fix?

yes, originally I wanted to take a look whether I could spend more time on it, but our check point is couple days away. I agree we need do further cleaning up here, but i think the priority is to unblock other work to move forward. As the issue essentially blocks testing slightly bigger input, and made no meaningful way to see anything potentially affected by scale.

lifengl marked this pull request as ready for review December 6, 2022 07:24

AArnott marked this pull request as draft December 6, 2022 13:27

AArnott changed the title ~~Draft: try to fix CPU spin in the JavaScript implementation side~~ Fix CPU spin in the JavaScript implementation side Dec 6, 2022

lifengl marked this pull request as ready for review December 6, 2022 18:04

AArnott requested changes Dec 8, 2022

View reviewed changes

src/nerdbank-streams/src/tests/MultiplexingStream.SeededChannels.spec.ts Show resolved Hide resolved

src/nerdbank-streams/src/Utilities.ts Outdated Show resolved Hide resolved

Lifeng Lu and others added 4 commits December 10, 2022 11:55

Add test demonstrating spinning hang

e242a99

Reproduces dotnet#557

Fix memory leaks in event handlers

d5ffbe7

Simplify Substream tests

e999178

Fix readSubstream to not buffer everything first

9b792d2

Also export new `sliceStream` and `readAsync` functions.

AArnott added 3 commits December 12, 2022 07:16

Test and fix more use cases

3fa6850

Optimize

6586db6

Fix up linting

a075a7e

AArnott force-pushed the dev/lifengl/fixCPUSpin branch from 3121df7 to a075a7e Compare December 12, 2022 15:15

AArnott added this to the v2.10 milestone Dec 12, 2022

AArnott approved these changes Dec 12, 2022

View reviewed changes

lifengl closed this Dec 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CPU spin in the JavaScript implementation side #567

Fix CPU spin in the JavaScript implementation side #567

lifengl commented Dec 5, 2022 •

edited

Loading

AArnott commented Dec 6, 2022

lifengl commented Dec 7, 2022

AArnott commented Dec 8, 2022

AArnott left a comment

lifengl commented Dec 9, 2022 via email •

edited

Loading

AArnott commented Dec 9, 2022

lifengl commented Dec 11, 2022

AArnott commented Dec 11, 2022

AArnott left a comment

lifengl commented Dec 12, 2022

AArnott commented Dec 12, 2022

lifengl commented Dec 12, 2022

AArnott commented Dec 12, 2022

lifengl commented Dec 12, 2022

AArnott commented Dec 12, 2022

lifengl commented Dec 13, 2022

Fix CPU spin in the JavaScript implementation side #567

Fix CPU spin in the JavaScript implementation side #567

Conversation

lifengl commented Dec 5, 2022 • edited Loading

AArnott commented Dec 6, 2022

lifengl commented Dec 7, 2022

AArnott commented Dec 8, 2022

AArnott left a comment

Choose a reason for hiding this comment

lifengl commented Dec 9, 2022 via email • edited Loading

AArnott commented Dec 9, 2022

lifengl commented Dec 11, 2022

AArnott commented Dec 11, 2022

AArnott left a comment

Choose a reason for hiding this comment

lifengl commented Dec 12, 2022

AArnott commented Dec 12, 2022

lifengl commented Dec 12, 2022

AArnott commented Dec 12, 2022

lifengl commented Dec 12, 2022

AArnott commented Dec 12, 2022

lifengl commented Dec 13, 2022

lifengl commented Dec 5, 2022 •

edited

Loading

lifengl commented Dec 9, 2022 via email •

edited

Loading