-
Notifications
You must be signed in to change notification settings - Fork 804
fix: improve state transitions of Windows named pipes #1903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: improve state transitions of Windows named pipes #1903
Conversation
When we've previously failed to read, we may be in a failed state. Inspired by how `schedule_read` handles this case, we simply reset the state to `Err` and notify that we are able to deliver the error.
Applies the same (potential) fix from `read_done` to `write_done`.
| let mut io = me.io.lock().unwrap(); | ||
| let mut buf = match mem::replace(&mut io.read, State::None) { | ||
| State::Pending(buf, _) => buf, | ||
| State::Err(e) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, I think this can be triggered here: https://github.com/thomaseizinger/mio/blob/64fc40ef248af392e1b01b0d4749506927cbc4cd/src/sys/windows/named_pipe.rs#L742
| // Move from the `Pending` to `Ok` state. | ||
| let mut io = me.io.lock().unwrap(); | ||
| let mut buf = match mem::replace(&mut io.read, State::None) { | ||
| State::Ok(buf, pos) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this can happen but it seems safe to just handle it. If read_done is called twice in a row, this could possibly trigger but I don't know enough about the internals here and how Windows handles this to say for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All we do further down is notify that the buffer needs to be consumed. If we are still in Ok, I am guessing this means it hasn't been consumed yet.
The `mio` library which underpins `tokio` has a bug on Windows in regards to named pipes where under certain circumstances an "unreachable code" section is entered. See tokio-rs/mio#1819 for the upstream bug report. In this PR, we patch in a fork of `mio` that aims to fix these issues by handling the state transitions more gracefully. I am not a Windows expert by any means so this will need some rigorous testing to make sure the IPC channel between GUI and Tunnel service still works reliably. Related: tokio-rs/mio#1903
The `mio` library which underpins `tokio` has a bug on Windows in regards to named pipes where under certain circumstances an "unreachable code" section is entered. See tokio-rs/mio#1819 for the upstream bug report. In this PR, we patch in a fork of `mio` that aims to fix these issues by handling the state transitions more gracefully. I am not a Windows expert by any means so this will need some rigorous testing to make sure the IPC channel between GUI and Tunnel service still works reliably. Related: tokio-rs/mio#1903
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't know if this fixes the problem. I don't know enough about Windows to determine it from the code, nor does this come with a test that failed before the change. Nevertheless it seems reasonable, so I'm going to merge it.
Thanks @thomaseizinger and apologies it took this long
|
In case you want to hold off with a release, I am currently rolling this change out to our users. It might take a while for them to upgrade but I should know in a week or two whether the crash regresses or is fixed. |
|
I actually merged this now so it can be included in the next release: #1909 😅 |
Fair enough! :) |
Disclaimer: I am by no means a Windows expert. The changes presented here are built from reading through the code and trying to reason about the possible state transitions. I've tried to explain in each commit, why I think the resulting change is safe to make.
I'll apply this to our codebase and see if it fixes the crash reports that we are getting from users. In the meantime, feedback is most welcome.
Resolves: #1819