-
Notifications
You must be signed in to change notification settings - Fork 3.9k
core: DelayedStream cancels provided stream if not using it. #2618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Resolves grpc#1537 Also disallow cancel() before start(). DelayedClientTransport.shutdownNow() races with stream start(), thus it shouldn't call cancel() directly. It would delay the cancellation until the stream is started.
@ejona86 can you look at this, it is blocking 1.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This looks great.
started = true; | ||
savedPendingCancelReason = pendingCancelReason; | ||
} | ||
super.start(listener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment how this doesn't do any I/O if savedPendingCancelReady != null
, since there won't be a real transport/stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're immediately cancelling, it seems pretty weak to start if it is going to do I/O. Doing unnecessary I/O in presence of cancellation is generally a recipe of cascading failure. There are two things important here though: 1) the cancellation involved here is expected to be rare and only impact a few streams since it is only due to a race between the stream creation and the delayed client transport shutdown and 2) even when it happens, no I/O is done.
// ClientStream.cancel() must be called after start() | ||
stream.start(NOOP_STREAM_LISTENER); | ||
if (savedError != null) { | ||
stream.cancel(savedError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not cancel with savedError
. We're cancelling the stream because of the incorrect API usage, not because of a previous error.
At that point error
itself can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The misuse case is calling setStream()
twice, which is the else
branch.
This branch is for calling setStream()
after cancel()
, which is legit, and I think the real stream should be cancelled with savedError
which was used to cancel the delayed stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the real stream should be cancelled with savedError which was used to cancel the delayed stream.
That probably doesn't matter at all. A simple Status.CANCELLED.withDescription("cancelled before setStream") or "the stream was only started so it could be cancelled" would seem fine to me. The application won't end up seeing the status. When debugging, I would probably prefer to see the hard-coded status instead of savedError
as well, since the reason for the cancel is the limited API that required the start() in the first place.
This branch is for calling setStream() after cancel(), which is legit
Hmm... If it is legit, we have to make sure it doesn't happen frequently and could only impact a few streams. And it seems that doesn't hold for MetadataApplierImpl
and may not hold for DelayedClientTransport{,2}
. That means the fix could actually cause stability problems whereas the current code isn't causing any. Hmm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own record, the issue is setStream()
may be scheduled for an unbounded period of time into the future. For example, CallCredentials
may take a long time to fetch the credentials before triggering setStream()
(in MetadataApplierImpl
), prior to which the RPC may be cancelled due to deadline-exceeded. In those cases, real stream are created before setStream()
is called, only to be started (as required by cancel()) and then cancelled. start()
typically involve I/O, e.g., sending headers, which are wasted. If the delay before setStream()
is long enough, this will have a high chance to happen, unlike typical race conditions with low likelihood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel comfortable yet with the start-to-just-cancel of the real streams currently present, because it seems it could happen en masse.
// ClientStream.cancel() must be called after start() | ||
stream.start(NOOP_STREAM_LISTENER); | ||
if (savedError != null) { | ||
stream.cancel(savedError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the real stream should be cancelled with savedError which was used to cancel the delayed stream.
That probably doesn't matter at all. A simple Status.CANCELLED.withDescription("cancelled before setStream") or "the stream was only started so it could be cancelled" would seem fine to me. The application won't end up seeing the status. When debugging, I would probably prefer to see the hard-coded status instead of savedError
as well, since the reason for the cancel is the limited API that required the start() in the first place.
This branch is for calling setStream() after cancel(), which is legit
Hmm... If it is legit, we have to make sure it doesn't happen frequently and could only impact a few streams. And it seems that doesn't hold for MetadataApplierImpl
and may not hold for DelayedClientTransport{,2}
. That means the fix could actually cause stability problems whereas the current code isn't causing any. Hmm...
As discussed in #1537, putting this on-hold. |
Resolves #1537
Also disallow cancel() before start().
DelayedClientTransport.shutdownNow() races with stream start(), thus it
shouldn't call cancel() directly. It would delay the cancellation until
the stream is started.