Skip to content

core: DelayedStream cancels provided stream if not using it. #2618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

zhangkun83
Copy link
Contributor

Resolves #1537

Also disallow cancel() before start().
DelayedClientTransport.shutdownNow() races with stream start(), thus it
shouldn't call cancel() directly. It would delay the cancellation until
the stream is started.

Resolves grpc#1537

Also disallow cancel() before start().
DelayedClientTransport.shutdownNow() races with stream start(), thus it
shouldn't call cancel() directly.  It would delay the cancellation until
the stream is started.
@carl-mastrangelo
Copy link
Contributor

@ejona86 can you look at this, it is blocking 1.1

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks great.

started = true;
savedPendingCancelReason = pendingCancelReason;
}
super.start(listener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment how this doesn't do any I/O if savedPendingCancelReady != null, since there won't be a real transport/stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this matter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're immediately cancelling, it seems pretty weak to start if it is going to do I/O. Doing unnecessary I/O in presence of cancellation is generally a recipe of cascading failure. There are two things important here though: 1) the cancellation involved here is expected to be rare and only impact a few streams since it is only due to a race between the stream creation and the delayed client transport shutdown and 2) even when it happens, no I/O is done.

// ClientStream.cancel() must be called after start()
stream.start(NOOP_STREAM_LISTENER);
if (savedError != null) {
stream.cancel(savedError);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not cancel with savedError. We're cancelling the stream because of the incorrect API usage, not because of a previous error.

At that point error itself can be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The misuse case is calling setStream() twice, which is the else branch.
This branch is for calling setStream() after cancel(), which is legit, and I think the real stream should be cancelled with savedError which was used to cancel the delayed stream.

Copy link
Member

@ejona86 ejona86 Feb 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the real stream should be cancelled with savedError which was used to cancel the delayed stream.

That probably doesn't matter at all. A simple Status.CANCELLED.withDescription("cancelled before setStream") or "the stream was only started so it could be cancelled" would seem fine to me. The application won't end up seeing the status. When debugging, I would probably prefer to see the hard-coded status instead of savedError as well, since the reason for the cancel is the limited API that required the start() in the first place.

This branch is for calling setStream() after cancel(), which is legit

Hmm... If it is legit, we have to make sure it doesn't happen frequently and could only impact a few streams. And it seems that doesn't hold for MetadataApplierImpl and may not hold for DelayedClientTransport{,2}. That means the fix could actually cause stability problems whereas the current code isn't causing any. Hmm...

Copy link
Contributor Author

@zhangkun83 zhangkun83 Feb 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own record, the issue is setStream() may be scheduled for an unbounded period of time into the future. For example, CallCredentials may take a long time to fetch the credentials before triggering setStream() (in MetadataApplierImpl), prior to which the RPC may be cancelled due to deadline-exceeded. In those cases, real stream are created before setStream() is called, only to be started (as required by cancel()) and then cancelled. start() typically involve I/O, e.g., sending headers, which are wasted. If the delay before setStream() is long enough, this will have a high chance to happen, unlike typical race conditions with low likelihood.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel comfortable yet with the start-to-just-cancel of the real streams currently present, because it seems it could happen en masse.

// ClientStream.cancel() must be called after start()
stream.start(NOOP_STREAM_LISTENER);
if (savedError != null) {
stream.cancel(savedError);
Copy link
Member

@ejona86 ejona86 Feb 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the real stream should be cancelled with savedError which was used to cancel the delayed stream.

That probably doesn't matter at all. A simple Status.CANCELLED.withDescription("cancelled before setStream") or "the stream was only started so it could be cancelled" would seem fine to me. The application won't end up seeing the status. When debugging, I would probably prefer to see the hard-coded status instead of savedError as well, since the reason for the cancel is the limited API that required the start() in the first place.

This branch is for calling setStream() after cancel(), which is legit

Hmm... If it is legit, we have to make sure it doesn't happen frequently and could only impact a few streams. And it seems that doesn't hold for MetadataApplierImpl and may not hold for DelayedClientTransport{,2}. That means the fix could actually cause stability problems whereas the current code isn't causing any. Hmm...

@zhangkun83
Copy link
Contributor Author

As discussed in #1537, putting this on-hold.

@zhangkun83 zhangkun83 closed this Feb 7, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants