Skip to content

transport: Propagate status code on receiving RST_STREAM during message read #8289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 12, 2025

Conversation

arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented May 6, 2025

Fixes: #8281

RCA: #8281 (comment)

This PR uses a status error instead of an io.UnexpectedEOF to fail RPCs that have partially read a gRPC message.

RELEASE NOTES:

  • grpc: Fix bug that causes RPCs to fail with status INTERNAL instead of CANCELLED or DEADLINE_EXCEEDED when the client receives an RST_STREAM frame while reading a gRPC message.

Copy link

codecov bot commented May 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.27%. Comparing base (515f377) to head (4bf921e).
Report is 25 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8289      +/-   ##
==========================================
+ Coverage   82.14%   82.27%   +0.13%     
==========================================
  Files         417      419       +2     
  Lines       41344    41992     +648     
==========================================
+ Hits        33961    34550     +589     
- Misses       5957     5982      +25     
- Partials     1426     1460      +34     
Files with missing lines Coverage Δ
internal/transport/http2_client.go 92.37% <100.00%> (-0.24%) ⬇️

... and 57 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@arjan-bal arjan-bal added this to the 1.72 Release milestone May 6, 2025
@arjan-bal arjan-bal requested review from easwars and dfawley May 6, 2025 09:44
@arjan-bal arjan-bal force-pushed the propagate-rst-code-on-unexpected-eof branch from b603bef to 78f39c7 Compare May 6, 2025 09:48
@arjan-bal arjan-bal changed the title grpc: Propagate rst code on receiving io.ErrUnexpectedEOF during message read grpc: Propagate RST_STREAM code on receiving io.ErrUnexpectedEOF during message read May 6, 2025
@arjan-bal arjan-bal force-pushed the propagate-rst-code-on-unexpected-eof branch from 78f39c7 to af8bb07 Compare May 6, 2025 09:56
@arjan-bal arjan-bal force-pushed the propagate-rst-code-on-unexpected-eof branch from af8bb07 to 6a24978 Compare May 6, 2025 13:36
@arjan-bal arjan-bal assigned arjan-bal and unassigned easwars and dfawley May 7, 2025
@@ -1242,7 +1242,8 @@ func (t *http2Client) handleRSTStream(f *http2.RSTStreamFrame) {
statusCode = codes.DeadlineExceeded
}
}
t.closeStream(s, io.EOF, false, http2.ErrCodeNo, status.Newf(statusCode, "stream terminated by RST_STREAM with error code: %v", f.ErrCode), nil, false)
st := status.Newf(statusCode, "stream terminated by RST_STREAM with error code: %v", f.ErrCode)
t.closeStream(s, st.Err(), false, http2.ErrCodeNo, st, nil, false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawley I've changed the code to use a status error for unblocking the reader. This bypasses the conversion to io.ErrUnexpectedEOF and doesn't require reading the status in csAttempt.recvMsg().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is that conversion happening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here:

grpc-go/rpc_util.go

Lines 674 to 680 in a43eba6

data, err := p.r.Read(int(length))
if err != nil {
if err == io.EOF {
err = io.ErrUnexpectedEOF
}
return 0, nil, err
}

@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal May 7, 2025
@@ -1242,7 +1242,8 @@ func (t *http2Client) handleRSTStream(f *http2.RSTStreamFrame) {
statusCode = codes.DeadlineExceeded
}
}
t.closeStream(s, io.EOF, false, http2.ErrCodeNo, status.Newf(statusCode, "stream terminated by RST_STREAM with error code: %v", f.ErrCode), nil, false)
st := status.Newf(statusCode, "stream terminated by RST_STREAM with error code: %v", f.ErrCode)
t.closeStream(s, st.Err(), false, http2.ErrCodeNo, st, nil, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is that conversion happening?

case *http2.HeadersFrame:
// When the client creates a stream, write a partial gRPC
// message followed by an RST_STREAM.
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in a goroutine, but the ping ack is written inline?

Also...can we just not reply to the ping and then remove the lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the ping handling and the mutex. Removed the separate goroutine used for writing data frames.
The removed things were unnecessary. I copied the loop that reads frames from an existing test and didn't remove the extra stuff.

for {
frame, err := sfr.ReadFrame()
if err != nil {
return
}
switch frame := frame.(type) {
case *http2.HeadersFrame:
// When the client creates a stream, violate the stream flow control.
go func() {
buf := make([]byte, http2MaxFrameLen)
for {
mu.Lock()
if err := sfr.WriteData(1, false, buf); err != nil {
mu.Unlock()
return
}
mu.Unlock()
// This for loop is capable of hogging the CPU and cause starvation
// in Go versions prior to 1.9,
// in single CPU environment. Explicitly relinquish processor.
runtime.Gosched()
}
}()
case *http2.RSTStreamFrame:
if frame.Header().StreamID != 1 || http2.ErrCode(frame.ErrCode) != http2.ErrCodeFlowControl {
t.Errorf("RST stream received with streamID: %d and code: %v, want streamID: 1 and code: http2.ErrCodeFlowControl", frame.Header().StreamID, http2.ErrCode(frame.ErrCode))
}
close(success)
return
case *http2.PingFrame:
mu.Lock()
sfr.WritePing(true, frame.Data)
mu.Unlock()
default:
}
}
}()

@dfawley dfawley assigned arjan-bal and unassigned dfawley May 7, 2025
arjan-bal and others added 2 commits May 9, 2025 09:31
Co-authored-by: Doug Fawley <dfawley@google.com>
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal May 9, 2025
case *http2.RSTStreamFrame:
if frame.Header().StreamID != 1 || http2.ErrCode(frame.ErrCode) != http2.ErrCodeFlowControl {
t.Errorf("RST stream received with streamID: %d and code: %v, want streamID: 1 and code: http2.ErrCodeFlowControl", frame.Header().StreamID, http2.ErrCode(frame.ErrCode))
messageLen := 2048
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const?

@dfawley dfawley removed their assignment May 9, 2025
@arjan-bal arjan-bal changed the title grpc: Propagate RST_STREAM code on receiving io.ErrUnexpectedEOF during message read grpc: Propagate status code on receiving RST_STREAM during message read May 12, 2025
@arjan-bal arjan-bal changed the title grpc: Propagate status code on receiving RST_STREAM during message read transport: Propagate status code on receiving RST_STREAM during message read May 12, 2025
@arjan-bal arjan-bal merged commit d36b02e into grpc:master May 12, 2025
15 checks passed
@arjan-bal arjan-bal deleted the propagate-rst-code-on-unexpected-eof branch May 12, 2025 17:53
@arjan-bal arjan-bal modified the milestones: 1.72 Release, 1.73 Release May 12, 2025
arjan-bal added a commit to arjan-bal/grpc-go that referenced this pull request May 13, 2025
@arjan-bal arjan-bal modified the milestones: 1.73 Release, 1.72 Release May 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

streaming: should return cancel error instead of Internal error with unexpected EOF
3 participants