Skip to content

xdsclient: fix TestServerFailureMetrics_BeforeResponseRecv test to wait for watch to start before stopping the listener #8217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 3, 2025

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Apr 2, 2025

Fixes: #8211

TestServerFailureMetrics_BeforeResponseRecv was flaking because of race between listener watch to start and closing the listener and if watch hasn't started the stream errors won't be received and server failure metric won't be emitted. Adding the hook to wait for stream to open before stopping the listener have fixed the race.

RELEASE NOTES: None

Successful forge run

@purnesh42H purnesh42H added Type: Security A bug or other problem affecting security Type: Testing and removed Type: Security A bug or other problem affecting security labels Apr 2, 2025
@purnesh42H purnesh42H added this to the 1.72 Release milestone Apr 2, 2025
@purnesh42H purnesh42H requested a review from dfawley April 2, 2025 17:53
Copy link

codecov bot commented Apr 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.08%. Comparing base (5edab9e) to head (9d2ba7f).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8217      +/-   ##
==========================================
- Coverage   82.17%   82.08%   -0.10%     
==========================================
  Files         410      410              
  Lines       40236    40236              
==========================================
- Hits        33065    33028      -37     
- Misses       5822     5852      +30     
- Partials     1349     1356       +7     

see 22 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…it for watch to start before stopping the listener
@purnesh42H purnesh42H force-pushed the fix-xds-flaky-xds-metric-test branch from c31cf42 to 42a7de2 Compare April 2, 2025 18:00
Comment on lines +227 to +228
// Restart to prevent the attempt to create a new ADS stream after back off.
lis.Restart()
Copy link
Contributor

@arjan-bal arjan-bal Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is still racy because the backoff could have completed before the listener is restarted. There is an option that can be used to set a really long backoff duration instead.

// StreamBackoffAfterFailure is the backoff function used to determine the
// backoff duration after stream failures.
// If unspecified, uses the default value used in non-test code.
StreamBackoffAfterFailure func(int) time.Duration

Copy link
Contributor Author

@purnesh42H purnesh42H Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in this particular test case it won't happen because we are setting the watch expiry timeout which is way shorter than the back off. So, watch will just expire before backoff anyway but before that we would received the stream error.

Comment on lines 222 to 226
select {
case <-ctx.Done():
t.Fatal("Timeout when waiting for ADS stream to close")
default:
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does ~nothing now - delete it?

And before, it should have been written as: if ctx.Err() != nil {...}

Copy link
Contributor Author

@purnesh42H purnesh42H Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah not needed because both lis.Stop() and lis.Restart() are serialized by lock.

Replaced the similar select block in the other test with if ctx.Err() != nil {...}

@dfawley dfawley assigned purnesh42H and unassigned dfawley Apr 2, 2025
@purnesh42H purnesh42H merged commit 57a2605 into grpc:master Apr 3, 2025
15 checks passed
purnesh42H added a commit to purnesh42H/grpc-go that referenced this pull request Apr 13, 2025
…it for watch to start before stopping the listener (grpc#8217)
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Apr 23, 2025
…it for watch to start before stopping the listener (grpc#8217)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky Test: Test/ServerFailureMetrics_BeforeResponseRecv
3 participants