-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add grpc_resolver using external discovery service #1498
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review, but as I'm in meetings for the next N hours, wanted to give preliminary feedback
btw, why do you need to change |
hmmm. There seems to be discrepency error in travis yesterday but now that I checked again it was gone somehow. Will revert it back. |
Codecov Report
@@ Coverage Diff @@
## master #1498 +/- ##
==========================================
- Coverage 98.79% 98.79% -0.01%
==========================================
Files 189 190 +1
Lines 8986 9058 +72
==========================================
+ Hits 8878 8949 +71
- Misses 84 85 +1
Partials 24 24
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the waitGroup. I think it makes sure we exit watcher goroutine before we exit the main program, so generally a good cleanup order? Otherwise the the main program could potentially exit before watcher finishes.
@yurishkuro Any further comments? |
connections = backendCount | ||
} | ||
|
||
makeSureConnectionsUp(t, connections, testc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be called with backendCount
, not connections
. Or even better - it should be outside of the loop, since you're not restarting servers (which is why I suggested making it part of startTestServers, since the expectation of that function is that "servers are ready"). And you don't need to use testc with the load balancer for this health check.
@yurishkuro I'm not clear why we don't need to use testc for the health check. If we don't do a initial load balancing and make sure every server has been reached at least once, what's the point of having |
we do need to use a connection, but it can be a different connection constructed without load balancer, since you are giving it explicit peer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear why we don't need to use testc for the health check.
we do need to use a connection, but it can be a different connection constructed without load balancer, since you are giving it explicit peer.
Are you referring to testc.EmptyCall
? I think we are populating peer instead of "giving". The peers will only be decided by resolver and minPeers
we passed to it.
@yurishkuro Any comment on my last reply? |
testc is created with your custom load balancer. A plain connection without load balancer is sufficient for server liveness checks. Connection with LB is only needed in the actual unit test. |
I'm not certain that's going to work. I tested the resolver without roundrobin balancer(i.e without Also taking a look at |
You have two rounds of network calls:
|
But still, load balancing at step 2 does not necessarily work even if the first step works. Cause now they are from two different connections? I tested the idea here but it some times still hits only 4 out of 5. |
Step 1 has nothing to do with step 2, it's preparation of servers and a check that they are alive. If step 2 fails after that (using a different connection) then it's either a bug in the LB or in the test, and makes it even more sense to use different connection vs. step 1. |
I tested against the original grpc-go roundrobin test they make by swapping the "making connection up" part with directly calling peers, it would fail. I posted the question on their github page and will continue on this once they replied. |
can you link that issue here? |
Ah, the explanation in that ticket makes sense - if you create a new connection for the LB test, you'd still need to ensure that it is connected to each peer before you can execute the test. We should add this to the comments in the test. |
sg. Will add some comment in. |
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang <judew@uber.com>
Signed-off-by: Jude Wang judew@uber.com
Which problem is this PR solving?
Short description of the changes